Package opennlp.tools.tokenize
Class TokenizerFactory
java.lang.Object
opennlp.tools.util.BaseToolFactory
opennlp.tools.tokenize.TokenizerFactory
The factory that provides the default Tokenizer implementation and resources. Users can extend this class if their application requires overriding the TokenContextGenerator, Dictionary, etc.
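A minimal sketch of such an extension follows. It keeps the default resources and decorates the inherited context generator with one extra feature. `SlashAwareTokenizerFactory` and its `hasSlash` feature are hypothetical. The sketch makes two assumptions: that `TokenContextGenerator` declares a single `getContext(String, int)` method, and that a public no-argument constructor is needed so the framework can instantiate the subclass by name. It needs OpenNLP on the classpath.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.tokenize.TokenContextGenerator;
import opennlp.tools.tokenize.TokenizerFactory;

/**
 * Hypothetical subclass: keeps the default resources but appends one extra
 * feature to every context produced by the inherited generator.
 */
public class SlashAwareTokenizerFactory extends TokenizerFactory {

  /** No-argument constructor, assumed necessary for instantiation by class name. */
  public SlashAwareTokenizerFactory() {
    super();
  }

  /** Programmatic constructor; forwards everything to TokenizerFactory. */
  public SlashAwareTokenizerFactory(String languageCode, Dictionary abbreviationDictionary,
      boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) {
    super(languageCode, abbreviationDictionary, useAlphaNumericOptimization, alphaNumericPattern);
  }

  @Override
  public TokenContextGenerator getContextGenerator() {
    // Start from the default generator and decorate its output.
    TokenContextGenerator base = super.getContextGenerator();
    return new TokenContextGenerator() {
      @Override
      public String[] getContext(String sentence, int index) {
        String[] baseFeatures = base.getContext(sentence, index);
        String[] features = Arrays.copyOf(baseFeatures, baseFeatures.length + 1);
        // Hypothetical feature: does the text being split contain a slash?
        features[baseFeatures.length] = "hasSlash=" + (sentence.indexOf('/') >= 0);
        return features;
      }
    };
  }

  public static void main(String[] args) {
    TokenizerFactory factory = new SlashAwareTokenizerFactory("en", null, true, null);
    // Index 3 is the position of the slash in "and/or".
    String[] features = factory.getContextGenerator().getContext("and/or", 3);
    System.out.println(Arrays.toString(features));
  }
}
```

If such a subclass is used to train a model, the class presumably also has to be on the classpath wherever that model is loaded again.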
Constructor Summary
Constructors
TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory.
-
Method Summary
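Modifier and Type / Method / Description
static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Factory method the framework uses to instantiate a new TokenizerFactory.
Map<String,Object> createArtifactMap()
Creates the artifact map; a model's constructor that creates the model programmatically should call this method.
Map<String,String> createManifestEntries()
Dictionary getAbbreviationDictionary()
Pattern getAlphaNumericPattern()
TokenContextGenerator getContextGenerator()
String getLanguageCode()
boolean isUseAlphaNumericOptimization()
void validateArtifactMap()
Validates the parsed artifacts.
Methods inherited from class opennlp.tools.util.BaseToolFactory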
create, create, createArtifactSerializersMap
-
Constructor Details
-
TokenizerFactory
public TokenizerFactory()
Instantiates a TokenizerFactory that provides the default implementation of the resources.
-
TokenizerFactory
public TokenizerFactory(String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern)
Instantiates a TokenizerFactory. Use this constructor to create a factory programmatically.
Parameters:
languageCode - The ISO language code to be used for this factory.
abbreviationDictionary - The Dictionary which holds abbreviations.
useAlphaNumericOptimization - Whether alphanumeric tokens are skipped, or not.
alphaNumericPattern - null or a custom alphanumeric Pattern (default is: "^[A-Za-z0-9]+$", provided by Factory.DEFAULT_ALPHANUMERIC).
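The following minimal sketch builds an English factory programmatically with a small abbreviation dictionary. The abbreviation entries and the helper name `englishFactory` are hypothetical. It assumes OpenNLP's `Dictionary.put(StringList)` and the single-token `StringList(String)` constructor. Passing `null` as the pattern relies on the documented default.

```java
import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.util.StringList;

public class CreateTokenizerFactory {

  /** Builds an English factory with a hypothetical two-entry abbreviation list. */
  static TokenizerFactory englishFactory(boolean optimize) {
    Dictionary abbreviations = new Dictionary();
    // Each abbreviation is stored as a single-token entry.
    abbreviations.put(new StringList("Mr."));
    abbreviations.put(new StringList("e.g."));
    // null for the pattern: the factory falls back to a default alphanumeric Pattern.
    return new TokenizerFactory("en", abbreviations, optimize, null);
  }

  public static void main(String[] args) {
    TokenizerFactory factory = englishFactory(true);
    // Read the configuration back through the getters documented below.
    System.out.println(factory.getLanguageCode());
    System.out.println(factory.isUseAlphaNumericOptimization());
    System.out.println(factory.getAbbreviationDictionary().size());
    System.out.println(factory.getAlphaNumericPattern().pattern());
  }
}
```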
-
Method Details
-
validateArtifactMap
Description copied from class: BaseToolFactory
Validates the parsed artifacts.
Note: Subclasses should generally invoke super.validateArtifactMap at the beginning of this method.
Specified by:
validateArtifactMap in class BaseToolFactory
Throws:
InvalidFormatException - Thrown if validation found invalid states.
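A subclass that follows the note above might look like the following minimal sketch. `StrictTokenizerFactory` and its language-code rule are hypothetical. The sketch calls the inherited checks first and then adds one more check, signalling failure with OpenNLP's `InvalidFormatException(String)`.

```java
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.util.InvalidFormatException;

/** Hypothetical subclass that adds one extra consistency check when a model is validated. */
public class StrictTokenizerFactory extends TokenizerFactory {

  public StrictTokenizerFactory() {
    super();
  }

  @Override
  public void validateArtifactMap() throws InvalidFormatException {
    // Run the inherited checks first, as recommended.
    super.validateArtifactMap();
    // Hypothetical extra rule: reject models that carry no language code.
    String language = getLanguageCode();
    if (language == null || language.isEmpty()) {
      throw new InvalidFormatException("Tokenizer model has no language code");
    }
  }
}
```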
-
createArtifactMap
Description copied from class: BaseToolFactory
Creates the map of artifacts this factory contributes to a model. A model's constructor that creates the model programmatically should call this method.
The base implementation returns a HashMap that should be populated by subclasses.
Overrides:
createArtifactMap in class BaseToolFactory
Returns:
A Map with pairs of keys and objects.
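To see what this factory contributes, the following minimal sketch compares the artifact map with and without an abbreviation dictionary. The class and helper names are hypothetical. It deliberately avoids relying on the key name under which the dictionary is stored, because that name is not documented on this page. The sketch assumes only that a supplied dictionary shows up among the map's values.

```java
import java.util.Map;

import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.util.StringList;

public class InspectArtifacts {

  /** Returns the artifacts an English factory would contribute for the given dictionary. */
  static Map<String, Object> artifactsFor(Dictionary abbreviations) {
    TokenizerFactory factory = new TokenizerFactory("en", abbreviations, true, null);
    return factory.createArtifactMap();
  }

  public static void main(String[] args) {
    Dictionary abbreviations = new Dictionary();
    abbreviations.put(new StringList("Dr."));
    // Print only the keys; the values may be large objects such as dictionaries.
    System.out.println(artifactsFor(abbreviations).keySet());
    System.out.println(artifactsFor(null).keySet());
  }
}
```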
-
createManifestEntries
Overrides:
createManifestEntries in class BaseToolFactory
Returns:
Retrieves the manifest entries to be added to the model manifest.
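The exact manifest keys are not listed on this page. The following minimal sketch therefore just prints whatever entries a programmatically configured factory produces, under the assumption that the optimization flag ends up in the manifest as the string form of the boolean. The class and helper names are hypothetical.

```java
import java.util.Map;

import opennlp.tools.tokenize.TokenizerFactory;

public class InspectManifest {

  /** Returns the manifest entries for an English factory with the given optimization flag. */
  static Map<String, String> manifestFor(boolean optimize) {
    TokenizerFactory factory = new TokenizerFactory("en", null, optimize, null);
    return factory.createManifestEntries();
  }

  public static void main(String[] args) {
    // Print every key/value pair the factory would add to the model manifest.
    manifestFor(true).forEach((key, value) -> System.out.println(key + " = " + value));
  }
}
```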
-
create
public static TokenizerFactory create(String subclassName, String languageCode, Dictionary abbreviationDictionary, boolean useAlphaNumericOptimization, Pattern alphaNumericPattern) throws InvalidFormatException
Factory method the framework uses to instantiate a new TokenizerFactory.
Parameters:
subclassName - The name of the class implementing the TokenizerFactory.
languageCode - The ISO language code the Tokenizer should use.
abbreviationDictionary - An optional Dictionary containing abbreviations, or null if not present.
useAlphaNumericOptimization - Whether the alphanumeric optimization should be enabled or not.
alphaNumericPattern - The Pattern the alphanumeric optimization should use, if enabled.
Returns:
A valid TokenizerFactory instance.
Throws:
InvalidFormatException - Thrown if one of the input parameters doesn't comply with the expected format.
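The following minimal sketch calls the factory method by class name. It uses the fully qualified name of `TokenizerFactory` itself. A custom subclass would be named the same way, provided it has a public no-argument constructor. The class and helper names are hypothetical.

```java
import opennlp.tools.tokenize.TokenizerFactory;
import opennlp.tools.util.InvalidFormatException;

public class CreateByName {

  /** Instantiates a factory from its fully qualified class name. */
  static TokenizerFactory defaultFactory() throws InvalidFormatException {
    // Here the base class itself is named; a subclass name would be passed the same way.
    return TokenizerFactory.create(TokenizerFactory.class.getName(), "en", null, false, null);
  }

  public static void main(String[] args) throws InvalidFormatException {
    TokenizerFactory factory = defaultFactory();
    System.out.println(factory.getClass().getName() + " for " + factory.getLanguageCode());
  }
}
```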
-
getAlphaNumericPattern
Returns:
Retrieves the (user-)specified alphanumeric Pattern or a default.
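The default pattern `"^[A-Za-z0-9]+$"` only accepts candidate tokens made entirely of ASCII letters and digits. The following standalone sketch uses only `java.util.regex` and compiles a local copy of that expression instead of reading `Factory.DEFAULT_ALPHANUMERIC`, so it runs without OpenNLP. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class AlphaNumericCheck {

  // Local copy of the documented default expression, "^[A-Za-z0-9]+$".
  static final Pattern DEFAULT_ALPHANUMERIC = Pattern.compile("^[A-Za-z0-9]+$");

  /** Returns true if the whole token consists of ASCII letters and digits only. */
  static boolean isAlphaNumeric(String token) {
    return DEFAULT_ALPHANUMERIC.matcher(token).matches();
  }

  public static void main(String[] args) {
    for (String token : new String[] {"OpenNLP", "2024", "can't", "e.g.", "Zürich"}) {
      System.out.println(token + " -> " + isAlphaNumeric(token));
    }
  }
}
```

Note that tokens containing punctuation or non-ASCII letters such as "ü" do not match. A custom Pattern passed to the constructor would be the place to widen this for other languages.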
-
isUseAlphaNumericOptimization
public boolean isUseAlphaNumericOptimization()
Returns:
true if the alphanumeric optimization is enabled, otherwise false.
-
getAbbreviationDictionary
Returns:
The abbreviation Dictionary, or null if none is active.
-
getLanguageCode
Returns:
Retrieves the ISO language code in use.
-
getContextGenerator
Returns:
Retrieves a TokenContextGenerator instance.
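The following minimal sketch shows the generator returned by the default factory producing the feature strings for one position in a string. It assumes `TokenContextGenerator.getContext(String, int)`. The exact features depend on the OpenNLP version, so the sketch only prints them. The class and helper names are hypothetical.

```java
import java.util.Arrays;

import opennlp.tools.tokenize.TokenContextGenerator;
import opennlp.tools.tokenize.TokenizerFactory;

public class ShowContext {

  /** Returns the features the default generator emits for position {@code index} of {@code text}. */
  static String[] featuresAt(String text, int index) {
    TokenizerFactory factory = new TokenizerFactory("en", null, false, null);
    TokenContextGenerator generator = factory.getContextGenerator();
    return generator.getContext(text, index);
  }

  public static void main(String[] args) {
    // Index 3 is the position of the apostrophe in "don't".
    System.out.println(Arrays.toString(featuresAt("don't", 3)));
  }
}
```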
-