Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
java.lang.Object
org.apache.tika.parser.ner.regex.RegexNERecogniser
- All Implemented Interfaces:
NERecogniser
This class offers an implementation of NERecogniser based on regular expressions.
The default configuration file "ner-regex.txt" is used when the class is instantiated with the no-argument constructor. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
Each line of the regex file maps an entity type to a regular expression:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract weekday names from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
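To illustrate the file format, here is a minimal sketch (not Tika's own loader) that reads ENTITY_TYPE=REGEX lines from an InputStream into a map of compiled patterns. The class name RegexConfigSketch and its parse method are hypothetical. Splitting on the first "=" only, and silently skipping blank lines or lines with no entity type before "=", are assumptions about how such a file might be handled.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical loader for the ENTITY_TYPE=REGEX format; not Tika's implementation.
class RegexConfigSketch {

    // Reads one ENTITY_TYPE=REGEX entry per line and compiles each regex.
    static Map<String, Pattern> parse(InputStream in) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                int eq = line.indexOf('=');
                // Assumption: blank lines and lines without a type before '=' are skipped.
                if (eq <= 0) {
                    continue;
                }
                // Split on the first '=' only, so the regex itself may contain '='.
                String type = line.substring(0, eq).trim();
                String regex = line.substring(eq + 1).trim();
                patterns.put(type, Pattern.compile(regex));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config = "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = parse(
                new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));
        System.out.println(patterns.keySet()); // prints [WEEK_DAY]
        System.out.println(patterns.get("WEEK_DAY").matcher("Saturday").matches()); // prints true
    }
}
```

In practice the stream would come from a classpath lookup such as RegexNERecogniser.class.getResourceAsStream("ner-regex.txt"); because that name is relative, Java resolves it against the class's package, i.e. org/apache/tika/parser/ner/regex/ner-regex.txt.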
- Since:
- Nov. 7, 2015

Field Summary
Fields inherited from interface org.apache.tika.parser.ner.NERecogniser:
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME

Constructor Summary
Constructors

Method Summary
- Set<String> findMatches(String text, Pattern pattern): finds matching sub groups in text
- Set<String> getEntityTypes(): gets a set of entity types whose names are recognisable by this recogniser
- static RegexNERecogniser getInstance()
- boolean isAvailable(): checks if this Named Entity recogniser is available for service
- Map<String,Set<String>> recognise(String text): call for name recognition action from text

Field Details

NER_REGEX_FILE
- See Also:

entityTypes

patterns

Constructor Details

RegexNERecogniser
public RegexNERecogniser()

RegexNERecogniser

Method Details

getInstance

isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
checks if this Named Entity recogniser is available for service
- Specified by:
isAvailable in interface NERecogniser
- Returns:
- true if this recogniser is ready to recognise, false otherwise

getEntityTypes
Description copied from interface: NERecogniser
gets a set of entity types whose names are recognisable by this recogniser
- Specified by:
getEntityTypes in interface NERecogniser
- Returns:
- set of entity types/classes

findMatches
finds matching sub groups in text
- Parameters:
text - text containing interesting substrings
pattern - pattern to find substrings
- Returns:
- set of substrings if any found, or null if none found
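A hedged sketch of the documented findMatches contract follows. It assumes that each "sub group" is a complete match of the pattern (group 0), and it returns null rather than an empty set when nothing matches, as the Returns clause states. FindMatchesSketch is a hypothetical class, not Tika code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for findMatches(String, Pattern); not Tika's source.
class FindMatchesSketch {

    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = new HashSet<>();
        Matcher matcher = pattern.matcher(text);
        // Assumption: each full match (group 0) counts as one substring.
        while (matcher.find()) {
            results.add(matcher.group());
        }
        // Mirror the documented contract: null instead of an empty set.
        return results.isEmpty() ? null : results;
    }

    public static void main(String[] args) {
        Pattern weekDay = Pattern.compile(
                "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?");
        System.out.println(findMatches("Meet on Monday or fri.", weekDay));
        System.out.println(findMatches("no days here", weekDay)); // prints null
    }
}
```

Because the weekday pattern has no word boundaries, it would also report "mon" inside a word such as "month"; wrapping the expression in \b...\b is one way to avoid that.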

recognise
Description copied from interface: NERecogniser
call for name recognition action from text
- Specified by:
recognise in interface NERecogniser
- Parameters:
text - text that possibly contains names
- Returns:
- map of entityType -> set of names
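The following minimal sketch shows one way a regex-based recogniser could satisfy the recognise contract: run every configured pattern over the text and collect the matches per entity type into a map. RecogniseSketch and the YEAR pattern are hypothetical, and leaving out entity types that match nothing is an assumption, not documented behaviour.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the recognise(String) contract; not Tika's source.
class RecogniseSketch {

    // Entity type -> compiled pattern, e.g. as loaded from a regex file.
    private final Map<String, Pattern> patterns;

    RecogniseSketch(Map<String, Pattern> patterns) {
        this.patterns = patterns;
    }

    Map<String, Set<String>> recognise(String text) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new HashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            // Assumption: entity types with no matches are left out of the map.
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY", Pattern.compile(
                "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        // Hypothetical second entity type, for illustration only.
        patterns.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b"));
        RecogniseSketch ner = new RecogniseSketch(patterns);
        System.out.println(ner.recognise("Released on Saturday in 2015."));
    }
}
```

The result is a HashMap, so the order in which entity types are printed is not defined.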