Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
java.lang.Object
org.apache.tika.parser.ner.regex.RegexNERecogniser
- All Implemented Interfaces:
NERecogniser
This class offers an implementation of NERecogniser based on regular expressions.
The default configuration file "ner-regex.txt" is used when the class is instantiated with the no-argument constructor. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
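A relative name passed to Class.getResourceAsStream(String) is resolved against the package of the class it is called on, which is why the file has to live in the same package path. The helper below is a hypothetical sketch (not part of Tika or the JDK) that mirrors the resolution rule documented for Class#getResourceAsStream for classes outside a named module, so the resulting classpath location can be checked:

```java
// Hypothetical helper, not part of Tika or the JDK: mirrors the resource-name
// resolution rule documented for java.lang.Class#getResourceAsStream(String).
final class ResourceNames {
    private ResourceNames() {}

    /**
     * Resolves a resource name the way Class.getResourceAsStream does:
     * an absolute name ("/a/b.txt") just drops its leading slash, while a
     * relative name is prefixed with the class's package, dots turned into slashes.
     */
    static String resolve(String packageName, String name) {
        if (name.startsWith("/")) {
            return name.substring(1); // absolute: package is ignored
        }
        if (packageName.isEmpty()) {
            return name; // default package: no prefix
        }
        return packageName.replace('.', '/') + "/" + name;
    }
}
```

For this class, resolve("org.apache.tika.parser.ner.regex", "ner-regex.txt") gives org/apache/tika/parser/ner/regex/ner-regex.txt, so a file at that path on the classpath is the one that would be picked up.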
The regex file contains one entry per line, in the following format:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(wednes)|(thurs)|(fri)|((sat)(ur)?))(day)?
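A minimal sketch of reading this format, assuming UTF-8 text, one entry per line, and that blank lines and lines starting with # are skipped (these are assumptions, not stated above). RegexConfigSketch is a hypothetical name, not Tika's loader. Each line is split at the first '=' only, since a regex may itself contain '='. The example pattern used with it is WEEK_DAY with a (wednes) alternative, so that Wednesday also matches.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical loader, not Tika's code: parses ENTITY_TYPE=REGEX lines
// into compiled patterns keyed by entity type.
final class RegexConfigSketch {
    private RegexConfigSketch() {}

    static Map<String, Pattern> load(InputStream in) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Assumption: blank lines and '#' comment lines are ignored.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // not an ENTITY_TYPE=REGEX entry; skip it
                }
                // Split at the first '=' only: the regex may itself contain '='.
                String type = line.substring(0, eq).trim();
                String regex = line.substring(eq + 1).trim();
                patterns.put(type, Pattern.compile(regex));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }
}
```

For example, loading a stream that contains the WEEK_DAY line and calling load(stream).get("WEEK_DAY").matcher("Meet on Saturday").find() succeeds, with group() returning "Saturday".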
- Since:
- Nov. 7, 2015
Field Summary
Fields inherited from interface org.apache.tika.parser.ner.NERecogniser
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME
Constructor Summary
Method Summary
Modifier and Type | Method | Description
Set<String> | findMatches(String text, Pattern pattern) | Finds matching substrings in text.
Set<String> | getEntityTypes() | Gets the set of entity types whose names this recogniser can recognise.
static RegexNERecogniser | getInstance() |
boolean | isAvailable() | Checks whether this named entity recogniser is available for service.
Map<String,Set<String>> | recognise(String text) | Runs name recognition on text.
Field Details
NER_REGEX_FILE
entityTypes
patterns
Constructor Details
RegexNERecogniser
public RegexNERecogniser()
Creates a recogniser that loads its patterns from the default regex file, "ner-regex.txt".
RegexNERecogniser
Method Details
getInstance
public static RegexNERecogniser getInstance()
isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
Checks whether this named entity recogniser is available for service.
- Specified by:
- isAvailable in interface NERecogniser
- Returns:
- true if this recogniser is ready to recognise, false otherwise
getEntityTypes
public Set<String> getEntityTypes()
Description copied from interface: NERecogniser
Gets the set of entity types whose names this recogniser can recognise.
- Specified by:
- getEntityTypes in interface NERecogniser
- Returns:
- set of entity types/classes
findMatches
public Set<String> findMatches(String text, Pattern pattern)
Finds matching substrings in text.
- Parameters:
- text - text containing interesting substrings
- pattern - pattern to find substrings
- Returns:
- set of substrings if any are found, or null if none are found
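The documented contract (a set of matched substrings, or null when nothing matches) can be sketched as follows. This assumes each result is the whole match, Matcher.group(), rather than a capturing sub group; FindMatchesSketch is a hypothetical stand-in, not Tika's implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for findMatches, not Tika's code.
final class FindMatchesSketch {
    private FindMatchesSketch() {}

    /** Returns the distinct matched substrings, or null if there are none. */
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = null; // null signals "no match", as documented
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            if (results == null) {
                results = new LinkedHashSet<>(); // created on the first match only
            }
            results.add(matcher.group()); // group() is group(0): the whole match
        }
        return results;
    }
}
```

With the WEEK_DAY pattern, findMatches("Monday and friday, then Monday again", p) yields the two strings Monday and friday, since repeated matches collapse in the set. Because of the null return, callers should check the result before iterating.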
recognise
public Map<String,Set<String>> recognise(String text)
Description copied from interface: NERecogniser
Runs name recognition on the given text.
- Specified by:
- recognise in interface NERecogniser
- Parameters:
- text - text that possibly contains names
- Returns:
- map of entity type -> set of names
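The returned map can be pictured as one entry per configured entity type. This sketch (hypothetical RecogniseSketch, not Tika's code) runs each type's pattern over the text and, as an assumption, omits types that matched nothing:

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of building an entityType -> names map, not Tika's code.
final class RecogniseSketch {
    private RecogniseSketch() {}

    /** Maps each entity type to the distinct substrings its pattern matched. */
    static Map<String, Set<String>> recognise(String text, Map<String, Pattern> patterns) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher matcher = entry.getValue().matcher(text);
            Set<String> names = new LinkedHashSet<>();
            while (matcher.find()) {
                names.add(matcher.group()); // whole match for this entity type
            }
            // Assumption: entity types with no matches are left out of the map.
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }
}
```

With a WEEK_DAY pattern and a hypothetical PERCENT pattern \d+%, the text "Sales rose 12% on Tuesday" yields WEEK_DAY mapped to {Tuesday} and PERCENT mapped to {12%}; HashMap iteration order is unspecified, so callers should look entries up by type rather than rely on order.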