Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
- java.lang.Object
-   org.apache.tika.parser.ner.regex.RegexNERecogniser
- All Implemented Interfaces:
NERecogniser
public class RegexNERecogniser extends Object implements NERecogniser
This class offers an implementation of NERecogniser based on regular expressions. The default configuration file "ner-regex.txt" is used when the no-argument constructor is used to instantiate this class. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
The format of the regex configuration is as follows:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract the day of the week from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
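The configuration format above can be illustrated with a small, self-contained sketch. It is not Tika's own loader: the class name RegexConfigSketch and the method parseConfig are hypothetical, and the handling of blank or malformed lines is an assumption, since this page does not describe how the real file is parsed. Each line is split at the first '=' so that the regex itself may contain '=' characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the ENTITY_TYPE=REGEX format; not part of Tika.
class RegexConfigSketch {

    // Parses "ENTITY_TYPE=REGEX" lines into a map of entity type -> compiled pattern.
    static Map<String, Pattern> parseConfig(String config) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        for (String line : config.split("\\R")) {
            line = line.trim();
            int eq = line.indexOf('=');
            // Assumption: blank lines and lines without a usable '=' are skipped.
            if (line.isEmpty() || eq <= 0) {
                continue;
            }
            String entityType = line.substring(0, eq).trim();
            String regex = line.substring(eq + 1).trim();
            patterns.put(entityType, Pattern.compile(regex));
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config = "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?";
        Map<String, Pattern> patterns = parseConfig(config);

        // Print every substring the WEEK_DAY pattern finds in the sample text.
        Matcher matcher = patterns.get("WEEK_DAY").matcher("See you on Friday or Saturday.");
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
```

With Tika on the classpath, a file in this format would instead be passed to the class itself, for example through the RegexNERecogniser(InputStream stream) constructor listed below.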
- Since:
- Nov. 7, 2015
Field Summary
Fields
Modifier and Type      Field             Description
Set<String>            entityTypes
static String          NER_REGEX_FILE
Map<String,Pattern>    patterns
Fields inherited from interface org.apache.tika.parser.ner.NERecogniser
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME
Constructor Summary
Constructors
Constructor                              Description
RegexNERecogniser()
RegexNERecogniser(InputStream stream)
-
Method Summary
Modifier and Type          Method                                       Description
Set<String>                findMatches(String text, Pattern pattern)    finds matching sub groups in text
Set<String>                getEntityTypes()                             gets a set of entity types whose names are recognisable by this
static RegexNERecogniser   getInstance()
boolean                    isAvailable()                                checks if this Named Entity recogniser is available for service
Map<String,Set<String>>    recognise(String text)                       call for name recognition action from text
Constructor Detail
-
RegexNERecogniser
public RegexNERecogniser()
-
RegexNERecogniser
public RegexNERecogniser(InputStream stream)
Method Detail
-
getInstance
public static RegexNERecogniser getInstance()
-
isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
checks if this Named Entity recogniser is available for service
- Specified by:
- isAvailable in interface NERecogniser
- Returns:
- true if this recogniser is ready to recognise, false otherwise
-
getEntityTypes
public Set<String> getEntityTypes()
Description copied from interface: NERecogniser
gets the set of entity types whose names this recogniser can recognise
- Specified by:
- getEntityTypes in interface NERecogniser
- Returns:
- set of entity types/classes
-
findMatches
public Set<String> findMatches(String text, Pattern pattern)
finds matching sub groups in text
- Parameters:
- text - text containing interesting sub strings
- pattern - pattern to find sub strings
- Returns:
- set of sub strings if any found, or null if none found
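Because the documented return value may be null, callers need a null check before iterating. The following is a minimal sketch of that contract using only java.util.regex; it is not the Tika source, and the class FindMatchesSketch, the helper orEmpty, and the sample money pattern are all hypothetical names chosen for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the documented findMatches contract; not part of Tika.
class FindMatchesSketch {

    // Collects every substring matched by the pattern; returns null when nothing
    // matches, mirroring the "or null if none found" wording of the documentation.
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> found = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.isEmpty() ? null : found;
    }

    // Caller-side helper (an assumption, not a Tika method): treats null as "no matches".
    static Set<String> orEmpty(Set<String> matches) {
        return matches == null ? Collections.emptySet() : matches;
    }

    public static void main(String[] args) {
        Pattern money = Pattern.compile("\\$\\d+(\\.\\d{2})?");
        System.out.println(orEmpty(findMatches("Tickets cost $12.50 or $8", money))); // [$12.50, $8]
        System.out.println(orEmpty(findMatches("No prices here", money)));            // []
    }
}
```

Normalising null to an empty set at the call site keeps loops and stream pipelines free of special cases, whichever value an implementation actually returns when there is no match.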
-
recognise
public Map<String,Set<String>> recognise(String text)
Description copied from interface: NERecogniser
call for name recognition action from text
- Specified by:
- recognise in interface NERecogniser
- Parameters:
- text - text that possibly contains names
- Returns:
- map of entityType -> set of names
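The shape of the returned map can be sketched without Tika by applying a set of named patterns to a text. This is an illustration only: the class RecogniseSketch, the two-argument recognise method, and the "PERCENT" pattern are hypothetical, and omitting entity types that have no match is an assumption, since this page does not say whether the real method includes empty entries.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of an entityType -> names map; not part of Tika.
class RecogniseSketch {

    // Applies each named pattern to the text and groups the matches by entity type.
    static Map<String, Set<String>> recognise(String text, Map<String, Pattern> patterns) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new LinkedHashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            // Assumption: entity types with no match are left out of the result.
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY",
                Pattern.compile("(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        patterns.put("PERCENT", Pattern.compile("\\d+(\\.\\d+)?%"));

        // Each key is an entity type; each value is the set of matched names.
        System.out.println(recognise("Sales rose 12% on Monday and 3.5% on Friday.", patterns));
    }
}
```

With Tika on the classpath, the corresponding call would be recognise(String text) on an instance obtained from one of the constructors or from getInstance(), with the patterns coming from the regex configuration file rather than from a map built in code.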