Package org.apache.tika.parser.ner.regex
Class RegexNERecogniser
java.lang.Object
    org.apache.tika.parser.ner.regex.RegexNERecogniser
- All Implemented Interfaces:
NERecogniser
public class RegexNERecogniser extends Object implements NERecogniser
This class offers an implementation of NERecogniser based on regular expressions. The default configuration file "ner-regex.txt" is used when the no-argument constructor is used to instantiate this class. The regex file is loaded via Class.getResourceAsStream(String), so the file should be placed in the same package path as this class.
The format of the regex configuration is as follows:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
Since:
    Nov. 7, 2015
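The configuration format described above can be illustrated with a short plain-Java sketch. It is a hypothetical loader, not Tika's actual loading code: it shows how `ENTITY_TYPE=REGEX` lines could be turned into a `Map<String,Pattern>` like the one held in the `patterns` field. The names `RegexConfigSketch` and `loadPatterns`, splitting on the first `=` only, and skipping blank and `#` lines are all assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexConfigSketch {

    // Hypothetical loader (not Tika's code): turns "ENTITY_TYPE=REGEX" lines into compiled patterns.
    // Only the first '=' separates type from regex, so the regex itself may contain '='.
    // Skipping blank lines and '#' comment lines is an assumption.
    public static Map<String, Pattern> loadPatterns(InputStream stream) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                int eq = line.indexOf('=');
                if (eq <= 0) {
                    continue; // no entity type before '=': ignore the line
                }
                patterns.put(line.substring(0, eq).trim(),
                        Pattern.compile(line.substring(eq + 1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config = "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = loadPatterns(
                new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));

        // Collect every whole match of the WEEK_DAY pattern in a sample sentence.
        Matcher m = patterns.get("WEEK_DAY").matcher("See you on Monday, or maybe Friday.");
        Set<String> found = new LinkedHashSet<>();
        while (m.find()) {
            found.add(m.group());
        }
        System.out.println(found); // [Monday, Friday]
    }
}
```

Note that the example WEEK_DAY pattern has no word boundaries, so it also matches inside ordinary words (for instance "mon" in "money"), and it has no branch for Wednesday; adding `\b` anchors or a `(wed(nes)?)` alternative would be needed if either matters.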
Field Summary
Modifier and Type          Field
Set<String>                entityTypes
static String              NER_REGEX_FILE
Map<String,Pattern>        patterns

Fields inherited from interface org.apache.tika.parser.ner.NERecogniser
DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME
Constructor Summary
RegexNERecogniser()
RegexNERecogniser(InputStream stream)
Method Summary
Set<String> findMatches(String text, Pattern pattern)
    Finds matching sub groups in text.
Set<String> getEntityTypes()
    Gets the set of entity types whose names this recogniser can recognise.
static RegexNERecogniser getInstance()
boolean isAvailable()
    Checks if this named entity recogniser is available for service.
Map<String,Set<String>> recognise(String text)
    Performs named entity recognition on text.
Constructor Detail

RegexNERecogniser
public RegexNERecogniser()

RegexNERecogniser
public RegexNERecogniser(InputStream stream)

Method Detail

getInstance
public static RegexNERecogniser getInstance()

isAvailable
public boolean isAvailable()
Description copied from interface: NERecogniser
Checks if this named entity recogniser is available for service.
Specified by:
    isAvailable in interface NERecogniser
Returns:
    true if this recogniser is ready to recognise, false otherwise

getEntityTypes
public Set<String> getEntityTypes()
Description copied from interface: NERecogniser
Gets the set of entity types whose names this recogniser can recognise.
Specified by:
    getEntityTypes in interface NERecogniser
Returns:
    set of entity types/classes

findMatches
public Set<String> findMatches(String text, Pattern pattern)
Finds matching sub groups in text.
Parameters:
    text - text containing interesting sub strings
    pattern - pattern to find sub strings
Returns:
    set of sub strings if any found, or null if none found
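The documented contract of findMatches (a set of matched substrings, or null when nothing matches) can be sketched as follows. This is an illustration under assumptions, not Tika's source: it treats each whole match (group 0) as a "sub string", and the class name `FindMatchesSketch` and the percent regex in the demo are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindMatchesSketch {

    // Mirrors the documented contract: collect the distinct substrings matched
    // by the pattern, or return null when nothing matches.
    // Treating each whole match (group 0) as a "sub string" is an assumption.
    public static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = null;
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            if (results == null) {
                results = new LinkedHashSet<>(); // created lazily on the first match
            }
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        Pattern percent = Pattern.compile("\\d+(\\.\\d+)?%");
        System.out.println(findMatches("Up 5% then 12.5%, then 5% again.", percent)); // [5%, 12.5%]
        System.out.println(findMatches("No figures here.", percent));                 // null
    }
}
```

Because the documented return value is null rather than an empty set when there are no matches, callers should null-check the result before iterating over it.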

recognise
public Map<String,Set<String>> recognise(String text)
Description copied from interface: NERecogniser
Performs named entity recognition on text.
Specified by:
    recognise in interface NERecogniser
Parameters:
    text - text that possibly contains names
Returns:
    map of entityType -> set of names
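One plausible way recognise could combine the configured patterns into a map of entity type to names is sketched below. It is a self-contained illustration, not Tika's implementation: the class `RecogniseSketch`, the constructor taking a pattern map (standing in for the `patterns` field), the `PERCENT` and `EMAIL` regexes, and the choice to omit entity types with no matches are all assumptions. The matching loop is inlined rather than calling findMatches so the block compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecogniseSketch {

    // Hypothetical stand-in for the "patterns" field: entity type -> compiled regex.
    private final Map<String, Pattern> patterns;

    public RecogniseSketch(Map<String, Pattern> patterns) {
        this.patterns = patterns;
    }

    // Runs every configured pattern over the text and groups the matches by entity type.
    // Assumption: entity types with no matches are left out of the result map.
    public Map<String, Set<String>> recognise(String text) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new LinkedHashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> config = new LinkedHashMap<>();
        config.put("WEEK_DAY", Pattern.compile("(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        config.put("PERCENT", Pattern.compile("\\d+(\\.\\d+)?%"));
        config.put("EMAIL", Pattern.compile("[\\w.]+@[\\w.]+"));

        RecogniseSketch recogniser = new RecogniseSketch(config);
        System.out.println(recogniser.recognise("Sales rose 4% on Monday and 2.5% on Friday."));
        // prints {WEEK_DAY=[Monday, Friday], PERCENT=[4%, 2.5%]}
    }
}
```

Leaving unmatched types out keeps the result compact, so a caller can treat "key absent" as "no entities of that type found"; whether Tika does exactly this is not stated on this page.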