public class RegexNERecogniser extends Object implements NERecogniser
NERecogniser based on regular expressions.
The default configuration file "ner-regex.txt" is used when the no-argument
constructor is used to instantiate this class. The regex file is
loaded via Class.getResourceAsStream(String), so the file should be
placed in the same package path as this class.
The regex configuration uses the following format:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week days from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
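As a quick check of the WEEK_DAY example, the pattern can be exercised with plain java.util.regex. This is a minimal sketch that does not use Tika: the class name WeekDayRegexDemo and the helper findAll are hypothetical, and collecting whole matches with Matcher.group() is an assumption about how matches are gathered, not a statement of what this class does internally.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WeekDayRegexDemo {
    // Regex copied verbatim from the WEEK_DAY example entry.
    static final Pattern WEEK_DAY = Pattern.compile(
            "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?");

    // Hypothetical helper: collects every whole match of the pattern,
    // in order of first appearance.
    static Set<String> findAll(String text, Pattern pattern) {
        Set<String> matches = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Prints [Monday, SATURDAY]
        System.out.println(
                findAll("Meet on Monday or SATURDAY, not Wednesday.", WEEK_DAY));
    }
}
```

Two properties of the example pattern show up here. It has no alternative for Wednesday, so that word is not matched. It also has no word boundaries, so it can match inside longer words (for instance the "mon" in "money"); adding \b around the pattern would be one way to tighten it.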
Modifier and Type | Field and Description |
---|---|
Set<String> | entityTypes |
static String | NER_REGEX_FILE |
Map<String,Pattern> | patterns |
Fields inherited from interface NERecogniser: DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME
Constructor and Description |
---|
RegexNERecogniser() |
RegexNERecogniser(InputStream stream) |
Modifier and Type | Method and Description |
---|---|
Set<String> | findMatches(String text, Pattern pattern): finds matching subgroups in text |
Set<String> | getEntityTypes(): gets the set of entity types whose names are recognisable by this recogniser |
static RegexNERecogniser | getInstance() |
boolean | isAvailable(): checks if this named entity recogniser is available for service |
Map<String,Set<String>> | recognise(String text): runs name recognition on the given text |
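The flow implied by the summaries above (read ENTITY_TYPE=REGEX entries from a stream, compile them, then map each entity type to the strings it matched) can be sketched without Tika. This is a minimal sketch, not this class's actual implementation: ConfigDrivenRecogniser is a hypothetical stand-in, the YEAR entry is made up for illustration, and skipping blank lines, treating lines that start with # as comments, and splitting on the first = are all assumptions.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConfigDrivenRecogniser {
    // Entity type -> compiled regex, in the order read from the config.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    ConfigDrivenRecogniser(InputStream stream) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Assumption: blank lines and '#' comment lines are ignored.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                // Split on the first '=' only, so the regex may contain '='.
                int eq = line.indexOf('=');
                if (eq <= 0) {
                    continue;
                }
                patterns.put(line.substring(0, eq).trim(),
                        Pattern.compile(line.substring(eq + 1).trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Set<String> getEntityTypes() {
        return patterns.keySet();
    }

    // Maps each entity type to the distinct strings its regex matched.
    Map<String, Set<String>> recognise(String text) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new HashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String config = "# sample configuration\n"
                + "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n"
                + "YEAR=\\b(19|20)\\d{2}\\b\n"; // hypothetical extra entry
        ConfigDrivenRecogniser recogniser = new ConfigDrivenRecogniser(
                new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));
        // Prints {WEEK_DAY=[Friday], YEAR=[2021]}
        System.out.println(recogniser.recognise("The release shipped on Friday in 2021."));
    }
}
```

Splitting on the first = rather than every = matters because regex syntax itself uses that character, for example in lookaheads such as (?=...).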
public static final String NER_REGEX_FILE
public RegexNERecogniser()
public RegexNERecogniser(InputStream stream)
public static RegexNERecogniser getInstance()
public boolean isAvailable()
Specified by: isAvailable in interface NERecogniser
public Set<String> getEntityTypes()
Specified by: getEntityTypes in interface NERecogniser
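public Set<String> findMatches(String text, Pattern pattern)
Parameters:
text - text containing interesting substrings
pattern - pattern to find substrings
public Map<String,Set<String>> recognise(String text)
Specified by: recognise in interface NERecogniser
Parameters:
text - text that possibly contains names
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.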
- text with possibly contains namesCopyright © 2007–2021 The Apache Software Foundation. All rights reserved.