public class RegexNERecogniser extends Object implements NERecogniser
An NERecogniser implementation based on regular expressions.
The default configuration file "ner-regex.txt" is used when this class is
instantiated with the no-argument constructor. The regex file is
loaded via Class.getResourceAsStream(String), so the file should be
placed in the same package path as this class.
The regex configuration has the following format:
ENTITY_TYPE1=REGEX1
ENTITY_TYPE2=REGEX2
For example, to extract week day from text:
WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
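The configuration format above can be illustrated with a small stand-alone sketch. This is not the Tika implementation: the class name `RegexConfigSketch` and the `load` helper are hypothetical, and the sketch assumes one `ENTITY_TYPE=REGEX` entry per line, split at the first `=`, with blank lines and `#` comments skipped.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; names and parsing rules are assumptions.
class RegexConfigSketch {

    // Reads ENTITY_TYPE=REGEX lines into a map of compiled patterns.
    static Map<String, Pattern> load(InputStream stream) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Assumption: blank lines and '#' comments are skipped.
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                // Split at the first '=' so the regex itself may contain '='.
                int eq = line.indexOf('=');
                if (eq <= 0) {
                    continue; // assumption: malformed lines are ignored
                }
                String type = line.substring(0, eq).trim();
                String regex = line.substring(eq + 1).trim();
                patterns.put(type, Pattern.compile(regex));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config =
            "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = load(
            new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));
        System.out.println(patterns.keySet());
        System.out.println(patterns.get("WEEK_DAY").matcher("See you on Monday").find());
    }
}
```

Passing an InputStream, as in the sketch, corresponds to the RegexNERecogniser(InputStream stream) constructor, which can be used when the configuration is not on the classpath next to the class.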
| Modifier and Type | Field and Description |
|---|---|
| Set<String> | entityTypes |
| static String | NER_REGEX_FILE |
| Map<String,Pattern> | patterns |
Fields inherited from interface NERecogniser: DATE, LOCATION, MISCELLANEOUS, MONEY, ORGANIZATION, PERCENT, PERSON, TIME

| Constructor and Description |
|---|
| RegexNERecogniser() |
| RegexNERecogniser(InputStream stream) |

| Modifier and Type | Method and Description |
|---|---|
| Set<String> | findMatches(String text, Pattern pattern): finds matching sub groups in text |
| Set<String> | getEntityTypes(): gets a set of entity types whose names are recognisable by this recogniser |
| static RegexNERecogniser | getInstance() |
| boolean | isAvailable(): checks if this Named Entity recogniser is available for service |
| Map<String,Set<String>> | recognise(String text): call for name recognition action from text |

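The relationship between findMatches and recognise can be sketched in plain Java. This is a hedged illustration, not the library's source: the class `RegexMatchSketch` is hypothetical, `recognise` here takes the pattern map as an explicit parameter, and the sketch assumes that each whole match is collected and that entity types with no matches are left out of the result.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; behaviour details are assumptions.
class RegexMatchSketch {

    // Collects every substring of text matched by the pattern.
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group()); // the whole match
        }
        return results;
    }

    // Maps each entity type to the set of names its pattern finds in text.
    static Map<String, Set<String>> recognise(String text,
                                              Map<String, Pattern> patterns) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = findMatches(text, entry.getValue());
            // Assumption: types without matches are omitted.
            if (!names.isEmpty()) {
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY", Pattern.compile(
            "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        System.out.println(recognise("Monday and friday", patterns));
    }
}
```

Returning sets rather than lists means a name that occurs several times in the text is reported once per entity type, which matches the Map<String,Set<String>> return type in the summary above.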
public static final String NER_REGEX_FILE
public RegexNERecogniser()
public RegexNERecogniser(InputStream stream)
public static RegexNERecogniser getInstance()
public boolean isAvailable()
Specified by: isAvailable in interface NERecogniser
public Set<String> getEntityTypes()
Specified by: getEntityTypes in interface NERecogniser
public Set<String> findMatches(String text, Pattern pattern)
Parameters:
text - text containing interesting substrings
pattern - pattern to find substrings
public Map<String,Set<String>> recognise(String text)
Specified by: recognise in interface NERecogniser
Parameters:
text - text which possibly contains names
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.