Class RegexNERecogniser

  • All Implemented Interfaces:
    NERecogniser

    public class RegexNERecogniser
    extends Object
    implements NERecogniser
    This class offers an implementation of NERecogniser based on Regular Expressions.

    The default configuration file "ner-regex.txt" is used when this class is instantiated with the no-argument constructor. The regex file is loaded via Class.getResourceAsStream(String), so it should be placed in the same package path as this class.

    The format of the regex configuration is as follows:
     ENTITY_TYPE1=REGEX1
     ENTITY_TYPE2=REGEX2
     
    For example, to extract weekday names from text:
    WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?
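
The configuration format above can be illustrated with a minimal sketch. It is not the library's loader: `RegexConfigSketch` and `parse` are hypothetical names, and splitting at the first `=` and skipping blank or malformed lines are assumptions rather than documented behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the library): parses ENTITY_TYPE=REGEX lines.
final class RegexConfigSketch {

    // Reads one ENTITY_TYPE=REGEX entry per line. The split happens at the
    // FIRST '=' only, because the regex itself may contain '=' characters.
    static Map<String, Pattern> parse(InputStream stream) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(stream, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                int eq = line.indexOf('=');
                if (line.isEmpty() || eq <= 0) {
                    continue; // assumption: skip blank or malformed lines
                }
                String entityType = line.substring(0, eq).trim();
                String regex = line.substring(eq + 1).trim();
                patterns.put(entityType, Pattern.compile(regex));
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        String config =
                "WEEK_DAY=(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?\n";
        Map<String, Pattern> patterns = parse(
                new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)));
        System.out.println(patterns.keySet());
        System.out.println(
                patterns.get("WEEK_DAY").matcher("See you on Friday").find());
    }
}
```

Passing an InputStream keeps the sketch close to the second constructor listed below, which also accepts the configuration as a stream.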
     
    Since:
    Nov. 7, 2015
    • Constructor Detail

      • RegexNERecogniser

        public RegexNERecogniser()
      • RegexNERecogniser

        public RegexNERecogniser(InputStream stream)
    • Method Detail

      • isAvailable

        public boolean isAvailable()
        Description copied from interface: NERecogniser
        Checks whether this named entity recogniser is available for service.
        Specified by:
        isAvailable in interface NERecogniser
        Returns:
        true if this recogniser is ready to recognise, false otherwise
      • getEntityTypes

        public Set<String> getEntityTypes()
        Description copied from interface: NERecogniser
        Gets the set of entity types whose names are recognisable by this recogniser.
        Specified by:
        getEntityTypes in interface NERecogniser
        Returns:
        set of entity types/classes
      • findMatches

        public Set<String> findMatches(String text,
                                       Pattern pattern)
        Finds matching subgroups in the text.
        Parameters:
        text - text containing substrings of interest
        pattern - pattern used to find the substrings
        Returns:
        set of substrings if any are found, or null if none are found
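
The contract described above can be sketched with java.util.regex alone. This is an illustration, not the library's source: `FindMatchesSketch` is a hypothetical name, and collecting the whole match (group 0) for each hit is an assumption about what "matching sub groups" means here.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in that follows the documented findMatches contract.
final class FindMatchesSketch {

    // Collects every non-overlapping match of the pattern in the text.
    // Returns null when nothing matches, as the Returns note above states.
    static Set<String> findMatches(String text, Pattern pattern) {
        Set<String> results = new LinkedHashSet<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            // Assumption: the whole match (group 0) is what gets collected.
            results.add(matcher.group());
        }
        return results.isEmpty() ? null : results;
    }

    public static void main(String[] args) {
        Pattern weekDay = Pattern.compile(
                "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?");
        System.out.println(
                findMatches("Meet on Monday or friday, not Sunday.", weekDay));
    }
}
```

Because the documented return value may be null, callers of such a method would need a null check before iterating over the result.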
      • recognise

        public Map<String,Set<String>> recognise(String text)
        Description copied from interface: NERecogniser
        Performs name recognition on the given text.
        Specified by:
        recognise in interface NERecogniser
        Parameters:
        text - text that possibly contains names
        Returns:
        map of entityType -> set of names
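
The shape of the returned map can be illustrated with a self-contained sketch. It does not call the library: `RecogniseSketch` and the `YEAR` entry are hypothetical, and leaving out entity types that have no match is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: applies one regex per entity type and groups the hits.
final class RecogniseSketch {

    // Builds a map of entityType -> set of matched names for the given text.
    static Map<String, Set<String>> recognise(Map<String, Pattern> patterns, String text) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Set<String> names = new LinkedHashSet<>();
            Matcher matcher = entry.getValue().matcher(text);
            while (matcher.find()) {
                names.add(matcher.group());
            }
            if (!names.isEmpty()) {
                // Assumption: entity types without any match are omitted.
                result.put(entry.getKey(), names);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("WEEK_DAY", Pattern.compile(
                "(?i)((sun)|(mon)|(tues)|(thurs)|(fri)|((sat)(ur)?))(day)?"));
        // YEAR is an illustrative entity type, not taken from ner-regex.txt.
        patterns.put("YEAR", Pattern.compile("\\b(19|20)\\d{2}\\b"));
        System.out.println(recognise(patterns,
                "The workshop ran from Monday to Friday in 2015."));
    }
}
```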