Class StandardsText


  • public class StandardsText
    extends Object
    StandardsText relies on regular expressions to extract standard references from text.

    This class finds standard references in text by performing the following steps:

    1. searches for headers;
    2. searches for patterns that are likely to be standard references (essentially, any string composed mostly of uppercase letters followed by alphanumeric characters);
    3. assigns each potential standard reference an initial score of 0.25;
    4. adds 0.25 to the score of references that include the name of a known standard organization (StandardOrganizations);
    5. adds 0.25 to the score of references that include the word Publication or Standard;
    6. adds 0.25 to the score of references found within "Applicable Documents" and equivalent sections;
    7. returns the standard references along with scores.
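
    The scoring steps above can be sketched as follows. This is a simplified illustration, not the class's actual implementation: the class name StandardsScoringSketch, the short KNOWN_ORGANIZATIONS list, the CANDIDATE and HEADER patterns, and the rule that everything after the first "Applicable Documents" header counts as inside that section are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StandardsScoringSketch {
    // Hypothetical stand-in for the list of known standard organizations.
    static final List<String> KNOWN_ORGANIZATIONS = List.of("ISO", "IEEE", "ANSI", "NIST");

    // Assumed candidate pattern: an uppercase acronym, an optional
    // "Publication" or "Standard" keyword, then an identifier starting with a digit.
    static final Pattern CANDIDATE = Pattern.compile(
        "\\b([A-Z]{2,})(\\s+(?:Publication|Standard))?[\\s-]?"
        + "(\\d[A-Za-z0-9]*(?:[.-][A-Za-z0-9]+)*)");

    // Assumed header pattern; the real class also recognizes equivalent sections.
    static final Pattern HEADER = Pattern.compile("(?i)applicable documents");

    // Returns each candidate reference mapped to its score, in order of appearance.
    static Map<String, Double> scoreReferences(String text) {
        // Simplification: every match after the first header is treated as in-section.
        Matcher header = HEADER.matcher(text);
        int sectionStart = header.find() ? header.end() : Integer.MAX_VALUE;

        Map<String, Double> scores = new LinkedHashMap<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            double score = 0.25;                        // step 3: initial score
            if (KNOWN_ORGANIZATIONS.contains(m.group(1))) {
                score += 0.25;                          // step 4: known organization
            }
            if (m.group(2) != null) {
                score += 0.25;                          // step 5: Publication / Standard
            }
            if (m.start() >= sectionStart) {
                score += 0.25;                          // step 6: inside the section
            }
            scores.put(m.group(), score);
        }
        return scores;
    }
}
```

    Because every increment is 0.25, a reference can only score 0.25, 0.5, 0.75, or 1.0, which is why thresholds are naturally chosen from those values.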

    • Constructor Detail

      • StandardsText

        public StandardsText()
    • Method Detail

      • extractStandardReferences

        public static ArrayList<StandardReference> extractStandardReferences(String text,
                                                                             double threshold)
        Extracts the standard references found within the given text.
        Parameters:
        text - the text from which the standard references are extracted.
        threshold - the minimum score a standard reference must have to be returned. For instance, a threshold of 0.75 means that only the references with a score greater than or equal to 0.75 are returned.
        Returns:
        the list of standard references extracted from the given text.
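
        The effect of the threshold parameter can be illustrated with a small self-contained sketch. It does not call the class itself: ThresholdSketch, ScoredReference, and select are hypothetical names introduced here, and the only behavior taken from the description above is that the comparison is inclusive (greater than or equal to).

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdSketch {
    // Hypothetical stand-in for a scored standard reference.
    static final class ScoredReference {
        final String text;
        final double score;

        ScoredReference(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Keeps only the candidates whose score is greater than or equal to the threshold.
    static ArrayList<ScoredReference> select(List<ScoredReference> candidates,
                                             double threshold) {
        ArrayList<ScoredReference> selected = new ArrayList<>();
        for (ScoredReference candidate : candidates) {
            if (candidate.score >= threshold) {
                selected.add(candidate);
            }
        }
        return selected;
    }
}
```

        With candidates scored 0.25, 0.75, and 1.0, a threshold of 0.75 keeps the last two, since a score equal to the threshold is included.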