Package org.apache.tika.sax
Class StandardsText
- java.lang.Object
  - org.apache.tika.sax.StandardsText

public class StandardsText extends Object

StandardsText relies on regular expressions to extract standard references from text. It finds them by performing the following steps:
- searches for headers;
- searches for patterns that are likely to be standard references (basically, every string mostly composed of uppercase letters followed by alphanumeric characters);
- assigns each potential standard reference an initial score of 0.25;
- adds 0.25 to the score of references that include the name of a known standard organization (StandardOrganizations);
- adds 0.25 to the score of references that include the word Publication or Standard;
- adds 0.25 to the score of references found within "Applicable Documents" and equivalent sections;
- returns the standard references along with their scores.
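
The scoring steps above can be sketched as plain arithmetic. This is a minimal illustration, not Tika's actual implementation: the class `StandardScoreSketch`, its `score` method, the three-entry organization set, and the case-sensitive substring matching are all assumptions made for the example.

```java
import java.util.Set;

// Illustrative sketch only: not Tika's actual implementation.
final class StandardScoreSketch {

    // Hypothetical stand-in for the names known to StandardOrganizations.
    private static final Set<String> KNOWN_ORGANIZATIONS = Set.of("ISO", "IEEE", "ANSI");

    // Scores one candidate reference using the increments listed above.
    // Plain substring matching is an assumption of this sketch.
    static double score(String candidate, boolean inApplicableDocuments) {
        double score = 0.25; // every potential reference starts at 0.25

        // +0.25 if the candidate names a known standard organization
        for (String organization : KNOWN_ORGANIZATIONS) {
            if (candidate.contains(organization)) {
                score += 0.25;
                break;
            }
        }

        // +0.25 if the candidate includes the word Publication or Standard
        if (candidate.contains("Publication") || candidate.contains("Standard")) {
            score += 0.25;
        }

        // +0.25 if found within "Applicable Documents" or an equivalent section
        if (inApplicableDocuments) {
            score += 0.25;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("ISO Standard 9001", true)); // prints 1.0
        System.out.println(score("ABC 1234", false));         // prints 0.25
    }
}
```

Under these rules a score is always one of 0.25, 0.5, 0.75, or 1.0, so a threshold of 0.75 keeps only candidates that satisfied at least two of the three optional criteria.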
 
Constructor Summary
Constructor: StandardsText()

Method Summary
Modifier and Type: static ArrayList<StandardReference>
Method: extractStandardReferences(String text, double threshold)
Description: Extracts the standard references found within the given text.

Method Detail

extractStandardReferences

public static ArrayList<StandardReference> extractStandardReferences(String text, double threshold)

Extracts the standard references found within the given text.
- Parameters:
- text - the text from which the standard references are extracted.
- threshold - the minimum score a standard reference must have to be returned. For instance, a threshold of 0.75 means that only the patterns with a score greater than or equal to 0.75 are returned.
- Returns:
- the list of standard references extracted from the given text.
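
Given the signature above, a call takes the form `StandardsText.extractStandardReferences(text, 0.75)`. The inclusive lower bound can be illustrated without Tika on the classpath. In the sketch below, `ThresholdSketch`, `ScoredReference`, and `select` are hypothetical names introduced for the example; `ScoredReference` is a reduced stand-in for StandardReference and does not mirror its real API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: shows the "greater than or equal to" semantics
// of the threshold parameter using a hypothetical stand-in type.
final class ThresholdSketch {

    // Hypothetical stand-in for StandardReference: a label plus a score.
    static final class ScoredReference {
        final String label;
        final double score;

        ScoredReference(String label, double score) {
            this.label = label;
            this.score = score;
        }
    }

    // Keeps only the candidates whose score is >= threshold (inclusive bound).
    static List<ScoredReference> select(List<ScoredReference> candidates, double threshold) {
        List<ScoredReference> selected = new ArrayList<>();
        for (ScoredReference candidate : candidates) {
            if (candidate.score >= threshold) {
                selected.add(candidate);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<ScoredReference> candidates = List.of(
                new ScoredReference("A", 0.5),
                new ScoredReference("B", 0.75),
                new ScoredReference("C", 1.0));

        // With a threshold of 0.75, "B" (exactly 0.75) and "C" are kept.
        for (ScoredReference reference : select(candidates, 0.75)) {
            System.out.println(reference.label + " " + reference.score);
        }
    }
}
```

A reference scoring exactly 0.75 is kept here because the bound is inclusive, which matches the parameter description.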
 
 