Package org.apache.tika.parser.ctakes
Class CTAKESUtils
java.lang.Object
org.apache.tika.parser.ctakes.CTAKESUtils
This class provides methods to extract biomedical information from plain text using CTAKESContentHandler, which relies on Apache cTAKES.
Apache cTAKES is built on top of the Apache UIMA framework and the OpenNLP toolkit.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static org.apache.uima.analysis_engine.AnalysisEngine | getAnalysisEngine(String aeDescriptor, String umlsUser, String umlsPass) | Returns a new UIMA Analysis Engine (AE).
static String | getAnnotationProperty(org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation annotation, CTAKESAnnotationProperty property) | Returns the annotation value based on the given annotation type.
static org.apache.uima.jcas.JCas | getJCas(org.apache.uima.analysis_engine.AnalysisEngine ae) | Returns a new JCas appropriate for the given Analysis Engine.
static void | reset(org.apache.uima.analysis_engine.AnalysisEngine ae, org.apache.uima.jcas.JCas jcas) | Resets cTAKES objects, if created.
static void | resetAE(org.apache.uima.analysis_engine.AnalysisEngine ae) | Resets the AE (AnalysisEngine), releasing all resources held by the current AE.
static void | resetCAS(org.apache.uima.jcas.JCas jcas) | Resets the CAS (Common Analysis System), emptying it of all content.
static void | serialize(org.apache.uima.jcas.JCas jcas, CTAKESSerializer type, boolean prettyPrint, OutputStream stream) | Serializes a CAS in the given format.
-
Constructor Details
-
CTAKESUtils
public CTAKESUtils()
-
-
Method Details
-
getAnalysisEngine
public static org.apache.uima.analysis_engine.AnalysisEngine getAnalysisEngine(String aeDescriptor, String umlsUser, String umlsPass) throws IOException, org.apache.uima.util.InvalidXMLException, org.apache.uima.resource.ResourceInitializationException, URISyntaxException
Returns a new UIMA Analysis Engine (AE). This method ensures that only one instance of an AE is created.
An Analysis Engine is a component responsible for analyzing unstructured information, discovering and representing semantic content. Unstructured information includes, but is not restricted to, text documents.
- Parameters:
aeDescriptor - path to the XML file containing an AnalysisEngineDescription, which holds all of the information needed to instantiate and use an AnalysisEngine.
umlsUser - UMLS username for the NLM database.
umlsPass - UMLS password for the NLM database.
- Returns:
an Analysis Engine for analyzing unstructured information.
- Throws:
IOException - if any I/O error occurs.
org.apache.uima.util.InvalidXMLException - if the input XML is not valid or does not specify a valid ResourceSpecifier.
org.apache.uima.resource.ResourceInitializationException - if a failure occurred during production of the resource.
URISyntaxException - if the URL of the resource is not formatted strictly according to RFC 2396 and cannot be converted to a URI.
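The URISyntaxException condition above can be reproduced with JDK classes alone. The following is a minimal sketch, not code from this class: the descriptor location is a made-up example, and how getAnalysisEngine resolves its descriptor internally is not shown here. It only illustrates that a location containing a space is rejected when parsed as a URI, while a URI built from a plain file path is percent-encoded.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class DescriptorUriSketch {

    // Tries to parse a raw location string as a URI; returns null when the
    // string is not a syntactically valid URI (for example, it contains a space).
    static URI parseOrNull(String location) {
        try {
            return new URI(location);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Builds a file URI from a plain path; File.toURI() percent-encodes
    // characters, such as spaces, that are illegal in a URI.
    static URI fromPath(String path) {
        return new File(path).toURI();
    }

    public static void main(String[] args) {
        // Hypothetical descriptor location with a space in the file name.
        System.out.println(parseOrNull("file:/tmp/my descriptor.xml")); // prints "null"
        System.out.println(fromPath("/tmp/my descriptor.xml"));
    }
}
```

If a descriptor lives in a directory whose name contains spaces or other reserved characters, this is the kind of failure a caller may need to handle.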
-
getJCas
public static org.apache.uima.jcas.JCas getJCas(org.apache.uima.analysis_engine.AnalysisEngine ae) throws org.apache.uima.resource.ResourceInitializationException
Returns a new JCas appropriate for the given Analysis Engine. This method ensures that only one instance of a JCas is created. A JCas is an object-oriented CAS (Common Analysis System) API based on Java Cover Classes.
Important: It is highly recommended that you reuse CAS objects rather than creating new CAS objects prior to each analysis. This is because CAS objects may be expensive to create and may consume a significant amount of memory.
- Parameters:
ae - AnalysisEngine used to create an appropriate JCas object.
- Returns:
a JCas object appropriate for the given AnalysisEngine.
- Throws:
org.apache.uima.resource.ResourceInitializationException - if a CAS could not be created because this AnalysisEngine's CAS metadata (type system, type priorities, or FS indexes) is invalid.
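The recommendation to reuse CAS objects can be pictured with a small stand-in. The sketch below does not use UIMA: ReusableContainer is a hypothetical class that only mimics a container that is filled, read, and emptied, and the whitespace tokenizing is a placeholder for real analysis. It shows one container serving several documents, with a reset between them, instead of one new container per document.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {

    // Processes every document with ONE container, resetting it in between.
    static int[] countTokens(String[] documents) {
        ReusableContainer container = new ReusableContainer();
        int[] counts = new int[documents.length];
        for (int i = 0; i < documents.length; i++) {
            container.load(documents[i]);
            counts[i] = container.size();
            container.reset(); // empty it so the next document starts clean
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countTokens(new String[] {"chest pain", "no known drug allergies"});
        // prints "2 4 created=1"
        System.out.println(counts[0] + " " + counts[1] + " created=" + ReusableContainer.created);
    }
}

// Hypothetical stand-in for an expensive, reusable analysis container.
// It is NOT the UIMA JCas; it only mimics "load, read, reset".
class ReusableContainer {
    static int created = 0; // counts how many containers were constructed
    private final List<String> tokens = new ArrayList<>();

    ReusableContainer() {
        created++;
    }

    // Splits the text on whitespace and stores the pieces.
    void load(String text) {
        for (String token : text.trim().split("\\s+")) {
            tokens.add(token);
        }
    }

    int size() {
        return tokens.size();
    }

    // Empties the container so it can hold the next document.
    void reset() {
        tokens.clear();
    }
}
```

With the real API, the analogous cycle would presumably pair getJCas with resetCAS between documents; that pairing is an assumption drawn from the method descriptions on this page.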
-
serialize
public static void serialize(org.apache.uima.jcas.JCas jcas, CTAKESSerializer type, boolean prettyPrint, OutputStream stream) throws SAXException, IOException
Serializes a CAS in the given format.
- Parameters:
jcas - CAS (Common Analysis System) to be serialized.
type - type of cTAKES (UIMA) serializer used to write the CAS.
prettyPrint - true to pretty-print the output.
stream - OutputStream object used to write the information extracted by cTAKES.
- Throws:
SAXException - if there was a SAX exception.
IOException - if any I/O error occurs.
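To make the prettyPrint and stream parameters concrete, here is a minimal sketch that uses only the JDK's XML transformer as a stand-in for a cTAKES serializer. Nothing in it comes from this class: the write method, the element names, and the sample text are all hypothetical. It shows the same shape of call, a boolean that switches indentation on or off and an OutputStream that receives the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class PrettyPrintSketch {

    // Builds a tiny stand-in document; the element names are made up.
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("annotations");
        Element child = doc.createElement("mention");
        child.setTextContent("aspirin");
        root.appendChild(child);
        doc.appendChild(root);
        return doc;
    }

    // Writes the document to the stream, indenting only when prettyPrint is true.
    static void write(Document doc, boolean prettyPrint, OutputStream stream) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, prettyPrint ? "yes" : "no");
        transformer.transform(new DOMSource(doc), new StreamResult(stream));
    }

    // Renders the sample document into a String via an in-memory stream.
    static String render(boolean prettyPrint) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(sampleDocument(), prettyPrint, buffer);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("could not render sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(false)); // one line, no indentation
        System.out.println(render(true));  // elements on separate lines
    }
}
```

Any OutputStream works as the target, so the same call can write to a file, a socket, or an in-memory buffer as above.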
-
getAnnotationProperty
public static String getAnnotationProperty(org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation annotation, CTAKESAnnotationProperty property)
Returns the annotation value based on the given annotation type.
- Parameters:
annotation - IdentifiedAnnotation object.
property - CTAKESAnnotationProperty enum used to identify the annotation type.
- Returns:
the annotation value.
-
reset
public static void reset(org.apache.uima.analysis_engine.AnalysisEngine ae, org.apache.uima.jcas.JCas jcas)
Resets cTAKES objects, if created. This method ensures that new cTAKES objects (that is, an Analysis Engine and a JCas) will be created the next time the getters of this class are called.
- Parameters:
ae - UIMA Analysis Engine
jcas - JCas object
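The behavior described for reset, where a later getter call yields a fresh object, resembles a lazily initialized holder that can be cleared. The sketch below is one possible way to get that behavior, not the implementation of this class: LazyHolder is a hypothetical helper, and a plain Object stands in for the Analysis Engine or JCas.

```java
import java.util.function.Supplier;

class LazyHolderSketch {

    // Hypothetical helper: creates its value on first use, caches it, and
    // forgets it on reset() so that the next get() builds a fresh one.
    static final class LazyHolder<T> {
        private final Supplier<T> factory;
        private T value;

        LazyHolder(Supplier<T> factory) {
            this.factory = factory;
        }

        // Returns the cached value, creating it first if there is none.
        synchronized T get() {
            if (value == null) {
                value = factory.get();
            }
            return value;
        }

        // Drops the cached value; it does not release resources by itself.
        synchronized void reset() {
            value = null;
        }
    }

    public static void main(String[] args) {
        LazyHolder<Object> holder = new LazyHolder<>(Object::new);
        Object first = holder.get();
        System.out.println(first == holder.get()); // true: the cached instance is reused
        holder.reset();
        System.out.println(first == holder.get()); // false: a new instance was created
    }
}
```

A real reset would also need to release whatever the old objects hold, which is what the resetAE and resetCAS descriptions on this page cover.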
-
resetCAS
public static void resetCAS(org.apache.uima.jcas.JCas jcas)
Resets the CAS (Common Analysis System), emptying it of all content.
- Parameters:
jcas - JCas object
-
resetAE
public static void resetAE(org.apache.uima.analysis_engine.AnalysisEngine ae)
Resets the AE (AnalysisEngine), releasing all resources held by the current AE.
- Parameters:
ae - UIMA Analysis Engine
-