public class ParsingExample extends Object
Constructor and Description |
---|
ParsingExample() |
Modifier and Type | Method and Description |
---|---|
String | parseEmbeddedExample(): This example shows how to extract content from the outer document and all embedded documents. |
String | parseExample(): Example of how to use Tika to parse a file when you do not know its file type ahead of time. |
String | parseNoEmbeddedExample(): If you don't want content from embedded documents, send in a ParseContext that does not contain a Parser. |
String | parseToStringExample(): Example of how to use Tika's parseToString method to parse the content of a file and return any text found. |
List<Metadata> | recursiveParserWrapperExample(): For documents that may contain embedded documents, it might be helpful to create a list of metadata objects, one for the container document and one for each embedded document. |
String | serializedRecursiveParserWrapperExample(): We include a simple JSON serializer for a list of metadata with JsonMetadataList. |
public String parseToStringExample() throws IOException, SAXException, TikaException
Throws: IOException, SAXException, TikaException
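As we understand the Tika facade, parseToString returns at most getMaxStringLength() characters (100,000 by default, adjustable with setMaxStringLength) and quietly truncates longer text instead of failing. The JDK-only sketch below models just that capping behavior; CappedTextSink, LimitReachedException and ParseToStringModel are hypothetical names, not Tika classes, and the "chunks" stand in for text a parser would emit.

```java
// Hypothetical exception signalling that the text cap was reached.
final class LimitReachedException extends RuntimeException {
    LimitReachedException() {
        super("write limit reached");
    }
}

// Hypothetical sink that collects extracted text up to a fixed length.
final class CappedTextSink {
    private final StringBuilder text = new StringBuilder();
    private final int maxLength;

    CappedTextSink(int maxLength) {
        this.maxLength = maxLength;
    }

    // Append a chunk; if it would exceed the cap, keep only what fits and throw.
    void characters(String chunk) {
        int room = maxLength - text.length();
        if (chunk.length() > room) {
            text.append(chunk, 0, Math.max(room, 0));
            throw new LimitReachedException();
        }
        text.append(chunk);
    }

    String text() {
        return text.toString();
    }
}

final class ParseToStringModel {
    // Feed every chunk to the sink; when the cap is hit, return the partial
    // text instead of propagating the exception.
    static String parseToString(Iterable<String> chunks, int maxLength) {
        CappedTextSink sink = new CappedTextSink(maxLength);
        try {
            for (String chunk : chunks) {
                sink.characters(chunk);
            }
        } catch (LimitReachedException e) {
            // Truncated: fall through and return what was collected.
        }
        return sink.text();
    }
}
```

With a cap of 8, the chunks "Hello, " and "world" come back as "Hello, w", so callers that need complete text from large files should raise the limit.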
public String parseExample() throws IOException, SAXException, TikaException
If you use a ParseContext, make sure to set a Parser, or else embedded content will not be parsed.
Throws: IOException, SAXException, TikaException
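Parsing a file of unknown type works by detecting the media type first and then delegating to a parser for that type; in Tika, AutoDetectParser is the class meant for this role. The sketch below is a deliberately crude, JDK-only model of detect-then-dispatch: ToyAutoDetect, its two magic-byte rules and the function-based "parsers" are hypothetical and far simpler than Tika's real detection.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, much-simplified model of auto-detection followed by dispatch.
final class ToyAutoDetect {
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    // Register a "parser" (here just a function from bytes to text) per media type.
    void register(String mediaType, Function<byte[], String> parser) {
        parsers.put(mediaType, parser);
    }

    // Guess the media type from leading "magic" bytes; fall back to plain text.
    static String detect(byte[] data) {
        if (startsWith(data, "%PDF-".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(data, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        return "text/plain";
    }

    // Detect first, then delegate to whichever parser handles that type.
    String parse(byte[] data) {
        String type = detect(data);
        Function<byte[], String> parser = parsers.get(type);
        if (parser == null) {
            throw new IllegalStateException("no parser registered for " + type);
        }
        return parser.apply(data);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

The caller never names a file type; only the registry of parsers has to cover the types that detection can return.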
public String parseNoEmbeddedExample() throws IOException, SAXException, TikaException
To skip embedded documents, pass in a ParseContext that does not contain a Parser.
Throws: IOException, SAXException, TikaException
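A ParseContext behaves roughly like a lookup table keyed by class: the code that handles embedded documents asks it for a Parser, and when none is registered the embedded documents are not parsed. The sketch models that lookup with a class-keyed map. TypedContext, EmbeddedHandler (standing in for Parser) and NoEmbeddedDemo are hypothetical, and a single embedded string stands in for a real attachment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class-keyed context, modeled on the way a ParseContext is used.
final class TypedContext {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    // Returns null when nothing was registered under this key.
    <T> T get(Class<T> key) {
        return key.cast(entries.get(key));
    }
}

// Hypothetical interface playing the role of the Parser used on embedded documents.
interface EmbeddedHandler {
    String extract(String embeddedDocument);
}

final class NoEmbeddedDemo {
    // Always keep the container text; include the embedded document only
    // when the context supplies a handler for it.
    static String parse(String containerText, String embeddedDocument, TypedContext context) {
        StringBuilder out = new StringBuilder(containerText);
        EmbeddedHandler handler = context.get(EmbeddedHandler.class);
        if (handler != null) {
            out.append('\n').append(handler.extract(embeddedDocument));
        }
        return out.toString();
    }
}
```

An empty context yields only the container text; registering a handler under EmbeddedHandler.class is enough to pull the embedded text in.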
public String parseEmbeddedExample() throws IOException, SAXException, TikaException
To extract embedded content, set a Parser in the ParseContext.
Throws: IOException, SAXException, TikaException
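When the Parser placed in the ParseContext can itself handle embedded documents (for example, the same auto-detecting parser used for the outer file), extraction recurses: each embedded document is handed to that parser, which in turn processes anything nested inside it. The sketch below models this over a hypothetical in-memory tree; Doc and RecursiveExtractor are illustrative names, not Tika types, and the recurse flag stands in for "a Parser is present in the context".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory document: some text plus any embedded documents.
final class Doc {
    final String text;
    final List<Doc> embedded;

    Doc(String text, Doc... embedded) {
        this.text = text;
        this.embedded = Arrays.asList(embedded);
    }
}

final class RecursiveExtractor {
    // Depth-first extraction. When recurse is false, only the outer
    // document's own text is returned.
    static List<String> extract(Doc doc, boolean recurse) {
        List<String> out = new ArrayList<>();
        collect(doc, recurse, out);
        return out;
    }

    private static void collect(Doc doc, boolean recurse, List<String> out) {
        out.add(doc.text);
        if (!recurse) {
            return;
        }
        for (Doc child : doc.embedded) {
            collect(child, recurse, out);
        }
    }
}
```

Because the same routine is applied at every level, a document nested two levels deep is reached without any extra configuration.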
public List<Metadata> recursiveParserWrapperExample() throws IOException, SAXException, TikaException
The "content" format is determined by the ContentHandlerFactory, and the content is stored in RecursiveParserWrapper.TIKA_CONTENT.
The drawback to the RecursiveParserWrapper is that it caches metadata and contents in memory, so it should not be used on files whose contents are too large to hold in memory.
Throws: IOException, SAXException, TikaException
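The list returned here holds one metadata object for the container document and one for each embedded document, all kept in memory until parsing finishes. The JDK-only sketch models that flattening with a list of maps. Node, MetadataFlattener, the key names "name" and "content" (in Tika the content key is RecursiveParserWrapper.TIKA_CONTENT) and the character budget are all hypothetical; the budget only illustrates why large inputs are a problem for this approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document node: a name, its text, and embedded children.
final class Node {
    final String name;
    final String text;
    final List<Node> children;

    Node(String name, String text, Node... children) {
        this.name = name;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

final class MetadataFlattener {
    // Hypothetical key names used by this sketch.
    static final String NAME_KEY = "name";
    static final String CONTENT_KEY = "content";

    // Flatten the tree into one map per document, container first.
    // Everything stays cached in memory, so a (hypothetical) character
    // budget rejects inputs whose total text is too large.
    static List<Map<String, String>> flatten(Node root, long maxCachedChars) {
        List<Map<String, String>> out = new ArrayList<>();
        long[] cached = {0};
        visit(root, out, cached, maxCachedChars);
        return out;
    }

    private static void visit(Node node, List<Map<String, String>> out,
                              long[] cached, long maxCachedChars) {
        cached[0] += node.text.length();
        if (cached[0] > maxCachedChars) {
            throw new IllegalStateException(
                    "cached content exceeds " + maxCachedChars + " chars");
        }
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put(NAME_KEY, node.name);
        metadata.put(CONTENT_KEY, node.text);
        out.add(metadata);
        for (Node child : node.children) {
            visit(child, out, cached, maxCachedChars);
        }
    }
}
```

Index 0 describes the container and later entries describe embedded documents, which lets callers inspect each attachment's metadata separately instead of reading one merged text stream.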
public String serializedRecursiveParserWrapperExample() throws IOException, SAXException, TikaException
We include a simple JSON serializer for a list of metadata with JsonMetadataList. That class also includes a deserializer to convert from JSON back to a List<Metadata>.
This functionality is also available in tika-app's GUI and with the -J option on tika-app's command line. For tika-server users, the "rmeta" service returns this format.
Throws: IOException, SAXException, TikaException
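To show the general shape of such output without depending on the tika-serialization module, the sketch below hand-writes a minimal serializer for a list of string-to-string maps. MetadataJson is hypothetical, and Tika's real JsonMetadataList output may differ (for example, in how multi-valued fields are written), so treat the format as illustrative only. To our knowledge, the Tika equivalents are JsonMetadataList.toJson(list, writer) and JsonMetadataList.fromJson(reader).

```java
import java.util.List;
import java.util.Map;

// Hypothetical minimal JSON writer for a list of metadata maps.
final class MetadataJson {
    static String toJson(List<Map<String, String>> metadataList) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < metadataList.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, String> e : metadataList.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                appendString(sb, e.getKey());
                sb.append(':');
                appendString(sb, e.getValue());
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    // Quote a string, escaping characters that JSON does not allow raw.
    private static void appendString(StringBuilder sb, String s) {
        sb.append('"');
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }
}
```

Keeping one JSON object per document mirrors the list structure of the recursive wrapper, so the container and each embedded document remain distinguishable after serialization.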
Copyright © 2007-2015 The Apache Software Foundation. All Rights Reserved.