Package org.apache.tika.example
Class ContentHandlerExample
- java.lang.Object
  - org.apache.tika.example.ContentHandlerExample
public class ContentHandlerExample extends Object
Examples of using different content handlers to extract different parts of a file's contents.
-
Field Summary
Fields
Modifier and Type    Field                      Description
protected int        MAXIMUM_TEXT_CHUNK_SIZE
-
Constructor Summary
Constructors
Constructor                Description
ContentHandlerExample()
-
Method Summary
Modifier and Type, Method, Description
String parseBodyToHTML()
    Example of extracting just the body as an HTML string, without the head part.
String parseOnePartToHTML()
    Example of extracting just one part of the document's body as an HTML string, excluding the rest.
String parseToHTML()
    Example of extracting the contents as an HTML string.
String parseToPlainText()
    Example of extracting the plain text of the contents.
List<String> parseToPlainTextChunks()
    Example of extracting the plain text in chunks, with each chunk no larger than a fixed maximum size.
-
Field Detail
-
MAXIMUM_TEXT_CHUNK_SIZE
protected final int MAXIMUM_TEXT_CHUNK_SIZE
- See Also:
- Constant Field Values
-
Method Detail
-
parseToPlainText
public String parseToPlainText() throws IOException, SAXException, TikaException
Example of extracting the plain text of the contents. Returns only the "body" part of the document.
- Throws:
IOException
SAXException
TikaException
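
The body-only behaviour described above can be sketched with a plain SAX handler. This is a minimal sketch, not the code behind `parseToPlainText()`: it assumes the input is already well-formed XHTML and uses the JDK's own SAX parser in place of a Tika parser so that it runs without Tika on the classpath. `BodyTextSketch`, `BodyTextHandler` and `bodyText` are hypothetical names. In Tika itself, `org.apache.tika.sax.BodyContentHandler` is the handler that serves this purpose, although this page does not show which handler the example method actually uses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BodyTextSketch {

    /** Collects character data, but only while the parser is inside the body element. */
    static class BodyTextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int bodyDepth = 0; // number of open elements, counted from body inwards

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (bodyDepth > 0 || "body".equals(qName)) {
                bodyDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (bodyDepth > 0) {
                text.append(ch, start, length);
            }
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    /** Parses the given XHTML string and returns the text found inside its body. */
    static String bodyText(String xhtml)
            throws IOException, SAXException, ParserConfigurationException {
        BodyTextHandler handler = new BodyTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>Ignored</title></head>"
                + "<body><p>Hello </p><p>world</p></body></html>";
        System.out.println(bodyText(xhtml)); // prints "Hello world"
    }
}
```

The title text never reaches the result because the depth counter is still zero while the head is being parsed.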
-
parseToHTML
public String parseToHTML() throws IOException, SAXException, TikaException
Example of extracting the contents as HTML, as a string.
- Throws:
IOException
SAXException
TikaException
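
Getting markup back as a string amounts to writing each SAX event out again as it arrives. The sketch below illustrates that idea under the same assumptions as before: JDK SAX parser only, well-formed XHTML input, and hypothetical names (`MarkupSketch`, `MarkupHandler`, `toMarkup`). Tika ships `org.apache.tika.sax.ToXMLContentHandler` for this job; the hand-written handler here is only a stand-in and handles far fewer cases (no namespaces, comments or processing instructions).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MarkupSketch {

    /** Writes every element, attribute and text event back out as markup. */
    static class MarkupHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(escape(new String(ch, start, length)));
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses the XHTML string and returns the markup rebuilt from the SAX events. */
    static String toMarkup(String xhtml)
            throws IOException, SAXException, ParserConfigurationException {
        MarkupHandler handler = new MarkupHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toMarkup("<html><head><title>Demo</title></head>"
                + "<body><p class=\"intro\">Tom &amp; Jerry</p></body></html>"));
    }
}
```

Unlike the plain-text case, the head and its title are kept, since nothing filters the events before they are written.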
-
parseBodyToHTML
public String parseBodyToHTML() throws IOException, SAXException, TikaException
Example of extracting just the body as HTML, without the head part, as a string.
- Throws:
IOException
SAXException
TikaException
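
Body-only markup is naturally expressed as two handlers chained together: a filter that passes on only the events inside the body, and a writer that turns whatever it receives into markup. The sketch below shows that composition with JDK SAX classes only. `BodyMarkupSketch`, `BodyOnly`, `MarkupWriter` and `bodyMarkup` are hypothetical names, and leaving out the body tag itself is a choice made for this sketch, not a statement about Tika's behaviour. With Tika, the comparable composition would wrap a `ToXMLContentHandler` in a `BodyContentHandler`, which is an assumption about how the example method is written.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BodyMarkupSketch {

    /** Writes the events it receives as markup (attributes omitted for brevity). */
    static class MarkupWriter extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(new String(ch, start, length)
                    .replace("&", "&amp;").replace("<", "&lt;"));
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Forwards to the delegate only the events that occur inside the body element. */
    static class BodyOnly extends DefaultHandler {
        private final ContentHandler delegate;
        private int depth = 0; // 0 = outside body, 1 = directly inside body

        BodyOnly(ContentHandler delegate) {
            this.delegate = delegate;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0) {
                depth++;
                delegate.startElement(uri, localName, qName, atts);
            } else if ("body".equals(qName)) {
                depth = 1; // the body tag itself is not forwarded
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (depth > 1) {
                depth--;
                delegate.endElement(uri, localName, qName);
            } else if (depth == 1) {
                depth = 0; // closing body tag, not forwarded
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth > 0) {
                delegate.characters(ch, start, length);
            }
        }
    }

    /** Parses the XHTML string and returns the markup of the body's content only. */
    static String bodyMarkup(String xhtml)
            throws IOException, SAXException, ParserConfigurationException {
        MarkupWriter writer = new MarkupWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new BodyOnly(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bodyMarkup("<html><head><title>Demo</title></head>"
                + "<body><h1>Title</h1><p>Text</p></body></html>"));
    }
}
```

Keeping the filter and the writer separate means either one can be swapped without touching the other, for example to write plain text instead of markup.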
-
parseOnePartToHTML
public String parseOnePartToHTML() throws IOException, SAXException, TikaException
Example of extracting just one part of the document's body as an HTML string, excluding the rest.
- Throws:
IOException
SAXException
TikaException
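
Selecting a single part of the body needs some rule for recognising that part. The sketch below assumes a simple rule, an element name plus a `class` attribute value, and keeps only the markup nested inside matching elements. This page does not say which part the example method selects or how; Tika offers XPath-style matching in `org.apache.tika.sax.xpath`, which this sketch does not use. `PartSketch`, `PartHandler`, `extractPart` and the `summary` class value are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PartSketch {

    /** Keeps the markup nested inside elements with a given name and class attribute. */
    static class PartHandler extends DefaultHandler {
        private final String element;
        private final String cssClass;
        private final StringBuilder out = new StringBuilder();
        private int depth = 0; // > 0 while inside a matching element

        PartHandler(String element, String cssClass) {
            this.element = element;
            this.cssClass = cssClass;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (depth > 0) {
                depth++;
                out.append('<').append(qName).append('>'); // attributes omitted for brevity
            } else if (element.equals(qName) && cssClass.equals(atts.getValue("class"))) {
                depth = 1; // the matching element's own tag is not written
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (depth > 1) {
                depth--;
                out.append("</").append(qName).append('>');
            } else if (depth == 1) {
                depth = 0; // leaving the matching element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) {
                out.append(new String(ch, start, length)
                        .replace("&", "&amp;").replace("<", "&lt;"));
            }
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    /** Returns the markup found inside every element matching the name and class. */
    static String extractPart(String xhtml, String element, String cssClass)
            throws IOException, SAXException, ParserConfigurationException {
        PartHandler handler = new PartHandler(element, cssClass);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><div class=\"intro\"><p>skip</p></div>"
                + "<div class=\"summary\"><p>keep</p></div></body></html>";
        System.out.println(extractPart(xhtml, "div", "summary")); // prints "<p>keep</p>"
    }
}
```

If several elements match, their contents are concatenated in document order.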
-
parseToPlainTextChunks
public List<String> parseToPlainTextChunks() throws IOException, SAXException, TikaException
Example of extracting the plain text in chunks, with each chunk no larger than a fixed maximum size.
- Throws:
IOException
SAXException
TikaException
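
Chunking can be done inside the handler's `characters` callback, so the full text never has to be held in one string. The sketch below is one way to honour a hard size limit, assuming the limit is counted in Java `char` units and that a chunk may end in the middle of a word. It is not the code behind `parseToPlainTextChunks()`, and the limit passed in is a made-up value, not the value of `MAXIMUM_TEXT_CHUNK_SIZE`. `ChunkSketch`, `ChunkHandler` and `chunkText` are hypothetical names, and the JDK SAX parser again stands in for a Tika parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ChunkSketch {

    /** Splits incoming character data into chunks of at most maxSize chars. */
    static class ChunkHandler extends DefaultHandler {
        private final int maxSize;
        private final List<String> chunks = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();

        ChunkHandler(int maxSize) {
            if (maxSize <= 0) {
                throw new IllegalArgumentException("maxSize must be positive");
            }
            this.maxSize = maxSize;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            int pos = start;
            int end = start + length;
            while (pos < end) {
                if (current.length() == maxSize) {
                    flush(); // current chunk is full, start a new one
                }
                int take = Math.min(maxSize - current.length(), end - pos);
                current.append(ch, pos, take);
                pos += take;
            }
        }

        @Override
        public void endDocument() {
            if (current.length() > 0) {
                flush(); // keep the final, possibly shorter, chunk
            }
        }

        private void flush() {
            chunks.add(current.toString());
            current.setLength(0);
        }

        List<String> getChunks() {
            return chunks;
        }
    }

    /** Parses the XHTML string and returns its text split into bounded chunks. */
    static List<String> chunkText(String xhtml, int maxSize)
            throws IOException, SAXException, ParserConfigurationException {
        ChunkHandler handler = new ChunkHandler(maxSize);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getChunks();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>abcdefghij</p><p>klmno</p></body></html>";
        System.out.println(chunkText(xhtml, 4)); // prints [abcd, efgh, ijkl, mno]
    }
}
```

Because a parser is free to deliver text in pieces of any size, the handler fills each chunk exactly to the limit instead of relying on event boundaries. That keeps the result independent of how the parser happens to split its callbacks.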
-