Package org.apache.tika.example
Class ContentHandlerExample
java.lang.Object
  org.apache.tika.example.ContentHandlerExample

public class ContentHandlerExample extends Object
Examples of using different content handlers to extract different parts of a file's contents.

Field Summary
Modifier and Type   Field                     Description
protected int       MAXIMUM_TEXT_CHUNK_SIZE

Constructor Summary
Constructor               Description
ContentHandlerExample()

Method Summary
Modifier and Type   Method                     Description
String              parseBodyToHTML()          Example of extracting just the body as HTML, without the head part, as a string.
String              parseOnePartToHTML()       Example of extracting just one part of the document's body as an HTML string, excluding the rest.
String              parseToHTML()              Example of extracting the contents as HTML, as a string.
String              parseToPlainText()         Example of extracting the plain text of the contents.
List<String>        parseToPlainTextChunks()   Example of extracting the plain text in chunks, each no larger than a certain maximum size.
 
Field Detail
MAXIMUM_TEXT_CHUNK_SIZE
protected final int MAXIMUM_TEXT_CHUNK_SIZE
See Also: Constant Field Values
 
 
 
Method Detail
parseToPlainText
public String parseToPlainText() throws IOException, SAXException, TikaException
Example of extracting the plain text of the contents. Returns only the "body" part of the document.
Throws:
 IOException
 SAXException
 TikaException
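
The "body only" behavior can be illustrated without Tika on the classpath. The following is a minimal sketch that uses only the JDK's SAX classes; `BodyTextSketch` and its `extract` helper are hypothetical names invented for this illustration and are not part of `ContentHandlerExample` or of Tika. It assumes well-formed XHTML input parsed without namespace awareness, so the element is matched by its qualified name `body`. In Tika itself, `org.apache.tika.sax.BodyContentHandler` serves a similar purpose.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in: keeps only character data found inside <body>. */
final class BodyTextSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // greater than 0 while inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start counting at <body>, then track nesting of everything below it.
        if (bodyDepth > 0 || "body".equals(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text in <head> (for example the title) is dropped.
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Parses an XHTML string and returns the text of its body. */
    static String extract(String xhtml) throws Exception {
        BodyTextSketch handler = new BodyTextSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><head><title>Title</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println(extract(doc)); // prints "Hello"
    }
}
```

The handler's `toString()` returns the collected text, so a caller can hand it to any SAX-driven parser and read the result afterwards.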
 
parseToHTML
public String parseToHTML() throws IOException, SAXException, TikaException
Example of extracting the contents as HTML, as a string.
Throws:
 IOException
 SAXException
 TikaException
 
parseBodyToHTML
public String parseBodyToHTML() throws IOException, SAXException, TikaException
Example of extracting just the body as HTML, without the head part, as a string.
Throws:
 IOException
 SAXException
 TikaException
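
To make "just the body as HTML" concrete, here is a rough sketch of a SAX handler that writes markup for everything below `<body>` and ignores the head. It uses only JDK classes. `BodyMarkupSketch`, `extract`, and `escape` are hypothetical names for this illustration, not Tika API, and the choice to leave out the `<body>` tag itself is an assumption. Real code would more likely combine Tika's own handlers than hand-roll serialization like this.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in: serializes the children of <body> back to markup. */
final class BodyMarkupSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inBody = false;
    private int depth = 0; // open elements below <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!inBody) {
            // Nothing is written until <body> opens; the tag itself is skipped.
            inBody = "body".equals(qName);
            return;
        }
        depth++;
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!inBody) {
            return;
        }
        if (depth == 0) {
            inBody = false; // this is </body>
            return;
        }
        depth--;
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inBody) {
            out.append(escape(new String(ch, start, length)));
        }
    }

    /** Re-escapes characters that the parser already decoded. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses an XHTML string and returns the markup inside its body. */
    static String extract(String xhtml) throws Exception {
        BodyMarkupSketch handler = new BodyMarkupSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<html><head><title>T</title></head>"
                + "<body><p class=\"a\">Hi &amp; bye</p></body></html>"));
    }
}
```

Narrowing the condition that switches `inBody` on (for example, to a specific `div`) is one way to approximate the "one part of the body" variant described for `parseOnePartToHTML`.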
 
parseOnePartToHTML
public String parseOnePartToHTML() throws IOException, SAXException, TikaException
Example of extracting just one part of the document's body as an HTML string, excluding the rest.
Throws:
 IOException
 SAXException
 TikaException
 
parseToPlainTextChunks
public List<String> parseToPlainTextChunks() throws IOException, SAXException, TikaException
Example of extracting the plain text in chunks, each no larger than a certain maximum size.
Throws:
 IOException
 SAXException
 TikaException
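
One way to picture the chunking is a content handler that accumulates character events and starts a new chunk whenever the current one is full. The sketch below uses only JDK classes and is not the implementation in `ContentHandlerExample`. `ChunkingSketch` and `chunks()` are hypothetical names, and because the value of `MAXIMUM_TEXT_CHUNK_SIZE` is not given on this page, the limit is passed in as a constructor argument. Sizes are counted in Java `char` units, so a chunk boundary may fall in the middle of a word or a surrogate pair.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in: splits incoming text into chunks of bounded size. */
final class ChunkingSketch extends DefaultHandler {
    private final int maxChunkSize;
    private final List<String> completed = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    ChunkingSketch(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        this.maxChunkSize = maxChunkSize;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        int offset = start;
        int remaining = length;
        while (remaining > 0) {
            int room = maxChunkSize - current.length();
            if (room == 0) {
                // Current chunk is full: store it and begin a new one.
                completed.add(current.toString());
                current.setLength(0);
                room = maxChunkSize;
            }
            int take = Math.min(room, remaining);
            current.append(ch, offset, take);
            offset += take;
            remaining -= take;
        }
    }

    /** Returns all chunks seen so far, including the partially filled last one. */
    List<String> chunks() {
        List<String> result = new ArrayList<>(completed);
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        ChunkingSketch handler = new ChunkingSketch(4);
        handler.characters("abcdef".toCharArray(), 0, 6);
        handler.characters("gh".toCharArray(), 0, 2);
        handler.characters("i".toCharArray(), 0, 1);
        System.out.println(handler.chunks()); // prints [abcd, efgh, i]
    }
}
```

Splitting a long run of characters across chunks keeps every chunk within the limit; a simpler handler that only starts a new chunk between `characters` calls would exceed it whenever a single call delivers more text than the limit allows.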
 