Package org.apache.tika.example
Class ContentHandlerExample
java.lang.Object
    org.apache.tika.example.ContentHandlerExample
 
public class ContentHandlerExample
extends Object

Examples of using different content handlers to get different parts of a file's contents.

Field Summary

Modifier and Type    Field
protected int        MAXIMUM_TEXT_CHUNK_SIZE

Constructor Summary

Constructor
ContentHandlerExample()

Method Summary

Modifier and Type    Method                      Description
String               parseBodyToHTML()           Example of extracting just the body, without the head part, as an HTML string.
String               parseOnePartToHTML()        Example of extracting just one part of the document's body as an HTML string, excluding the rest.
String               parseToHTML()               Example of extracting the contents as an HTML string.
String               parseToPlainText()          Example of extracting the plain text of the contents.
List<String>         parseToPlainTextChunks()    Example of extracting the plain text in chunks, with each chunk no larger than a certain maximum size.
 
Field Detail

MAXIMUM_TEXT_CHUNK_SIZE

protected final int MAXIMUM_TEXT_CHUNK_SIZE

See Also:
    Constant Field Values
 
 
Method Detail

parseToPlainText

public String parseToPlainText()
                        throws IOException, SAXException, TikaException

Example of extracting the plain text of the contents. Returns only the "body" part of the document.

Throws:
    IOException
    SAXException
    TikaException
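The idea of keeping only the "body" text can be illustrated with a plain SAX handler. This is a minimal sketch that uses only the JDK's SAX classes, not Tika itself. The class name BodyTextHandler and the helper extractBodyText are hypothetical, and the input is assumed to be well-formed XHTML without namespace processing. Tika ships its own handlers for this purpose (for example org.apache.tika.sax.BodyContentHandler), whose behaviour may differ in details such as whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): collects character data
// that appears inside the <body> element and ignores everything else.
class BodyTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int bodyDepth = 0; // greater than 0 while inside <body>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Start counting at <body>, then track nesting below it.
        if (bodyDepth > 0 || "body".equals(qName)) {
            bodyDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyDepth > 0) {
            bodyDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (bodyDepth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }

    // Hypothetical helper: parses an XHTML string with the JDK's SAX parser.
    static String extractBodyText(String xhtml) {
        try {
            BodyTextHandler handler = new BodyTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Title</title></head>"
                + "<body><p>Hello </p><p>world</p></body></html>";
        System.out.println(extractBodyText(xhtml));
    }
}
```

The title text in the head is dropped because the depth counter only becomes positive once the body element starts.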
 
parseToHTML

public String parseToHTML()
                   throws IOException, SAXException, TikaException

Example of extracting the contents as an HTML string.

Throws:
    IOException
    SAXException
    TikaException
 
parseBodyToHTML

public String parseBodyToHTML()
                       throws IOException, SAXException, TikaException

Example of extracting just the body, without the head part, as an HTML string.

Throws:
    IOException
    SAXException
    TikaException
 
parseOnePartToHTML

public String parseOnePartToHTML()
                          throws IOException, SAXException, TikaException

Example of extracting just one part of the document's body as an HTML string, excluding the rest.

Throws:
    IOException
    SAXException
    TikaException
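Extracting one part of a document as markup amounts to re-serializing only the SAX events that fall inside a chosen element. The following is a rough sketch of that idea using only the JDK's SAX classes. It is not Tika's implementation: the class SubtreeMarkupHandler and its extract helper are hypothetical, the part is selected by element name rather than by a path expression, and namespace processing is assumed to be off.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): writes out the markup of every
// element named targetName, including its descendants, and skips the rest.
class SubtreeMarkupHandler extends DefaultHandler {
    private final String targetName;
    private final StringBuilder out = new StringBuilder();
    private int depth = 0; // greater than 0 while inside a matching element

    SubtreeMarkupHandler(String targetName) {
        this.targetName = targetName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth == 0 && !targetName.equals(qName)) {
            return; // outside the selected part
        }
        depth++;
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            return;
        }
        out.append("</").append(qName).append('>');
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            out.append(escape(new String(ch, start, length)));
        }
    }

    // Re-escapes the characters that are special in markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical helper: parses an XML string and returns the selected markup.
    static String extract(String xml, String elementName) {
        try {
            SubtreeMarkupHandler handler = new SubtreeMarkupHandler(elementName);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }
}
```

Passing "body" as the element name gives a body-only result in the same spirit as parseBodyToHTML, although this sketch includes the enclosing element's own tags and writes empty elements as an open and close pair.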
 
parseToPlainTextChunks

public List<String> parseToPlainTextChunks()
                                    throws IOException, SAXException, TikaException

Example of extracting the plain text in chunks, with each chunk no larger than a certain maximum size.

Throws:
    IOException
    SAXException
    TikaException
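Chunked extraction can be sketched as a SAX handler that buffers character data and starts a new chunk whenever the buffer reaches the size limit. The code below is an illustrative sketch built on the JDK's DefaultHandler, not the code behind this method. The class name ChunkingTextHandler is hypothetical, and the strategy of splitting exactly at the limit (which can cut a word or a surrogate pair in two) is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of Tika): splits incoming character data
// into chunks of at most maxChunkSize characters each.
class ChunkingTextHandler extends DefaultHandler {
    private final int maxChunkSize;
    private final List<String> chunks = new ArrayList<>();
    private final StringBuilder current = new StringBuilder();

    ChunkingTextHandler(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        this.maxChunkSize = maxChunkSize;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        int offset = start;
        int remaining = length;
        while (remaining > 0) {
            // The buffer is flushed as soon as it is full, so room is always at least 1.
            int room = maxChunkSize - current.length();
            int take = Math.min(room, remaining);
            current.append(ch, offset, take);
            offset += take;
            remaining -= take;
            if (current.length() == maxChunkSize) {
                flush();
            }
        }
    }

    @Override
    public void endDocument() {
        flush(); // keep the final, possibly shorter, chunk
    }

    private void flush() {
        if (current.length() > 0) {
            chunks.add(current.toString());
            current.setLength(0);
        }
    }

    List<String> getChunks() {
        return chunks;
    }
}
```

In use, an instance would be passed to a SAX-based parser as its content handler, and getChunks() read once parsing has finished. A limit such as MAXIMUM_TEXT_CHUNK_SIZE would play the role of maxChunkSize here.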
 
 