public class ToTextContentHandler extends DefaultHandler
As of Tika 1.20, this handler ignores content within <script> and <style> tags.
Constructor | Description |
---|---|
ToTextContentHandler() | Creates a content handler that writes character events to an internal string buffer. |
ToTextContentHandler(OutputStream stream) | Deprecated. |
ToTextContentHandler(OutputStream stream, String encoding) | Creates a content handler that writes character events to the given output stream using the given encoding. |
ToTextContentHandler(Writer writer) | Creates a content handler that writes character events to the given writer. |
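The constructors differ only in where the characters end up. The sketch below mirrors that arrangement with a hypothetical `TextSink` class built on the JDK's SAX API. It is a stand-in for illustration, not Tika's source, and the way the constructors delegate to one another is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for ToTextContentHandler; not Tika's implementation.
class TextSink extends DefaultHandler {
    private final Writer writer;

    // Writes character events to the given writer.
    TextSink(Writer writer) {
        this.writer = writer;
    }

    // Writes character events to the stream, encoded with the named charset.
    TextSink(OutputStream stream, String encoding) throws UnsupportedEncodingException {
        this(new OutputStreamWriter(stream, encoding));
    }

    // Collects character events in an internal string buffer.
    TextSink() {
        this(new StringWriter());
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            writer.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            writer.flush();
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Returns the collected text only when the writer is a StringWriter.
    @Override
    public String toString() {
        return writer.toString();
    }
}

class TextSinkDemo {
    // No-argument constructor: read the text back with toString().
    static String viaBuffer(String text) {
        try {
            TextSink sink = new TextSink();
            sink.characters(text.toCharArray(), 0, text.length());
            sink.endDocument();
            return sink.toString();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stream constructor: the caller owns the stream and picks the encoding.
    static byte[] viaStream(String text, String encoding) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TextSink sink = new TextSink(out, encoding);
            sink.characters(text.toCharArray(), 0, text.length());
            sink.endDocument();
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, `toString()` on a sink built around any writer other than a `StringWriter` returns that writer's own `toString()` value rather than the collected text, which is why the no-argument form is the one to use when the text is needed in memory.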
Modifier and Type | Method | Description |
---|---|---|
void | characters(char[] ch, int start, int length) | Writes the given characters to the given character stream. |
void | endDocument() | Flushes the character stream so that no characters are forgotten in internal buffers. |
void | endElement(String uri, String localName, String qName) | |
void | ignorableWhitespace(char[] ch, int start, int length) | Writes the given ignorable characters to the given character stream. |
void | startElement(String uri, String localName, String qName, Attributes atts) | |
String | toString() | Returns the contents of the internal string buffer where all the received characters have been collected. |
Methods inherited from class DefaultHandler: endPrefixMapping, error, fatalError, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
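The flush performed by endDocument() matters whenever the target writer buffers its output. The following sketch uses a hypothetical `FlushingSink` stand-in (JDK classes only, not Tika's code) wrapped around a `BufferedWriter` to show that short text may not reach the underlying target until the document ends.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; only the write and flush paths are modelled.
class FlushingSink extends DefaultHandler {
    private final Writer writer;

    FlushingSink(Writer writer) {
        this.writer = writer;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            writer.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            writer.flush(); // push anything still held in the writer's buffer
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }
}

class FlushDemo {
    // Returns { text visible before endDocument(), text visible after it }.
    static String[] beforeAndAfter(String text) {
        StringWriter target = new StringWriter();
        FlushingSink sink = new FlushingSink(new BufferedWriter(target));
        try {
            sink.characters(text.toCharArray(), 0, text.length());
            String before = target.toString();
            sink.endDocument();
            return new String[] { before, target.toString() };
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller that drives the handler by hand, rather than through a parser, therefore needs to call endDocument() itself before reading from the target.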
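public ToTextContentHandler(Writer writer)
Creates a content handler that writes character events to the given writer.
Parameters:
writer - writer

public ToTextContentHandler(OutputStream stream)
Deprecated.
See ToTextContentHandler(Writer).
Parameters:
stream - output stream

public ToTextContentHandler(OutputStream stream, String encoding) throws UnsupportedEncodingException
Creates a content handler that writes character events to the given output stream using the given encoding.
Parameters:
stream - output stream
encoding - output encoding
Throws:
UnsupportedEncodingException - if the encoding is unsupported

public ToTextContentHandler()
Creates a content handler that writes character events to an internal string buffer. Use the toString() method to access the collected character content.

public void characters(char[] ch, int start, int length) throws SAXException
Writes the given characters to the given character stream.
Specified by:
characters in interface ContentHandler
Overrides:
characters in class DefaultHandler
Throws:
SAXException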
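The `start` and `length` arguments select a slice of the array, since SAX parsers commonly hand over a larger reusable buffer. The sketch below uses a hypothetical `SliceSink` stand-in to show the slice semantics. Wrapping an `IOException` from the writer in the declared `SAXException` is an assumption about how a failure would surface, not something this page states.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in that models only characters().
class SliceSink extends DefaultHandler {
    private final Writer writer;

    SliceSink(Writer writer) {
        this.writer = writer;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // Only ch[start] .. ch[start + length - 1] belong to this event.
            writer.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e); // assumed error mapping
        }
    }
}

class SliceDemo {
    // Writes one slice of a larger buffer and returns what was written.
    static String slice(String buffer, int start, int length) {
        StringWriter out = new StringWriter();
        try {
            new SliceSink(out).characters(buffer.toCharArray(), start, length);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // True if a failing writer surfaces as a SAXException wrapping the IOException.
    static boolean failureIsWrapped() {
        Writer broken = new Writer() {
            @Override
            public void write(char[] c, int off, int len) throws IOException {
                throw new IOException("simulated write failure");
            }
            @Override
            public void flush() { }
            @Override
            public void close() { }
        };
        try {
            new SliceSink(broken).characters(new char[] { 'x' }, 0, 1);
            return false;
        } catch (SAXException e) {
            return e.getException() instanceof IOException;
        }
    }
}
```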
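
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
Writes the given ignorable characters to the given character stream by forwarding the call to the characters(char[], int, int) method.
Specified by:
ignorableWhitespace in interface ContentHandler
Overrides:
ignorableWhitespace in class DefaultHandler
Throws:
SAXException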
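One consequence of routing ignorable whitespace through characters(char[], int, int) is that a subclass overriding characters() alone sees both kinds of event. The sketch below illustrates this with hypothetical `ForwardingSink` and `CountingSink` classes. They assume the forwarding behaviour described above and are not Tika classes.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical base class modelling only the forwarding behaviour.
class ForwardingSink extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        characters(ch, start, length); // forward so both paths share one sink
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

// Overrides characters() only, yet also observes ignorable whitespace.
class CountingSink extends ForwardingSink {
    int calls = 0;

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        calls++;
        super.characters(ch, start, length);
    }
}

class ForwardingDemo {
    // Sends two text events and one ignorable-whitespace event.
    static CountingSink run() {
        CountingSink sink = new CountingSink();
        try {
            sink.characters("a".toCharArray(), 0, 1);
            sink.ignorableWhitespace("\n  ".toCharArray(), 0, 3);
            sink.characters("b".toCharArray(), 0, 1);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink;
    }
}
```

A subclass that wants to drop ignorable whitespace would need to override ignorableWhitespace() as well, for example with an empty body.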

public void endDocument() throws SAXException
Flushes the character stream so that no characters are forgotten in internal buffers.
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class DefaultHandler
Throws:
SAXException - if the stream cannot be flushed

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Throws:
SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Throws:
SAXException
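The class description says that content inside <script> and <style> tags is ignored, and overriding startElement() and endElement() is one plausible way to achieve that. The sketch below tracks a nesting depth in a hypothetical `SkippingSink` and drops character events while the depth is non-zero. The counter, the case-insensitive match on `qName`, and the sample document are all assumptions for illustration, not Tika's code.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in: a depth counter is one way to skip script and style.
class SkippingSink extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0; // greater than zero while inside script or style

    private static boolean isSkipped(String qName) {
        String name = (qName == null) ? "" : qName.toLowerCase(Locale.ROOT);
        return name.equals("script") || name.equals("style");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isSkipped(qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isSkipped(qName) && skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }
}

class SkippingDemo {
    // Sample well-formed document used only for this illustration.
    static final String SAMPLE =
            "<html><head><style>p{}</style><script>alert(1)</script></head>"
            + "<body><p>Hello</p></body></html>";

    // Parses the markup with the JDK's SAX parser and returns the kept text.
    static String extract(String xhtml) {
        SkippingSink sink = new SkippingSink();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), sink);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

Calling `SkippingDemo.extract(SkippingDemo.SAMPLE)` keeps only the paragraph text. The sketch matches on `qName`, so prefixed names such as `xhtml:script` would need extra handling.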
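
public String toString()
Returns the contents of the internal string buffer where all the received characters have been collected. This only works when the handler was created with the no-argument constructor or by passing a StringWriter to the ToTextContentHandler(Writer) constructor.

Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.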