public class BoilerpipeContentHandler
extends de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Use this as a ContentHandler object passed to
HtmlParser#parse(java.io.InputStream, ContentHandler, Metadata, org.apache.tika.parser.ParseContext).
Constructor and Description |
---|
BoilerpipeContentHandler(ContentHandler delegate): Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler. |
BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor): Creates a new boilerpipe-based content extractor, using the given extraction rules. |
BoilerpipeContentHandler(Writer writer): Creates a content handler that writes XHTML body character events to the given writer. |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] chars, int offset, int length) |
void | endDocument() |
void | endElement(String uri, String localName, String qName) |
de.l3s.boilerpipe.document.TextDocument | getTextDocument(): Retrieves the built TextDocument. |
boolean | isIncludeMarkup() |
void | setIncludeMarkup(boolean includeMarkup) |
void | startDocument() |
void | startElement(String uri, String localName, String qName, Attributes atts) |
void | startPrefixMapping(String prefix, String uri) |
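public BoilerpipeContentHandler(ContentHandler delegate)
Creates a new boilerpipe-based content extractor, using the DefaultExtractor
extraction rules and "delegate" as the content handler.
Parameters:
delegate - The ContentHandler object
public BoilerpipeContentHandler(Writer writer)
Creates a content handler that writes XHTML body character events to the given writer.
Parameters:
writer - writer
public BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor)
Creates a new boilerpipe-based content extractor, using the given extraction rules.
Parameters:
delegate - The ContentHandler object
extractor - Extraction rules to use, e.g. ArticleExtractor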
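
The delegate accepted by these constructors is an ordinary SAX ContentHandler. As a rough illustration of what such a delegate might look like, the sketch below defines a hypothetical TextCollector that gathers character events into a string, and drives it with the JDK's own SAX parser so that it runs without Tika or boilerpipe on the classpath. The names DelegateSketch, TextCollector, and collectText are assumptions made for this example and are not part of Tika.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DelegateSketch {
    // Feeds well-formed XHTML through the JDK SAX parser into the collector
    // and returns the concatenated character data.
    static String collectText(String xhtml) {
        TextCollector delegate = new TextCollector();
        // With Tika and boilerpipe available (not assumed here), the same
        // object could be wrapped instead:
        //   new BoilerpipeContentHandler(delegate)
        // and passed to HtmlParser#parse(...).
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), delegate);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return delegate.text();
    }

    public static void main(String[] args) {
        System.out.println(
                collectText("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}

// Hypothetical delegate: appends every SAX character event to a buffer.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] chars, int offset, int length) {
        text.append(chars, offset, length);
    }

    String text() {
        return text.toString();
    }
}
```

Here the collector sees every character event because nothing filters the stream. When wrapped by BoilerpipeContentHandler, the delegate would instead be expected to see only the content that the chosen extraction rules keep.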
public boolean isIncludeMarkup()
public void setIncludeMarkup(boolean includeMarkup)
public de.l3s.boilerpipe.document.TextDocument getTextDocument()
Retrieves the built TextDocument.
public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException
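
The six overridden methods are the standard SAX ContentHandler callbacks, which a parser invokes in a fixed order for each document. The sketch below records that order using only the JDK's SAX parser. It does not involve Tika or boilerpipe, and the names EventOrderSketch and record are hypothetical, chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventOrderSketch {
    // Parses the given XML and returns the names of the callbacks in the
    // order the parser invoked them.
    static List<String> record(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("startPrefixMapping");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("startElement:" + localName);
            }

            @Override
            public void characters(char[] chars, int offset, int length) {
                // A parser may split text into several chunks, so
                // consecutive character events are recorded only once.
                if (events.isEmpty()
                        || !events.get(events.size() - 1).equals("characters")) {
                    events.add("characters");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + localName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace awareness is required for prefix-mapping events
            // and for localName to be populated.
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        record("<p xmlns=\"http://www.w3.org/1999/xhtml\">hi</p>")
                .forEach(System.out::println);
    }
}
```

A handler that overrides these callbacks, as BoilerpipeContentHandler does, receives the same sequence from whichever parser drives it.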
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.