public class BoilerpipeContentHandler
extends de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler

Uses the boilerpipe library to automatically extract the main content from a web page.
Use this as a ContentHandler object passed to
HtmlParser#parse(java.io.InputStream, ContentHandler, Metadata, org.apache.tika.parser.ParseContext)

| Constructor and Description |
|---|
| BoilerpipeContentHandler(ContentHandler delegate): Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler. |
| BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor): Creates a new boilerpipe-based content extractor, using the given extraction rules. |
| BoilerpipeContentHandler(Writer writer): Creates a content handler that writes XHTML body character events to the given writer. |
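
The Writer-based constructor is described as writing XHTML body character events to the given writer. As a rough illustration of that idea only, the sketch below uses nothing but the JDK's SAX classes. `BodyTextToWriter` and `BodyCharactersHandler` are hypothetical names invented for this example; they are not part of Tika or boilerpipe, and the sketch performs no boilerplate removal.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BodyTextToWriter {

    /** Hypothetical stand-in: forwards character events inside <body> to a Writer. */
    static class BodyCharactersHandler extends DefaultHandler {
        private final Writer writer;
        private int bodyDepth = 0; // 0 means "outside <body>"

        BodyCharactersHandler(Writer writer) {
            this.writer = writer;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start counting at <body>, then track nesting below it.
            if (bodyDepth > 0 || "body".equals(qName)) {
                bodyDepth++;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (bodyDepth > 0) {
                bodyDepth--;
            }
        }

        @Override
        public void characters(char[] chars, int offset, int length) throws SAXException {
            if (bodyDepth == 0) {
                return; // ignore text in <head> and elsewhere outside <body>
            }
            try {
                writer.write(chars, offset, length);
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }
    }

    /** Parses well-formed XHTML and returns the text found inside <body>. */
    static String extractBodyText(String xhtml) throws Exception {
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), new BodyCharactersHandler(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head><title>T</title></head>"
                + "<body><p>Hello</p> <p>World</p></body></html>";
        System.out.println(extractBodyText(xhtml)); // prints "Hello World"
    }
}
```

With the real class, the same role is played by the handler passed to HtmlParser#parse, and the extraction rules decide which body text survives.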
| Modifier and Type | Method and Description |
|---|---|
| void | characters(char[] chars, int offset, int length) |
| void | endDocument() |
| void | endElement(String uri, String localName, String qName) |
| de.l3s.boilerpipe.document.TextDocument | getTextDocument(): Retrieves the built TextDocument |
| boolean | isIncludeMarkup() |
| void | setIncludeMarkup(boolean includeMarkup) |
| void | startDocument() |
| void | startElement(String uri, String localName, String qName, Attributes atts) |
| void | startPrefixMapping(String prefix, String uri) |
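
Most of the methods listed above are the standard SAX ContentHandler callbacks. The sketch below shows the order in which a SAX parser delivers them, again using only JDK classes. `SaxEventOrder` and `RecordingHandler` are hypothetical names for this illustration, and the JDK XML parser stands in for the HTML parser that would normally drive the handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventOrder {

    /** Hypothetical handler that records each callback it receives. */
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("startElement:" + qName);
        }

        @Override
        public void characters(char[] chars, int offset, int length) {
            // A parser may split one text node across several calls, so accumulate.
            text.append(chars, offset, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement:" + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    /** Parses well-formed XML and returns the handler with its recorded events. */
    static RecordingHandler parse(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler h = parse("<html><body><p>Hello</p></body></html>");
        h.events.forEach(System.out::println);
        System.out.println("text=" + h.text);
    }
}
```

startDocument always arrives first and endDocument last, which is why a handler that needs the whole page before deciding what to keep can only finish its work in endDocument.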
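public BoilerpipeContentHandler(ContentHandler delegate)
Creates a new boilerpipe-based content extractor, using the DefaultExtractor extraction rules and "delegate" as the content handler.
Parameters:
    delegate - The ContentHandler object

public BoilerpipeContentHandler(Writer writer)
Creates a content handler that writes XHTML body character events to the given writer.
Parameters:
    writer - writer

public BoilerpipeContentHandler(ContentHandler delegate, de.l3s.boilerpipe.BoilerpipeExtractor extractor)
Creates a new boilerpipe-based content extractor, using the given extraction rules.
Parameters:
    delegate - The ContentHandler object
    extractor - Extraction rules to use, e.g. ArticleExtractor

public boolean isIncludeMarkup()

public void setIncludeMarkup(boolean includeMarkup)

public de.l3s.boilerpipe.document.TextDocument getTextDocument()
Retrieves the built TextDocument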

public void startDocument() throws SAXException
Specified by: startDocument in interface ContentHandler
Overrides: startDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

public void startPrefixMapping(String prefix, String uri) throws SAXException
Specified by: startPrefixMapping in interface ContentHandler
Overrides: startPrefixMapping in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
Specified by: startElement in interface ContentHandler
Overrides: startElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

public void characters(char[] chars, int offset, int length) throws SAXException
Specified by: characters in interface ContentHandler
Overrides: characters in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

public void endElement(String uri, String localName, String qName) throws SAXException
Specified by: endElement in interface ContentHandler
Overrides: endElement in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

public void endDocument() throws SAXException
Specified by: endDocument in interface ContentHandler
Overrides: endDocument in class de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler
Throws: SAXException

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.