Interface | Description |
---|---|
HtmlMapper | HTML mapper used to make incoming HTML documents easier for Tika clients to handle. |

Class | Description |
---|---|
BoilerpipeContentHandler | Uses the boilerpipe library to automatically extract the main content from a web page. |
DefaultHtmlMapper | The default HTML mapping rules in Tika. |
HtmlEncodingDetector | Character encoding detector that determines the character encoding of an HTML document from the charset parameter, if any, of a Content-Type http-equiv meta tag near the beginning of the document. |
HtmlParser | HTML parser. |
IdentityHtmlMapper | Alternative HTML mapping rules that pass the input HTML as-is without any modifications. |
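
The idea behind HtmlEncodingDetector, as described in the table above, can be illustrated with a small standalone sketch. This is not Tika's implementation: the class name `MetaCharsetSketch`, the 8192-byte prefix limit, and the regular expressions are all assumptions chosen for illustration. The sketch uses only the Java standard library, scans the start of the document for a Content-Type http-equiv meta tag, and returns the charset parameter if it finds one.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Tika.
final class MetaCharsetSketch {

    // Assumed limit on how many leading bytes to inspect.
    private static final int PREFIX_LENGTH = 8192;

    // Matches a whole <meta ...> tag.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\s[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches http-equiv="Content-Type" with optional quotes.
    private static final Pattern HTTP_EQUIV =
            Pattern.compile("http-equiv\\s*=\\s*[\"']?content-type[\"']?",
                    Pattern.CASE_INSENSITIVE);

    // Captures the value of a charset=... parameter.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9_:.\\-]+)",
                    Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or null if none is found. */
    static String detect(byte[] html) {
        int length = Math.min(html.length, PREFIX_LENGTH);
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup
        // stays readable whatever the real encoding turns out to be.
        String head = new String(html, 0, length, StandardCharsets.ISO_8859_1);

        Matcher tags = META_TAG.matcher(head);
        while (tags.find()) {
            String tag = tags.group();
            if (HTTP_EQUIV.matcher(tag).find()) {
                Matcher charset = CHARSET.matcher(tag);
                if (charset.find()) {
                    return charset.group(1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String page = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-2\"></head></html>";
        System.out.println(detect(page.getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The sketch returns the charset name as a plain string and leaves it to the caller to decide whether the name is supported, for example with `java.nio.charset.Charset.isSupported`. A document without a matching meta tag yields `null`, which mirrors the "potential" nature of the parameter in the description above.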
Copyright © 2007-2015 The Apache Software Foundation. All Rights Reserved.