Package org.apache.tika.parser.html
Class IdentityHtmlMapper
java.lang.Object
  org.apache.tika.parser.html.IdentityHtmlMapper

All Implemented Interfaces: HtmlMapper

public class IdentityHtmlMapper extends Object implements HtmlMapper
Alternative HTML mapping rules that pass the input HTML as-is without any modifications.
Since: Apache Tika 0.8

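To make "as-is" concrete, here is a minimal sketch of identity mapping rules. The `Mapper` interface is a hypothetical local stand-in that mirrors the three `HtmlMapper` method signatures listed on this page, so the example compiles without Tika on the classpath. Lower-casing the names is an assumption drawn from the "(lower case)" return values documented below, not a copy of the Tika source.

```java
import java.util.Locale;

// Illustrative sketch only: these types are hypothetical stand-ins, not Tika classes.
class IdentityMapperSketch {

    /** Local stand-in mirroring the three HtmlMapper method signatures. */
    interface Mapper {
        String mapSafeElement(String name);
        boolean isDiscardElement(String name);
        String mapSafeAttribute(String elementName, String attributeName);
    }

    /** Identity rules: keep every element and attribute, discard nothing. */
    static final class Identity implements Mapper {
        @Override
        public String mapSafeElement(String name) {
            // Assumption: only the case is normalised; the name itself is kept.
            return name.toLowerCase(Locale.ENGLISH);
        }

        @Override
        public boolean isDiscardElement(String name) {
            // Nothing is ever discarded.
            return false;
        }

        @Override
        public String mapSafeAttribute(String elementName, String attributeName) {
            // Every attribute is kept, whatever element it sits on.
            return attributeName.toLowerCase(Locale.ENGLISH);
        }
    }

    public static void main(String[] args) {
        Mapper mapper = new Identity();
        System.out.println(mapper.mapSafeElement("SCRIPT"));         // script
        System.out.println(mapper.isDiscardElement("SCRIPT"));       // false
        System.out.println(mapper.mapSafeAttribute("a", "onclick")); // onclick
    }
}
```

With Tika itself, the `INSTANCE` field documented below is the natural way to obtain the mapper. It is typically registered on a `ParseContext` with `context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE)` before the HTML parser is invoked.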
Field Summary
static HtmlMapper INSTANCE

Constructor Summary
IdentityHtmlMapper()

Method Summary
boolean isDiscardElement(String name)
  Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
String mapSafeAttribute(String elementName, String attributeName)
  Maps "safe" HTML attribute names to semantic XHTML equivalents.
String mapSafeElement(String name)
  Maps "safe" HTML element names to semantic XHTML equivalents.

Field Detail
INSTANCE
public static final HtmlMapper INSTANCE

Method Detail
isDiscardElement
public boolean isDiscardElement(String name)
Description copied from interface: HtmlMapper
Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
Specified by: isDiscardElement in interface HtmlMapper
Parameters:
name - HTML element name (upper case)
Returns: true if content inside the named element should be ignored, false otherwise

mapSafeAttribute
public String mapSafeAttribute(String elementName, String attributeName)
Description copied from interface: HtmlMapper
Maps "safe" HTML attribute names to semantic XHTML equivalents. If the given attribute is unknown or deemed unsafe for inclusion in the parse output, then this method returns null and the attribute will be ignored. This method assumes that the element name is valid and normalised.
Specified by: mapSafeAttribute in interface HtmlMapper
Parameters:
elementName - HTML element name (lower case)
attributeName - HTML attribute name (lower case)
Returns: XHTML attribute name (lower case), or null if the attribute is unknown or unsafe

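The null-return contract described above can be illustrated with a small sketch of how a caller might filter attributes. Everything here is hypothetical: the `filter` helper, the `allowList` rule, and the `identity` rule are illustrations of the contract, not Tika code. The point is the contrast between a restrictive rule, which drops attributes by returning null, and an identity rule, which never does.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

// Illustrative sketch only: none of these helpers exist in Tika.
class AttributeMappingSketch {

    /** Keeps only the attributes for which the mapping rule returns non-null. */
    static Map<String, String> filter(String element,
                                      Map<String, String> attributes,
                                      BiFunction<String, String, String> mapSafeAttribute) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String mapped = mapSafeAttribute.apply(element, entry.getKey());
            if (mapped != null) { // null means "ignore this attribute"
                kept.put(mapped, entry.getValue());
            }
        }
        return kept;
    }

    /** Hypothetical restrictive rule: only href and title on <a> are safe. */
    static String allowList(String element, String attribute) {
        boolean safe = "a".equals(element)
                && ("href".equals(attribute) || "title".equals(attribute));
        return safe ? attribute : null;
    }

    /** Identity-style rule: every attribute is kept (assumed lower-cased). */
    static String identity(String element, String attribute) {
        return attribute.toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("href", "/docs");
        attributes.put("onclick", "track()");

        // The restrictive rule drops onclick; the identity rule keeps both.
        System.out.println(filter("a", attributes, AttributeMappingSketch::allowList).keySet());
        System.out.println(filter("a", attributes, AttributeMappingSketch::identity).keySet());
    }
}
```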
mapSafeElement
public String mapSafeElement(String name)
Description copied from interface: HtmlMapper
Maps "safe" HTML element names to semantic XHTML equivalents. If the given element is unknown or deemed unsafe for inclusion in the parse output, then this method returns null and the element will be ignored but the content inside it is still processed. See the HtmlMapper.isDiscardElement(String) method for a way to discard the entire contents of an element.
Specified by: mapSafeElement in interface HtmlMapper
Parameters:
name - HTML element name (upper case)
Returns: XHTML element name (lower case), or null if the element is unsafe

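Taken together, mapSafeElement and isDiscardElement give three possible outcomes for an element: keep the tag, drop the tag but keep its content, or drop both. The sketch below spells that out. The `Rules` interface, the `STRICT` rule set, and the `decide` helper are hypothetical, and checking the discard flag first is an assumption about how a consumer might combine the two methods. Under identity-style rules every element lands in the first outcome.

```java
import java.util.Locale;

// Illustrative sketch only: hypothetical types, not Tika classes.
class ElementDecisionSketch {

    enum Outcome { KEEP_TAG, DROP_TAG_KEEP_CONTENT, DROP_TAG_AND_CONTENT }

    /** Local stand-in for the two element-related HtmlMapper methods. */
    interface Rules {
        String mapSafeElement(String name);
        boolean isDiscardElement(String name);
    }

    /** Hypothetical restrictive rules: P is safe, SCRIPT is discarded. */
    static final Rules STRICT = new Rules() {
        @Override
        public String mapSafeElement(String name) {
            return "P".equals(name) ? "p" : null;
        }

        @Override
        public boolean isDiscardElement(String name) {
            return "SCRIPT".equals(name);
        }
    };

    /** Identity-style rules: every element is kept, nothing is discarded. */
    static final Rules IDENTITY = new Rules() {
        @Override
        public String mapSafeElement(String name) {
            return name.toLowerCase(Locale.ENGLISH);
        }

        @Override
        public boolean isDiscardElement(String name) {
            return false;
        }
    };

    /** Combines both methods into one decision (check order is an assumption). */
    static Outcome decide(Rules rules, String upperCaseName) {
        if (rules.isDiscardElement(upperCaseName)) {
            return Outcome.DROP_TAG_AND_CONTENT;
        }
        if (rules.mapSafeElement(upperCaseName) == null) {
            return Outcome.DROP_TAG_KEEP_CONTENT;
        }
        return Outcome.KEEP_TAG;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"P", "FONT", "SCRIPT"}) {
            System.out.println(name + ": strict=" + decide(STRICT, name)
                    + ", identity=" + decide(IDENTITY, name));
        }
    }
}
```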