org.apache.tika.parser.html
Class IdentityHtmlMapper
java.lang.Object
org.apache.tika.parser.html.IdentityHtmlMapper
- All Implemented Interfaces:
- HtmlMapper
public class IdentityHtmlMapper
- extends java.lang.Object
- implements HtmlMapper
Alternative HTML mapping rules that pass the input HTML as-is without any
modifications.
- Since:
- Apache Tika 0.8
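As an illustration of what "pass the input HTML as-is" could mean for the three mapper methods, here is a minimal sketch. It assumes that pass-through means every element and attribute name is kept (only lower-cased) and that nothing is discarded; the actual implementation may differ. `MapperSketch` and `IdentityMapperSketch` are hypothetical names, declared locally so the example compiles without Tika on the classpath.

```java
import java.util.Locale;

// Hypothetical stand-in mirroring the three methods listed for HtmlMapper.
// It is NOT the Tika interface; it only lets this sketch compile on its own.
interface MapperSketch {
    String mapSafeElement(String name);
    boolean isDiscardElement(String name);
    String mapSafeAttribute(String elementName, String attributeName);
}

// Assumed pass-through behaviour: keep every name, normalising only its case.
class IdentityMapperSketch implements MapperSketch {
    @Override
    public String mapSafeElement(String name) {
        // Never returns null, so no element is treated as unsafe.
        return name.toLowerCase(Locale.ENGLISH);
    }

    @Override
    public boolean isDiscardElement(String name) {
        // Content of every element is kept.
        return false;
    }

    @Override
    public String mapSafeAttribute(String elementName, String attributeName) {
        // Never returns null, so no attribute is dropped.
        return attributeName.toLowerCase(Locale.ENGLISH);
    }
}
```

In Tika itself a mapper is normally selected by registering it in the ParseContext handed to the parser, for example `context.set(HtmlMapper.class, IdentityHtmlMapper.INSTANCE)`; check that call against the ParseContext documentation for the version in use.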
Method Summary
boolean isDiscardElement(java.lang.String name)
Checks whether all content within the given HTML element should be discarded instead of including it in the parse output.
java.lang.String mapSafeAttribute(java.lang.String elementName, java.lang.String attributeName)
Maps "safe" HTML attribute names to semantic XHTML equivalents.
java.lang.String mapSafeElement(java.lang.String name)
Maps "safe" HTML element names to semantic XHTML equivalents.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
INSTANCE
public static final HtmlMapper INSTANCE
IdentityHtmlMapper
public IdentityHtmlMapper()
isDiscardElement
public boolean isDiscardElement(java.lang.String name)
- Description copied from interface: HtmlMapper
- Checks whether all content within the given HTML element should be
discarded instead of including it in the parse output.
- Specified by:
isDiscardElement in interface HtmlMapper
- Parameters:
name - HTML element name (upper case)
- Returns:
true if content inside the named element should be ignored, false otherwise
mapSafeAttribute
public java.lang.String mapSafeAttribute(java.lang.String elementName,
java.lang.String attributeName)
- Description copied from interface: HtmlMapper
- Maps "safe" HTML attribute names to semantic XHTML equivalents. If the
given attribute is unknown or deemed unsafe for inclusion in the parse
output, then this method returns null and the attribute will be ignored.
This method assumes that the element name is valid and normalised.
- Specified by:
mapSafeAttribute in interface HtmlMapper
- Parameters:
elementName - HTML element name (lower case)
attributeName - HTML attribute name (lower case)
- Returns:
XHTML attribute name (lower case), or null if the attribute is unsafe
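The null-means-ignore contract for attributes can be sketched as a small filter. This is an illustration only: `AttributeRules`, `AttributeFilterSketch`, `IDENTITY` and `HREF_ONLY` are hypothetical names, the interface is a one-method stand-in for the real mapper, and the way an actual parser applies the mapping is not shown on this page.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical one-method stand-in for the attribute part of the mapper.
interface AttributeRules {
    String mapSafeAttribute(String elementName, String attributeName);
}

class AttributeFilterSketch {
    // Assumed pass-through rule: keep every attribute, lower-casing its name.
    static final AttributeRules IDENTITY =
            (element, attribute) -> attribute.toLowerCase(Locale.ENGLISH);

    // Hypothetical restrictive rule: only "href" on "a" is considered safe.
    static final AttributeRules HREF_ONLY =
            (element, attribute) ->
                    "a".equals(element) && "href".equals(attribute) ? "href" : null;

    // Copies the attributes whose mapped name is non-null; a null result
    // means the attribute is left out of the output.
    static Map<String, String> filter(AttributeRules rules, String element,
                                      Map<String, String> attributes) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String mapped = rules.mapSafeAttribute(element, entry.getKey());
            if (mapped != null) {
                kept.put(mapped, entry.getValue());
            }
        }
        return kept;
    }
}
```

Under these assumptions, filtering an `a` element that carries `href` and `onclick` keeps both attributes with `IDENTITY` and only `href` with `HREF_ONLY`.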
mapSafeElement
public java.lang.String mapSafeElement(java.lang.String name)
- Description copied from interface: HtmlMapper
- Maps "safe" HTML element names to semantic XHTML equivalents. If the
given element is unknown or deemed unsafe for inclusion in the parse
output, then this method returns null and the element will be ignored
but the content inside it is still processed. See the
HtmlMapper.isDiscardElement(String) method for a way to discard
the entire contents of an element.
- Specified by:
mapSafeElement in interface HtmlMapper
- Parameters:
name - HTML element name (upper case)
- Returns:
XHTML element name (lower case), or null if the element is unsafe
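The difference between an ignored element (null from mapSafeElement, content still processed) and a discarded element (true from isDiscardElement, content dropped) can be sketched as a three-way decision. All names below are hypothetical: `ElementRules` is a two-method stand-in for the mapper, `StrictRules` is an invented restrictive rule set for contrast, and `IdentityRules` assumes pass-through means lower-casing and never discarding.

```java
import java.util.Locale;

// Hypothetical two-method stand-in for the element part of the mapper.
interface ElementRules {
    String mapSafeElement(String name);
    boolean isDiscardElement(String name);
}

// Assumed pass-through rules: every element is kept, nothing is discarded.
class IdentityRules implements ElementRules {
    public String mapSafeElement(String name) {
        return name.toLowerCase(Locale.ENGLISH);
    }
    public boolean isDiscardElement(String name) {
        return false;
    }
}

// Invented restrictive rules, for contrast only.
class StrictRules implements ElementRules {
    public String mapSafeElement(String name) {
        return "P".equals(name) ? "p" : null;
    }
    public boolean isDiscardElement(String name) {
        return "SCRIPT".equals(name) || "STYLE".equals(name);
    }
}

class ElementDecision {
    // Describes what happens to an element under the given rules.
    static String decide(ElementRules rules, String name) {
        if (rules.isDiscardElement(name)) {
            return "discard element and content";
        }
        String mapped = rules.mapSafeElement(name);
        if (mapped == null) {
            return "drop tag, keep content";
        }
        return "emit <" + mapped + ">";
    }
}
```

With these rule sets, `SCRIPT` is discarded outright under `StrictRules` but emitted as `<script>` under `IdentityRules`, while an unknown tag such as `FONT` loses only its tag under `StrictRules`.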
Copyright © 2007-2010 The Apache Software Foundation. All Rights Reserved.