Class PickBestTextEncodingParser

  • All Implemented Interfaces:
    Serializable, Parser

    public class PickBestTextEncodingParser
    extends AbstractMultipleParser
    Deprecated.
    Currently not suitable for real use; this is more of a demo / prototype.
    Inspired by TIKA-1443 and https://wiki.apache.org/tika/CompositeParserDiscussion, this parser tries several different text encodings, then does the real text parsing using whichever encoding is judged "best".

    The logic for "best" needs a lot of work!

    This is not recommended for actual production use. It exists mostly to prove that the AbstractMultipleParser environment is sufficient to support this use case.

    TODO: Implement proper "junk" detection.
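
The "try several encodings, then keep the best one" idea can be sketched in plain Java. This is only an illustration and not Tika's implementation: the class EncodingPicker, the methods junkScore and pickBest, and the scoring rule (count replacement characters and unexpected control characters) are all hypothetical names and assumptions made for this example.

```java
import java.nio.charset.Charset;
import java.util.List;

// Hypothetical helper, not part of Tika: picks the candidate charset whose
// decoding of the input produces the fewest "junk" characters.
final class EncodingPicker {

    private EncodingPicker() {
    }

    // Assumed junk heuristic: count U+FFFD (inserted by the decoder for
    // malformed input) and control characters other than tab, LF and CR.
    static int junkScore(String text) {
        int junk = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean whitespace = c == '\t' || c == '\n' || c == '\r';
            if (c == '\uFFFD' || (Character.isISOControl(c) && !whitespace)) {
                junk++;
            }
        }
        return junk;
    }

    // Decodes the bytes with every candidate and returns the charset with the
    // lowest junk score. Ties go to the earlier candidate in the list.
    // Returns null when the candidate list is empty.
    static Charset pickBest(byte[] data, List<Charset> candidates) {
        Charset best = null;
        int bestScore = Integer.MAX_VALUE;
        for (Charset candidate : candidates) {
            // new String(byte[], Charset) substitutes the charset's
            // replacement string for malformed or unmappable input.
            String decoded = new String(data, candidate);
            int score = junkScore(decoded);
            if (score < bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Under this heuristic, UTF-8 bytes often decode as ISO-8859-1 without any junk characters, so the order of the candidate list ends up deciding such ties. That is one concrete reason the logic for "best" needs more than a simple junk count.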

    See Also:
    Serialized Form