Package | Description |
---|---|
org.apache.tika.parser.microsoft.chm |
Class and Description |
---|
ChmAccessor
Defines an accessor interface
|
ChmBlockInfo
A container that contains chm block information such as: i.
|
ChmCommons.EntryType
Represents entry types: uncompressed, compressed
|
ChmCommons.IntelState
Represents Intel file states during decompression
|
ChmCommons.LzxState
Represents LZX states: started decoding, not started decoding
|
ChmDirectoryListingSet
Holds CHM listing entries
|
ChmItsfHeader
The header:
0000: char[4] 'ITSF'
0004: DWORD 3 (Version number)
0008: DWORD Total header length, including header section table and following data.
|
ChmItspHeader
Directory header. The directory starts with a header; its format is as follows:
0000: char[4] 'ITSP'
0004: DWORD Version number 1
0008: DWORD Length of the directory header
000C: DWORD $0a (unknown)
0010: DWORD $1000 Directory chunk size
0014: DWORD "Density" of quickref section, usually 2
0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
0020: DWORD Chunk number of first PMGL (listing) chunk
0024: DWORD Chunk number of last PMGL (listing) chunk
0028: DWORD -1 (unknown)
002C: DWORD Number of directory chunks (total)
0030: DWORD Windows language ID
0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
0044: DWORD $54 (This is the length again)
0048: DWORD -1 (unknown)
004C: DWORD -1 (unknown)
0050: DWORD -1 (unknown)
|
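The ITSP layout listed for ChmItspHeader can be read with a plain `ByteBuffer`. The following is a minimal sketch, not Tika's implementation: the class `ItspHeaderSketch` and its field names are hypothetical, and it assumes that DWORDs are stored little-endian, which the description above does not state.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical holder for selected ITSP fields; not Tika's ChmItspHeader. */
final class ItspHeaderSketch {
    /** Header size implied by the layout: last DWORD at 0x50, so 0x54 bytes. */
    static final int HEADER_LENGTH = 0x54;

    final int version;         // 0004
    final int headerLength;    // 0008
    final int chunkSize;       // 0010
    final int density;         // 0014
    final int indexDepth;      // 0018
    final int rootIndexChunk;  // 001C, -1 if there is none
    final int firstPmglChunk;  // 0020
    final int lastPmglChunk;   // 0024
    final int chunkCount;      // 002C
    final int languageId;      // 0030

    private ItspHeaderSketch(ByteBuffer buf) {
        // Absolute reads: each offset matches the table entry of the same number.
        version = buf.getInt(0x04);
        headerLength = buf.getInt(0x08);
        chunkSize = buf.getInt(0x10);
        density = buf.getInt(0x14);
        indexDepth = buf.getInt(0x18);
        rootIndexChunk = buf.getInt(0x1C);
        firstPmglChunk = buf.getInt(0x20);
        lastPmglChunk = buf.getInt(0x24);
        chunkCount = buf.getInt(0x2C);
        languageId = buf.getInt(0x30);
    }

    /** Parses the header from the start of {@code data}. */
    static ItspHeaderSketch parse(byte[] data) {
        if (data.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("ITSP header needs at least 0x54 bytes");
        }
        String signature = new String(data, 0, 4, StandardCharsets.US_ASCII);
        if (!"ITSP".equals(signature)) {
            throw new IllegalArgumentException("Missing ITSP signature: " + signature);
        }
        // Assumption: little-endian byte order for all DWORD fields.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        return new ItspHeaderSketch(buf);
    }
}
```

The sketch skips the unknown fields and the GUID, and it does not check that the length at 0008 agrees with the repeated length at 0044; a stricter reader might compare the two.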
ChmLzxBlock
Decompresses a CHM block.
|
ChmLzxcControlData
::DataSpace/Storage/
|
ChmLzxcResetTable
LZXC reset table, used to support decompression.
|
ChmLzxState |
ChmParsingException |
ChmPmgiHeader
Note: this chunk does not always exist. An index chunk has the following format:
0000: char[4] 'PMGI'
0004: DWORD Length of quickref/free area at end of directory chunk
0008: Directory index entries (to quickref/free area)
The quickref area in a PMGI is the same as in a PMGL.
The format of a directory index entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: directory listing chunk which starts with name
Encoded integers (ENCINT): an ENCINT is a variable-length integer.
|
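The ChmPmgiHeader entry says only that an ENCINT is a variable-length integer. The sketch below shows one way to decode it, under an assumption taken from the commonly circulated CHM format notes rather than from this page: each byte carries 7 payload bits, most significant group first, and a set high bit means another byte follows. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical ENCINT decoder; illustrates the encoding, not Tika's code. */
final class EncIntSketch {
    /** Nine 7-bit groups are the most that fit in a non-negative long. */
    private static final int MAX_BYTES = 9;

    /**
     * Reads one ENCINT at the buffer's current position and advances past it.
     * Throws BufferUnderflowException if the buffer ends mid-value.
     */
    static long readEncInt(ByteBuffer buf) {
        long value = 0;
        for (int i = 0; i < MAX_BYTES; i++) {
            int b = buf.get() & 0xFF;
            // Append the low 7 bits below the bits read so far.
            value = (value << 7) | (b & 0x7F);
            // A clear high bit marks the final byte of the value.
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("ENCINT longer than " + MAX_BYTES + " bytes");
    }
}
```

Under this assumption the two bytes `0xEA 0x15` decode to `0x3515`: the first byte contributes `0x6A`, which is shifted left by 7 bits and then combined with `0x15`.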
ChmPmglHeader
There are two types of directory chunks: index chunks and listing chunks.
|
DirectoryListingEntry
The format of a directory listing entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: content section
ENCINT: offset
ENCINT: length
The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
|
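The directory listing entry format given for DirectoryListingEntry maps onto a short sequential reader. This is a sketch under stated assumptions, not Tika's DirectoryListingEntry: the class `ListingEntrySketch` is hypothetical, the name length is read as a single byte as the description says, and ENCINT values are assumed to use 7 payload bits per byte with the high bit as a continuation flag.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical value type for one listing entry; not Tika's class. */
final class ListingEntrySketch {
    final String name;
    final long contentSection;
    final long offset;  // relative to the start of the decompressed section
    final long length;

    private ListingEntrySketch(String name, long contentSection, long offset, long length) {
        this.name = name;
        this.contentSection = contentSection;
        this.offset = offset;
        this.length = length;
    }

    /** Reads one entry at the buffer's current position and advances past it. */
    static ListingEntrySketch read(ByteBuffer buf) {
        int nameLength = buf.get() & 0xFF;          // BYTE: length of name
        byte[] nameBytes = new byte[nameLength];
        buf.get(nameBytes);                         // BYTEs: name
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        long section = readEncInt(buf);             // ENCINT: content section
        long offset = readEncInt(buf);              // ENCINT: offset
        long length = readEncInt(buf);              // ENCINT: length
        return new ListingEntrySketch(name, section, offset, length);
    }

    /** Assumed ENCINT encoding: 7 bits per byte, high bit means "more follows". */
    private static long readEncInt(ByteBuffer buf) {
        long value = 0;
        for (int i = 0; i < 9; i++) {
            int b = buf.get() & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IllegalArgumentException("ENCINT longer than 9 bytes");
    }
}
```

Because each read advances the buffer, calling `read` in a loop walks consecutive entries in a listing chunk; deciding where the entries end (the quickref/free area) is left out of this sketch.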
Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.