public class ForkParser extends AbstractParser implements Closeable
Constructor and Description |
---|
ForkParser() |
ForkParser(ClassLoader loader) |
ForkParser(ClassLoader loader,
Parser parser) |
ForkParser(Path tikaBin,
ParserFactoryFactory factoryFactory)
If you have a directory containing, say, tika-app.jar and you want the
forked process/server to build a parser
and run it from that directory, so that you can keep all of those dependencies out of
your client code, use this constructor.
|
ForkParser(Path tikaBin,
ParserFactoryFactory parserFactoryFactory,
ClassLoader classLoader)
EXPERT
|
Modifier and Type | Method and Description |
---|---|
void |
close() |
String |
getJavaCommand()
Deprecated.
since 1.8
|
List<String> |
getJavaCommandAsList()
Returns the command used to start the forked server process.
|
int |
getPoolSize()
Returns the size of the process pool.
|
Set<MediaType> |
getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used
with the given parse context.
|
void |
parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
This sends the objects to the server for parsing, and the server via
the proxies acts on the handler as if it were updating it directly.
|
void |
setJavaCommand(List<String> java)
Sets the command used to start the forked server process.
|
void |
setJavaCommand(String java)
Deprecated.
since 1.8
|
void |
setMaxFilesProcessedPerServer(int maxFilesProcessedPerClient)
If there is a slowly building memory leak in one of the parsers,
it is useful to set a limit on the number of files processed
by a server before it is shut down and restarted.
|
void |
setPoolSize(int poolSize)
Sets the size of the process pool.
|
void |
setServerParseTimeoutMillis(long serverParseTimeoutMillis)
The maximum amount of time allowed for the server to try to parse a file.
|
void |
setServerPulseMillis(long serverPulseMillis)
The amount of time in milliseconds that the server
should wait before checking to see if the parse has timed out
or if the wait has timed out.
The default is 5 seconds.
|
void |
setServerWaitTimeoutMillis(long serverWaitTimeoutMillis)
The maximum amount of time allowed for the server to wait for a new request to parse
a file.
|
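The timeout and restart settings summarized above can be read as one policy that the server evaluates each time it wakes up after a pulse. The following is a minimal sketch of such a policy, assuming the semantics given in the descriptions; it is not ForkParser's implementation. The class `ServerPolicySketch`, its `Action` enum, and the `onPulse` method are hypothetical names, and the code uses only the JDK.

```java
/** Hypothetical model of the server-side limits described above; not Tika code. */
final class ServerPolicySketch {

    /** What the server would do after one pulse check. */
    enum Action { CONTINUE, PARSE_TIMED_OUT, WAIT_TIMED_OUT, RESTART_AFTER_MAX_FILES }

    private final long parseTimeoutMillis; // cf. setServerParseTimeoutMillis
    private final long waitTimeoutMillis;  // cf. setServerWaitTimeoutMillis
    private final int maxFilesPerServer;   // cf. setMaxFilesProcessedPerServer; -1 disables the limit

    ServerPolicySketch(long parseTimeoutMillis, long waitTimeoutMillis, int maxFilesPerServer) {
        this.parseTimeoutMillis = parseTimeoutMillis;
        this.waitTimeoutMillis = waitTimeoutMillis;
        this.maxFilesPerServer = maxFilesPerServer;
    }

    /**
     * Evaluates the limits once, as if called every serverPulseMillis.
     *
     * @param parsing        true if a parse is currently in progress
     * @param millisInState  time spent so far parsing (or waiting, if idle)
     * @param filesProcessed number of files this server has completed
     */
    Action onPulse(boolean parsing, long millisInState, int filesProcessed) {
        if (parsing) {
            return millisInState > parseTimeoutMillis ? Action.PARSE_TIMED_OUT : Action.CONTINUE;
        }
        // Assumption: the file-count limit is only checked between parses.
        if (maxFilesPerServer > -1 && filesProcessed >= maxFilesPerServer) {
            return Action.RESTART_AFTER_MAX_FILES;
        }
        return millisInState > waitTimeoutMillis ? Action.WAIT_TIMED_OUT : Action.CONTINUE;
    }
}
```

In this model a smaller pulse value only makes the checks more frequent; the limits themselves are set by the two timeout values and the file count.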
public ForkParser(Path tikaBin, ParserFactoryFactory factoryFactory)
tikaBin
- directory containing the tika-app.jar or similar, that is, a full jar
including tika-core and all desired parsers and dependencies
factoryFactory
-
public ForkParser(Path tikaBin, ParserFactoryFactory parserFactoryFactory, ClassLoader classLoader)
tikaBin
- directory containing the tika-app.jar or similar, that is, a full jar
including tika-core and all desired parsers and dependencies
parserFactoryFactory
- the factory to use to generate the parser factory
in the forked process/server
classLoader
- the class loader to use for all classes besides the parser in the
forked process/server
public ForkParser(ClassLoader loader, Parser parser)
loader
- the ClassLoader to use
parser
- the parser to delegate to. This cannot be another ForkParser.
public ForkParser(ClassLoader loader)
public ForkParser()
public int getPoolSize()
public void setPoolSize(int poolSize)
poolSize
- process pool size
@Deprecated public String getJavaCommand()
See also: getJavaCommandAsList()
public void setJavaCommand(List<String> java)
java
- java command line
@Deprecated public void setJavaCommand(String java)
java
- java command line
See also: setJavaCommand(List)
public List<String> getJavaCommandAsList()
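One plausible reason to prefer the `List<String>` form of the command over the deprecated `String` form is that a single string has to be split into arguments somewhere, and a naive split breaks arguments that contain spaces. The sketch below illustrates that difference with JDK classes only. The class `JavaCommandSketch`, the helper `splitNaively`, and the JVM path are hypothetical, and the whitespace split is an assumption about how a single-string command might be handled, not a statement about ForkParser's code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of list-based versus string-based commands; not Tika code. */
final class JavaCommandSketch {

    /** Splits a command on whitespace, the kind of parsing a single String invites. */
    static List<String> splitNaively(String command) {
        return Arrays.asList(command.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Hypothetical JVM path containing a space.
        List<String> asList = Arrays.asList("/opt/my jdk/bin/java", "-Xmx64m");
        List<String> fromString = splitNaively("/opt/my jdk/bin/java -Xmx64m");

        System.out.println(asList.size());     // 2: the path stays one argument
        System.out.println(fromString.size()); // 3: the path is split in two
    }
}
```

A list built like `asList` is the shape of value that `setJavaCommand(List<String>)` accepts, with each element passed through as one argument.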
public Set<MediaType> getSupportedTypes(ParseContext context)
Parser
getSupportedTypes
in interface Parser
context
- parse context
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
If using a RecursiveParserWrapper, there are two options:
1. Pass a handler that extends RecursiveParserWrapperHandler, and the server will proxy back the data as best it can[0].
2. Pass a handler that extends AbstractRecursiveParserWrapperHandler, and the server will act on the class but not proxy back the data. This is useful if, for example, all you want to do is write to disk: extend AbstractRecursiveParserWrapperHandler to write to disk when AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler, Metadata) is called, and the server will take care of the writing via the handler.
NOTE:[0] "the server will proxy back the data as best it can".
If the handler implements Serializable and is actually serializable, the
server will send it and the Metadata back upon
endEmbeddedDocument(ContentHandler, Metadata).
If the handler does not implement Serializable, or if a
NotSerializableException is thrown during serialization, the server will
call ContentHandler#toString() on the ContentHandler, set that value under the
TikaCoreProperties.TIKA_CONTENT key, and then
serialize and proxy that data back.
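The "as best it can" fallback in the note can be sketched with plain Java serialization: try to serialize the handler, and fall back to its `toString()` value if it is not `Serializable` or if serialization fails. This is a minimal model under those assumptions, not the server's actual code. The class `ProxyBackSketch` and its methods are hypothetical, and the returned `String` merely stands in for the value that would be stored under the `TikaCoreProperties.TIKA_CONTENT` key.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical model of the serialize-or-toString fallback; not Tika code. */
final class ProxyBackSketch {

    /** Returns the serialized form of the handler, or null if it cannot be serialized. */
    static byte[] trySerialize(Object handler) {
        if (!(handler instanceof Serializable)) {
            return null;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(handler);
        } catch (IOException e) {
            // NotSerializableException is an IOException, so it is caught here too.
            return null;
        }
        return bytes.toByteArray();
    }

    /** Returns a byte[] when the handler serializes, otherwise its toString() text. */
    static Object payloadFor(Object handler) {
        byte[] serialized = trySerialize(handler);
        return serialized != null ? serialized : handler.toString();
    }
}
```

Note that implementing `Serializable` is not sufficient on its own: a handler that holds a non-serializable field still takes the `toString()` path in this model, which matches the "is actually serializable" wording above.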
parse
in interface Parser
stream
- the document stream (input)
handler
- handler for the XHTML SAX events (output)
metadata
- document metadata (input and output)
context
- parse context
IOException
SAXException
TikaException
public void close()
close
in interface Closeable
close
in interface AutoCloseable
public void setServerPulseMillis(long serverPulseMillis)
serverPulseMillis
- milliseconds to sleep before checking if there has been any activity
public void setServerParseTimeoutMillis(long serverParseTimeoutMillis)
serverParseTimeoutMillis
-
public void setServerWaitTimeoutMillis(long serverWaitTimeoutMillis)
serverWaitTimeoutMillis
-
public void setMaxFilesProcessedPerServer(int maxFilesProcessedPerClient)
maxFilesProcessedPerClient
- maximum number of files that a server can handle
before the parser shuts down a client and creates
a new process. If set to -1, the server is never restarted
because of the number of files handled.
Copyright © 2007–2023 The Apache Software Foundation. All rights reserved.