public class ForkParser extends AbstractParser implements Closeable

| Constructor | Description |
|---|---|
| ForkParser() | |
| ForkParser(ClassLoader loader) | |
| ForkParser(ClassLoader loader, Parser parser) | |
| ForkParser(Path tikaBin, ParserFactoryFactory factoryFactory) | If you have a directory with, say, tika-app.jar and you want the forked process/server to build a parser and run it from that directory, so that you can keep all of those dependencies out of your client code, use this constructor. |
| ForkParser(Path tikaBin, ParserFactoryFactory parserFactoryFactory, ClassLoader classLoader) | EXPERT |

| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | |
| String | getJavaCommand() | Deprecated since 1.8. |
| List<String> | getJavaCommandAsList() | Returns the command used to start the forked server process. |
| int | getPoolSize() | Returns the size of the process pool. |
| Set<MediaType> | getSupportedTypes(ParseContext context) | Returns the set of media types supported by this parser when used with the given parse context. |
| void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Sends the objects to the server for parsing; the server, via the proxies, acts on the handler as if it were updating it directly. |
| void | setJavaCommand(List<String> java) | Sets the command used to start the forked server process. |
| void | setJavaCommand(String java) | Deprecated since 1.8. |
| void | setMaxFilesProcessedPerServer(int maxFilesProcessedPerClient) | If there is a slowly building memory leak in one of the parsers, it is useful to set a limit on the number of files processed by a server before it is shut down and restarted. |
| void | setPoolSize(int poolSize) | Sets the size of the process pool. |
| void | setServerParseTimeoutMillis(long serverParseTimeoutMillis) | The maximum amount of time allowed for the server to try to parse a file. |
| void | setServerPulseMillis(long serverPulseMillis) | The amount of time in milliseconds that the server should wait before checking whether the parse has timed out or the wait has timed out. The default is 5 seconds. |
| void | setServerWaitTimeoutMillis(long serverWaitTimeoutMillis) | The maximum amount of time allowed for the server to wait for a new request to parse a file. |
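
The pulse, parse-timeout, wait-timeout, and file-limit settings listed above describe checks the forked server makes periodically. The following is a minimal sketch, in plain Java with no Tika dependency, of how such a periodic check could be modeled. The class `ServerWatchdogModel`, the `Decision` enum, and the `check` method are hypothetical names invented for illustration; they are not part of Tika, and the actual server logic may differ. Treating every negative limit as "no limit" is also an assumption (the documentation only names -1).

```java
/**
 * Hypothetical model (not Tika code) of the decision a forked server could
 * make each time it wakes up after sleeping for the pulse interval.
 */
final class ServerWatchdogModel {

    /** Possible outcomes of one periodic check. */
    enum Decision { KEEP_RUNNING, PARSE_TIMED_OUT, WAIT_TIMED_OUT, FILE_LIMIT_REACHED }

    private final long parseTimeoutMillis; // cf. setServerParseTimeoutMillis
    private final long waitTimeoutMillis;  // cf. setServerWaitTimeoutMillis
    private final int maxFiles;            // cf. setMaxFilesProcessedPerServer

    ServerWatchdogModel(long parseTimeoutMillis, long waitTimeoutMillis, int maxFiles) {
        this.parseTimeoutMillis = parseTimeoutMillis;
        this.waitTimeoutMillis = waitTimeoutMillis;
        this.maxFiles = maxFiles;
    }

    /**
     * @param parsing        true if a parse is in progress, false if the server is idle
     * @param elapsedMillis  time since the current parse started (when parsing)
     *                       or since the server became idle (when not parsing)
     * @param filesProcessed number of files this server has handled so far
     */
    Decision check(boolean parsing, long elapsedMillis, int filesProcessed) {
        if (parsing) {
            // A parse is running: only the parse timeout applies.
            return elapsedMillis > parseTimeoutMillis
                    ? Decision.PARSE_TIMED_OUT
                    : Decision.KEEP_RUNNING;
        }
        // Idle between requests: restart once the file limit is reached.
        // Assumption: any negative limit disables this check.
        if (maxFiles >= 0 && filesProcessed >= maxFiles) {
            return Decision.FILE_LIMIT_REACHED;
        }
        // Otherwise stop if no new request arrived within the wait timeout.
        if (elapsedMillis > waitTimeoutMillis) {
            return Decision.WAIT_TIMED_OUT;
        }
        return Decision.KEEP_RUNNING;
    }
}
```

In this model the pulse interval only determines how often `check` is called, so a shorter pulse would detect a timeout sooner at the cost of more frequent wake-ups. Any timeout values passed to the constructor are the caller's choice; the sketch does not encode Tika's defaults.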

public ForkParser(Path tikaBin, ParserFactoryFactory factoryFactory)
tikaBin - directory containing the tika-app.jar or similar: a full jar including tika-core and all desired parsers and dependencies
factoryFactory - the factory to use to generate the parser factory in the forked process/server

public ForkParser(Path tikaBin, ParserFactoryFactory parserFactoryFactory, ClassLoader classLoader)
tikaBin - directory containing the tika-app.jar or similar: a full jar including tika-core and all desired parsers and dependencies
parserFactoryFactory - the factory to use to generate the parser factory in the forked process/server
classLoader - the class loader to use for all classes besides the parser in the forked process/server

public ForkParser(ClassLoader loader, Parser parser)
loader - the ClassLoader to use
parser - the parser to delegate to. This one cannot be another ForkParser.

public ForkParser(ClassLoader loader)
public ForkParser()
public int getPoolSize()
public void setPoolSize(int poolSize)
poolSize - process pool size

@Deprecated public String getJavaCommand()
See also: getJavaCommandAsList()

public void setJavaCommand(List<String> java)
java - java command line

@Deprecated public void setJavaCommand(String java)
java - java command line
See also: setJavaCommand(List)

public List<String> getJavaCommandAsList()

public Set<MediaType> getSupportedTypes(ParseContext context)
Specified by: getSupportedTypes in interface Parser
context - parse context

public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
If using a RecursiveParserWrapper, there are two options:

1. Pass a handler that extends RecursiveParserWrapperHandler, and the server will proxy back the data as best it can [0].
2. Pass a handler that extends AbstractRecursiveParserWrapperHandler, and the server will act on the class but not proxy back the data. This is useful if, for example, all you want to do is write to disk: extend AbstractRecursiveParserWrapperHandler to write to disk when AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler, Metadata) is called, and the server will take care of the writing via the handler.

NOTE: [0] "the server will proxy back the data as best it can". If the handler implements Serializable and is actually serializable, the server will send it and the Metadata back upon endEmbeddedDocument(ContentHandler, Metadata) or endDocument(ContentHandler, Metadata). If the handler does not implement Serializable, or if a NotSerializableException is thrown during serialization, the server will call ContentHandler#toString() on the ContentHandler, set that value with the TikaCoreProperties.TIKA_CONTENT key, and then serialize and proxy that data back.
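
The "as best it can" fallback in the note can be illustrated with standard Java serialization alone. The sketch below is not Tika code: `ProxyBackModel`, `Payload`, and `prepare` are hypothetical names, and the handler is modeled as a plain `Object`. It tries to serialize the object and, if the object is not `Serializable` or a `NotSerializableException` is thrown, falls back to `toString()`. In Tika, per the note, that string would be stored under the TikaCoreProperties.TIKA_CONTENT key; here it is simply returned.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical model (not Tika code) of the serialize-or-toString fallback. */
final class ProxyBackModel {

    /** What would be sent back: either the serialized handler or its string form. */
    static final class Payload {
        final boolean serializedHandler; // true if the handler itself was serialized
        final byte[] handlerBytes;       // serialized form, or null on fallback
        final String contentAsString;    // handler.toString(), or null if serialized

        Payload(boolean serializedHandler, byte[] handlerBytes, String contentAsString) {
            this.serializedHandler = serializedHandler;
            this.handlerBytes = handlerBytes;
            this.contentAsString = contentAsString;
        }
    }

    static Payload prepare(Object handler) {
        if (!(handler instanceof Serializable)) {
            // Not declared Serializable: fall back to the string form.
            return new Payload(false, null, handler.toString());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(handler);
        } catch (NotSerializableException e) {
            // Declared Serializable but not actually serializable
            // (for example, it holds a non-serializable field or element).
            return new Payload(false, null, handler.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Payload(true, bytes.toByteArray(), null);
    }
}
```

The second `catch` branch matters because implementing `Serializable` is only a declaration: a collection or handler that holds a non-serializable member still fails at write time, which is the "actually serializable" condition the note describes.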
 
Specified by: parse in interface Parser
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws: IOException, SAXException, TikaException

public void close()
Specified by: close in interface Closeable, close in interface AutoCloseable

public void setServerPulseMillis(long serverPulseMillis)
serverPulseMillis - milliseconds to sleep before checking if there has been any activity

public void setServerParseTimeoutMillis(long serverParseTimeoutMillis)
serverParseTimeoutMillis - maximum time, in milliseconds, allowed for the server to try to parse a file

public void setServerWaitTimeoutMillis(long serverWaitTimeoutMillis)
serverWaitTimeoutMillis - maximum time, in milliseconds, allowed for the server to wait for a new request to parse a file

public void setMaxFilesProcessedPerServer(int maxFilesProcessedPerClient)
maxFilesProcessedPerClient - maximum number of files that a server can handle before the parser shuts down a client and creates a new process. If set to -1, the server is never restarted because of the number of files handled.

Copyright © 2007–2022 The Apache Software Foundation. All rights reserved.