Package org.apache.tika.parser
Class ParsingReader
- java.lang.Object
  - java.io.Reader
    - org.apache.tika.parser.ParsingReader

- All Implemented Interfaces:
- Closeable, AutoCloseable, Readable
 
public class ParsingReader extends Reader

Reader for the text content from a given binary stream. This class uses a background parsing task with a Parser (AutoDetectParser by default) to parse the text content from a given input stream. The BodyContentHandler class and a pipe are used to convert the push-based SAX event stream to the pull-based character stream defined by the Reader interface.
- Since:
- Apache Tika 0.2
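
The push-to-pull conversion described above can be illustrated with JDK classes alone. The following is a minimal sketch, not Tika's source: the class name PushToPullSketch and its methods are hypothetical, and a list of strings stands in for the text that a SAX content handler would push.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
class PushToPullSketch {

    /** Starts a background producer that pushes chunks into a pipe and returns the pull side. */
    static Reader open(List<String> chunks) {
        try {
            PipedReader readEnd = new PipedReader();
            PipedWriter writeEnd = new PipedWriter(readEnd);
            Thread producer = new Thread(() -> {
                // Closing the write end signals end-of-stream to the reader.
                try (PipedWriter out = writeEnd) {
                    for (String chunk : chunks) {
                        out.write(chunk); // push: the producer decides when text arrives
                    }
                } catch (IOException e) {
                    // The read end was closed early; nothing left to do.
                }
            });
            producer.setDaemon(true);
            producer.start();
            return readEnd;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pulls characters until end-of-stream and returns them as one string. */
    static String drain(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8];
        try (Reader in = reader) {
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                text.append(buffer, 0, n); // pull: the consumer decides when to read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(drain(open(List.of("Hello, ", "pipe"))));
    }
}
```

The producer thread never needs to know how fast the consumer reads: the pipe's bounded buffer blocks the writer until the reader catches up.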
 
Constructor Summary

Constructor - Description
- ParsingReader(File file) - Creates a reader for the text content of the given file.
- ParsingReader(InputStream stream) - Creates a reader for the text content of the given binary stream.
- ParsingReader(InputStream stream, String name) - Creates a reader for the text content of the given binary stream with the given name.
- ParsingReader(Path path) - Creates a reader for the text content of the file at the given path.
- ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context) - Creates a reader for the text content of the given binary stream with the given document metadata.
- ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context, Executor executor) - Creates a reader for the text content of the given binary stream with the given document metadata.

Method Summary

Modifier and Type - Method - Description
- void - close() - Closes the read end of the pipe.
- int - read(char[] cbuf, int off, int len) - Reads parsed text from the pipe connected to the parsing thread.

Methods inherited from class java.io.Reader
mark, markSupported, nullReader, read, read, read, ready, reset, skip, transferTo
 
Constructor Detail

ParsingReader
public ParsingReader(InputStream stream) throws IOException
Creates a reader for the text content of the given binary stream.
- Parameters:
- stream - binary stream
- Throws:
- IOException - if the document cannot be parsed
 
ParsingReader
public ParsingReader(InputStream stream, String name) throws IOException
Creates a reader for the text content of the given binary stream with the given name.
- Parameters:
- stream - binary stream
- name - document name
- Throws:
- IOException - if the document cannot be parsed
 
ParsingReader
public ParsingReader(Path path) throws IOException
Creates a reader for the text content of the file at the given path.
- Parameters:
- path - path
- Throws:
- FileNotFoundException - if the given file does not exist
- IOException - if the document cannot be parsed
 
ParsingReader
public ParsingReader(File file) throws FileNotFoundException, IOException
Creates a reader for the text content of the given file.
- Parameters:
- file - file
- Throws:
- FileNotFoundException - if the given file does not exist
- IOException - if the document cannot be parsed
- See Also:
- ParsingReader(Path)
 
ParsingReader
public ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context) throws IOException
Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for parsing. A new background thread is started for the parsing task.
The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.
- Parameters:
- parser - parser instance
- stream - binary stream
- metadata - document metadata
- context - parsing context
- Throws:
- IOException - if the document cannot be parsed
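
The stream ownership rule above (the reader, not the caller, closes the stream) can be sketched with JDK classes only. This is an illustration under assumptions, not Tika's implementation: StreamOwnershipSketch and TrackedStream are hypothetical names, and a plain UTF-8 decode stands in for real parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only; all names here are hypothetical.
class StreamOwnershipSketch {

    /** Hypothetical input stream that records whether close() was called. */
    static final class TrackedStream extends ByteArrayInputStream {
        final AtomicBoolean closed = new AtomicBoolean(false);

        TrackedStream(String text) {
            super(text.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed.set(true);
            super.close();
        }
    }

    /** Starts a background task that owns the source stream and returns the read end. */
    static Reader open(InputStream source) {
        try {
            PipedReader readEnd = new PipedReader();
            PipedWriter writeEnd = new PipedWriter(readEnd);
            Thread task = new Thread(() -> {
                // Resources close in reverse order: the source first, then the write end,
                // so the source is already closed when the consumer sees end-of-stream.
                try (PipedWriter out = writeEnd;
                     Reader decoded = new InputStreamReader(source, StandardCharsets.UTF_8)) {
                    decoded.transferTo(out);
                } catch (IOException e) {
                    // The consumer closed the read end early; the source was still closed.
                }
            });
            task.setDaemon(true);
            task.start();
            return readEnd;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens a reader over the source, drains it, and returns the decoded text. */
    static String readAll(InputStream source) {
        try (Reader in = open(source)) {
            StringWriter sink = new StringWriter();
            in.transferTo(sink);
            return sink.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the caller never closes the source stream itself; it only closes the returned reader.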
 
ParsingReader
public ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context, Executor executor) throws IOException
Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for the parsing task that is run with the given executor. The given executor must run the parsing task asynchronously in a separate thread, since the current thread must return to the caller, which can then consume the parsed text through the Reader interface.
The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.
- Parameters:
- parser - parser instance
- stream - binary stream
- metadata - document metadata
- context - parsing context
- executor - executor for the parsing task
- Throws:
- IOException - if the document cannot be parsed
- Since:
- Apache Tika 0.4
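
Why the executor has to be asynchronous can be seen in a JDK-only sketch of the same pattern. The names below (ExecutorPipeSketch, open, readWithBackgroundExecutor) are hypothetical, and writing a string stands in for the parsing task. With a same-thread executor such as Runnable::run, the write in this sketch would be expected to block once the pipe's buffer fills, because no consumer is reading yet.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; all names here are hypothetical.
class ExecutorPipeSketch {

    /** Hands the producing task to the executor and returns the read end right away. */
    static Reader open(String text, Executor executor) {
        try {
            PipedReader readEnd = new PipedReader();
            PipedWriter writeEnd = new PipedWriter(readEnd);
            executor.execute(() -> {
                // Runs on the executor's thread; blocks whenever the pipe buffer is full.
                try (PipedWriter out = writeEnd) {
                    out.write(text);
                } catch (IOException e) {
                    // The read end was closed before all text was written.
                }
            });
            return readEnd;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads all text back while a single background thread produces it. */
    static String readWithBackgroundExecutor(String text) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (Reader in = open(text, pool)) {
            StringWriter sink = new StringWriter();
            in.transferTo(sink); // the calling thread consumes while the pool thread writes
            return sink.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch uses text larger than the default pipe buffer in its check, so it only completes because producer and consumer run on different threads.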
 
 
Method Detail

read
public int read(char[] cbuf, int off, int len) throws IOException
Reads parsed text from the pipe connected to the parsing thread. Fails if the parsing thread has thrown an exception.
- Specified by:
- read in class Reader
- Parameters:
- cbuf - character buffer
- off - start offset within the buffer
- len - maximum number of characters to read
- Throws:
- IOException - if the parsing thread has failed or if for some reason the pipe does not work properly
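
One way a reader can surface a background failure through read is sketched below with JDK classes only. This is an assumption-laden illustration, not the actual ParsingReader code: FailureAwareReaderSketch is a hypothetical class, the failure is simulated, and the sketch checks for a recorded failure only at end-of-stream to keep its behavior deterministic.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringWriter;

// Illustrative sketch only; all names here are hypothetical.
class FailureAwareReaderSketch extends Reader {

    private final PipedReader pipe = new PipedReader();
    private volatile Throwable failure; // written by the background thread

    FailureAwareReaderSketch(String text, boolean fail) throws IOException {
        PipedWriter writer = new PipedWriter(pipe);
        Thread worker = new Thread(() -> {
            try {
                writer.write(text);
                if (fail) {
                    throw new IllegalStateException("simulated parse failure");
                }
            } catch (Throwable t) {
                failure = t; // record before closing, so it is visible at end-of-stream
            } finally {
                try {
                    writer.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if the pipe cannot be closed.
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = pipe.read(cbuf, off, len);
        if (n == -1 && failure != null) {
            // Report the background failure instead of a silent end-of-stream.
            throw new IOException("Background task failed", failure);
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        pipe.close();
    }

    /** Drains a new reader; returns the text, or a marker if read() threw. */
    static String demo(boolean fail) {
        try (Reader in = new FailureAwareReaderSketch("partial text", fail)) {
            StringWriter sink = new StringWriter();
            in.transferTo(sink);
            return "ok: " + sink;
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }
}
```

Without the check in read, a consumer of this sketch could not tell a truncated result from a complete one.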
 
close
public void close() throws IOException
Closes the read end of the pipe. If the parsing thread is still running, the next write to the pipe will fail and cause the thread to stop. Thus there is no need to explicitly terminate the thread.
- Specified by:
- close in interface AutoCloseable
- Specified by:
- close in interface Closeable
- Specified by:
- close in class Reader
- Throws:
- IOException - if the pipe cannot be closed
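
The shutdown behavior described for close can be reproduced with a plain JDK pipe. The sketch below is illustrative only: CloseStopsProducerSketch is a hypothetical name, and an endless writer loop stands in for a parser that still has output to produce.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

// Illustrative sketch only; all names here are hypothetical.
class CloseStopsProducerSketch {

    /** Returns true if the endless producer thread ended after the read end was closed. */
    static boolean producerStopsAfterClose() {
        try {
            PipedReader readEnd = new PipedReader();
            PipedWriter writeEnd = new PipedWriter(readEnd);
            Thread producer = new Thread(() -> {
                try {
                    while (true) {
                        writeEnd.write("more text "); // fails once the read end is closed
                    }
                } catch (IOException e) {
                    // Expected after close(): leave the loop and let the thread finish.
                }
            });
            producer.setDaemon(true);
            producer.start();

            char[] buffer = new char[32];
            readEnd.read(buffer, 0, buffer.length); // consume a little text
            readEnd.close();                        // no explicit thread termination

            producer.join(5000); // wait up to five seconds for the producer to stop
            return !producer.isAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("producer stopped: " + producerStopsAfterClose());
    }
}
```

The producer may take a moment to notice the closed pipe, since a blocked pipe writer only rechecks its state periodically, which is why the sketch waits with a timeout instead of expecting an immediate stop.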
 
 