Class ParsingReader

java.lang.Object
java.io.Reader
org.apache.tika.parser.ParsingReader
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class ParsingReader extends Reader
Reader for the text content from a given binary stream. This class uses a background parsing task with a Parser (AutoDetectParser by default) to parse the text content from a given input stream. The BodyContentHandler class and a pipe are used to convert the push-based SAX event stream to the pull-based character stream defined by the Reader interface.
Since:
Apache Tika 0.2
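The push-to-pull conversion described above can be sketched with only the JDK's piped character streams. This is a minimal illustration, not Tika's implementation: `PushToPullSketch`, `TextSink`, `open`, and `readAll` are hypothetical names, and `TextSink` merely stands in for a SAX-style handler that receives text as it is produced.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;

class PushToPullSketch {

    /** Hypothetical stand-in for a push-style receiver such as a SAX content handler. */
    interface TextSink {
        void characters(String text) throws IOException;
    }

    /** Starts a background producer and returns the pull side of the pipe. */
    static Reader open(String... chunks) {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);
            Thread producer = new Thread(() -> {
                // Closing the writer signals end-of-stream to the reader.
                try (PipedWriter w = writer) {
                    TextSink sink = text -> w.write(text);
                    for (String chunk : chunks) {
                        sink.characters(chunk); // push side
                    }
                } catch (IOException e) {
                    // The read end was closed early; stop producing.
                }
            });
            producer.setDaemon(true);
            producer.start();
            return reader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Pulls characters until the producer closes its end of the pipe. */
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[16];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAll(open("Hello, ", "world")));
    }
}
```

The consumer only ever calls `read`, while the producer pushes text whenever it has some; the pipe's internal buffer decouples the two sides.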
  • Constructor Details

    • ParsingReader

      public ParsingReader(InputStream stream) throws IOException
      Creates a reader for the text content of the given binary stream.
      Parameters:
      stream - binary stream
      Throws:
      IOException - if the document cannot be parsed
    • ParsingReader

      public ParsingReader(InputStream stream, String name) throws IOException
      Creates a reader for the text content of the given binary stream with the given name.
      Parameters:
      stream - binary stream
      name - document name
      Throws:
      IOException - if the document cannot be parsed
    • ParsingReader

      public ParsingReader(Path path) throws IOException
      Creates a reader for the text content of the file at the given path.
      Parameters:
      path - path
      Throws:
      FileNotFoundException - if the given file does not exist
      IOException - if the document cannot be parsed
    • ParsingReader

      public ParsingReader(File file) throws FileNotFoundException, IOException
      Creates a reader for the text content of the given file.
      Parameters:
      file - file
      Throws:
      FileNotFoundException - if the given file does not exist
      IOException - if the document cannot be parsed
    • ParsingReader

      public ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context) throws IOException
      Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for parsing. A new background thread is started for the parsing task.

      The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.

      Parameters:
      parser - parser instance
      stream - binary stream
      metadata - document metadata
      context - parsing context
      Throws:
      IOException - if the document cannot be parsed
    • ParsingReader

      public ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context, Executor executor) throws IOException
      Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for the parsing task, which is run with the given executor. The executor must run the parsing task asynchronously in a separate thread, because the current thread must return to the caller, which then consumes the parsed text through the Reader interface.

      The created reader will be responsible for closing the given stream. The stream and any associated resources will be closed at or before the time when the close() method is called on this reader.

      Parameters:
      parser - parser instance
      stream - binary stream
      metadata - document metadata
      context - parsing context
      executor - executor for the parsing task
      Throws:
      IOException - if the document cannot be parsed
      Since:
      Apache Tika 0.4
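The requirement that the executor run the task asynchronously can be illustrated with a JDK-only sketch. It is not Tika code: `ExecutorSketch`, `ASYNC`, `open`, and `readAll` are hypothetical names, and a plain string stands in for the parsed text. With a same-thread executor such as `Runnable::run`, a write larger than the pipe buffer would block inside `execute` because nothing is draining the pipe yet.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.concurrent.Executor;

class ExecutorSketch {

    /** Runs each task on its own daemon thread, so execute() returns immediately. */
    static final Executor ASYNC = task -> {
        Thread t = new Thread(task, "parse-sketch");
        t.setDaemon(true);
        t.start();
    };

    /** Hands the producing task to the executor and returns the read end at once. */
    static Reader open(String text, Executor executor) {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);
            executor.execute(() -> {
                try (PipedWriter w = writer) {
                    w.write(text); // blocks while the pipe buffer is full
                } catch (IOException e) {
                    // The read end was closed; nothing more to do.
                }
            });
            return reader;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Consumes the pipe on the calling thread. */
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[256];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 5000 characters exceeds the default pipe buffer, so producer and consumer must overlap.
        String text = new String(new char[5000]).replace('\0', 'x');
        System.out.println(readAll(open(text, ASYNC)).length());
    }
}
```

Any executor backed by a separate thread, for example one obtained from `java.util.concurrent.Executors`, would serve the same purpose as the hypothetical `ASYNC` above.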
  • Method Details

    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Reads parsed text from the pipe connected to the parsing thread. Fails if the parsing thread has thrown an exception.
      Specified by:
      read in class Reader
      Parameters:
      cbuf - character buffer
      off - start offset within the buffer
      len - maximum number of characters to read
      Throws:
      IOException - if the parsing thread has failed or if for some reason the pipe does not work properly
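One way a reader can surface a background failure, as described for `read`, is to record the worker's exception and rethrow it on the next read. The sketch below assumes that approach and is not taken from Tika's source; `FailureAwareReader`, `recordFailure`, and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class FailureAwareReader extends Reader {

    private final Reader delegate;

    /** Set by the background task if it fails; volatile so the consumer thread sees it. */
    private volatile Throwable failure;

    FailureAwareReader(Reader delegate) {
        this.delegate = delegate;
    }

    /** Called from the background task when it fails. */
    void recordFailure(Throwable t) {
        failure = t;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        Throwable t = failure;
        if (t instanceof IOException) {
            throw (IOException) t;
        } else if (t != null) {
            // Wrap other failures so callers only have to handle IOException.
            throw new IOException("background task failed", t);
        }
        return delegate.read(cbuf, off, len);
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    /** Returns the message of the failure seen by the consumer, or null if none. */
    static String demo() {
        FailureAwareReader reader = new FailureAwareReader(new StringReader("text"));
        Thread worker = new Thread(
                () -> reader.recordFailure(new IllegalStateException("bad document")));
        worker.start();
        try {
            worker.join(); // make the failure visible before reading
            reader.read(new char[4], 0, 4);
            return null;
        } catch (IOException e) {
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```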
    • close

      public void close() throws IOException
      Closes the read end of the pipe. If the parsing thread is still running, the next write to the pipe will fail and cause the thread to stop. Thus there is no need to explicitly terminate the thread.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Reader
      Throws:
      IOException - if the pipe cannot be closed
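The shutdown behaviour described for `close` follows from how JDK pipes work: once the read end is closed, a later write throws `IOException`. The following sketch shows this with `PipedReader` and `PipedWriter` directly; `CloseSketch` and `writeFailsAfterReaderClose` are hypothetical names, and the example runs on a single thread to keep it deterministic.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

class CloseSketch {

    /** Returns true if writing to the pipe fails once the read end is closed. */
    static boolean writeFailsAfterReaderClose() {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);

            // Small enough to fit in the pipe buffer, so this does not block.
            writer.write("before close");

            reader.close();

            try {
                writer.write("after close");
                return false;
            } catch (IOException expected) {
                // A producer thread would unwind here and stop on its own.
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeFailsAfterReaderClose());
    }
}
```

In a two-thread setup the same exception would propagate out of the producing task, which is why the consumer does not need to interrupt or otherwise stop that thread explicitly.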