org.apache.tika.parser
Class ParsingReader

java.lang.Object
  extended by java.io.Reader
      extended by org.apache.tika.parser.ParsingReader
All Implemented Interfaces:
java.io.Closeable, java.lang.Readable

public class ParsingReader
extends java.io.Reader

Reader for the text content from a given binary stream. This class uses a background parsing task with a Parser (AutoDetectParser by default) to parse the text content from a given input stream. The BodyContentHandler class and a pipe are used to convert the push-based SAX event stream to the pull-based character stream defined by the Reader interface.
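The push-to-pull conversion described above can be illustrated with JDK classes alone. The following is a minimal sketch, not Tika's implementation: the class name PipeSketch and its methods are hypothetical, the JDK SAX parser stands in for a Tika Parser, and a plain DefaultHandler stands in for BodyContentHandler. A background thread pushes SAX character events into a PipedWriter, and the caller pulls the same characters from the connected PipedReader.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
class PipeSketch {

    // Starts a background SAX parse of the given XML and returns a Reader
    // from which the caller can pull the character data.
    static Reader textOf(String xml) {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);
            Thread parsing = new Thread(() -> {
                // Closing the write end signals end of stream to the reader.
                try (PipedWriter out = writer) {
                    DefaultHandler handler = new DefaultHandler() {
                        @Override
                        public void characters(char[] ch, int start, int length)
                                throws SAXException {
                            try {
                                out.write(ch, start, length); // push side
                            } catch (IOException e) {
                                throw new SAXException(e);
                            }
                        }
                    };
                    SAXParserFactory.newInstance().newSAXParser()
                            .parse(new InputSource(new StringReader(xml)), handler);
                } catch (Exception e) {
                    // A fuller version would report this failure to the reading side.
                }
            });
            parsing.setDaemon(true);
            parsing.start();
            return reader; // pull side
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the given reader to the end and closes it.
    static String readAll(Reader reader) {
        try (Reader in = reader) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(textOf("<doc><p>Hello</p><p> world</p></doc>")));
    }
}
```

The pipe decouples the two sides: the parser runs at its own pace until the pipe buffer fills, and the consumer blocks in read only while no text is available.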

Since:
Apache Tika 0.2

Field Summary
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
ParsingReader(java.io.File file)
          Creates a reader for the text content of the given file.
ParsingReader(java.io.InputStream stream)
          Creates a reader for the text content of the given binary stream.
ParsingReader(java.io.InputStream stream, java.lang.String name)
          Creates a reader for the text content of the given binary stream with the given name.
ParsingReader(Parser parser, java.io.InputStream stream, Metadata metadata)
          Deprecated. This constructor will be removed in Apache Tika 1.0
ParsingReader(Parser parser, java.io.InputStream stream, Metadata metadata, java.util.concurrent.Executor executor)
          Deprecated. This constructor will be removed in Apache Tika 1.0
ParsingReader(Parser parser, java.io.InputStream stream, Metadata metadata, ParseContext context)
          Creates a reader for the text content of the given binary stream with the given document metadata.
ParsingReader(Parser parser, java.io.InputStream stream, Metadata metadata, ParseContext context, java.util.concurrent.Executor executor)
          Creates a reader for the text content of the given binary stream with the given document metadata.
 
Method Summary
 void close()
          Closes the read end of the pipe.
 int read(char[] cbuf, int off, int len)
          Reads parsed text from the pipe connected to the parsing thread.
 
Methods inherited from class java.io.Reader
mark, markSupported, read, read, read, ready, reset, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ParsingReader

public ParsingReader(java.io.InputStream stream)
              throws java.io.IOException
Creates a reader for the text content of the given binary stream.

Parameters:
stream - binary stream
Throws:
java.io.IOException - if the document cannot be parsed

ParsingReader

public ParsingReader(java.io.InputStream stream,
                     java.lang.String name)
              throws java.io.IOException
Creates a reader for the text content of the given binary stream with the given name.

Parameters:
stream - binary stream
name - document name
Throws:
java.io.IOException - if the document cannot be parsed

ParsingReader

public ParsingReader(java.io.File file)
              throws java.io.FileNotFoundException,
                     java.io.IOException
Creates a reader for the text content of the given file.

Parameters:
file - file
Throws:
java.io.FileNotFoundException - if the given file does not exist
java.io.IOException - if the document cannot be parsed

ParsingReader

public ParsingReader(Parser parser,
                     java.io.InputStream stream,
                     Metadata metadata,
                     ParseContext context)
              throws java.io.IOException
Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for parsing. A new background thread is started for the parsing task.

Parameters:
parser - parser instance
stream - binary stream
metadata - document metadata
context - parsing context
Throws:
java.io.IOException - if the document cannot be parsed

ParsingReader

public ParsingReader(Parser parser,
                     java.io.InputStream stream,
                     Metadata metadata,
                     ParseContext context,
                     java.util.concurrent.Executor executor)
              throws java.io.IOException
Creates a reader for the text content of the given binary stream with the given document metadata. The given parser is used for the parsing task, which is run with the given executor. The executor must run the parsing task asynchronously in a separate thread, because the current thread must return to the caller, which can then consume the parsed text through the Reader interface.

Parameters:
parser - parser instance
stream - binary stream
metadata - document metadata
context - parsing context
executor - executor for the parsing task
Throws:
java.io.IOException - if the document cannot be parsed
Since:
Apache Tika 0.4
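
The requirement that the executor run the task asynchronously can be seen in a small JDK-only sketch. The class ExecutorSketch and its methods are hypothetical and only imitate the producer side with a PipedWriter; they are not Tika code. The task blocks whenever the pipe buffer is full, so it can make progress only if another thread is free to read.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; not part of Tika.
class ExecutorSketch {

    // Hands a writing task to the executor and returns the read end of the pipe.
    static Reader produce(String text, Executor executor) {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);
            executor.execute(() -> {
                try (PipedWriter out = writer) {
                    out.write(text); // blocks while the pipe buffer is full
                } catch (IOException e) {
                    // The read end was closed early; nothing left to do.
                }
            });
            return reader; // reached promptly only if execute() did not block
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the given reader to the end, closes it, and returns the character count.
    static int countChars(Reader reader) {
        try (Reader in = reader) {
            char[] buffer = new char[512];
            int total = 0;
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of the given length filled with one character.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String text = repeat('x', 10000); // larger than the default pipe buffer
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(countChars(produce(text, pool)));
        } finally {
            pool.shutdown();
        }
        // A same-thread executor such as Runnable::run would block inside
        // produce() for this input: the writer fills the pipe buffer and then
        // waits for a reader that the calling thread never gets to start.
    }
}
```

Any executor backed by a separate thread, such as a thread pool, satisfies the requirement; an executor that runs tasks in the calling thread does not.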

ParsingReader

public ParsingReader(Parser parser,
                     java.io.InputStream stream,
                     Metadata metadata)
              throws java.io.IOException
Deprecated. This constructor will be removed in Apache Tika 1.0. Use ParsingReader(Parser, InputStream, Metadata, ParseContext) instead.

Throws:
java.io.IOException
See Also:
TIKA-275

ParsingReader

public ParsingReader(Parser parser,
                     java.io.InputStream stream,
                     Metadata metadata,
                     java.util.concurrent.Executor executor)
              throws java.io.IOException
Deprecated. This constructor will be removed in Apache Tika 1.0. Use ParsingReader(Parser, InputStream, Metadata, ParseContext, Executor) instead.

Throws:
java.io.IOException
See Also:
TIKA-275

Method Detail

read

public int read(char[] cbuf,
                int off,
                int len)
         throws java.io.IOException
Reads parsed text from the pipe connected to the parsing thread. Fails if the parsing thread has thrown an exception.

Specified by:
read in class java.io.Reader
Parameters:
cbuf - character buffer
off - start offset within the buffer
len - maximum number of characters to read
Returns:
the number of characters read, or -1 if the end of the stream has been reached
Throws:
java.io.IOException - if the parsing thread has failed or the pipe cannot be read
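
One way a reader can surface a background failure is sketched below using only JDK classes. The class FailingTaskReader is hypothetical and is not Tika's implementation: the background task records any Throwable in a volatile field before closing the write end, and read rethrows it as an IOException instead of reporting a normal end of stream.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;

// Hypothetical illustration class; not part of Tika.
class FailingTaskReader extends Reader {

    private final PipedReader pipe = new PipedReader();

    // Failure recorded by the background task, if any.
    private volatile Throwable failure;

    // Starts a background task that writes the text and then optionally fails.
    FailingTaskReader(String text, boolean fail) throws IOException {
        PipedWriter writer = new PipedWriter(pipe);
        Thread task = new Thread(() -> {
            try {
                writer.write(text);
                if (fail) {
                    throw new IllegalStateException("simulated parse failure");
                }
            } catch (Throwable t) {
                failure = t; // recorded before the pipe is closed
            } finally {
                try {
                    writer.close();
                } catch (IOException ignored) {
                    // Nothing more can be done for a pipe that fails to close.
                }
            }
        });
        task.setDaemon(true);
        task.start();
    }

    private void checkFailure() throws IOException {
        Throwable t = failure;
        if (t != null) {
            throw new IOException("background task failed", t);
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        checkFailure();
        int n = pipe.read(cbuf, off, len);
        if (n == -1) {
            checkFailure(); // do not report a clean end of stream after a failure
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        pipe.close();
    }

    // Drains a reader and returns the text, or a marker if reading failed.
    static String drain(String text, boolean fail) {
        StringBuilder result = new StringBuilder();
        try (Reader in = new FailingTaskReader(text, fail)) {
            char[] buffer = new char[64];
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                result.append(buffer, 0, n);
            }
            return result.toString();
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(drain("abc", false));
        System.out.println(drain("abc", true));
    }
}
```

Recording the failure before closing the write end means a consumer that reads to the end always sees the exception rather than a truncated but apparently complete text.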

close

public void close()
           throws java.io.IOException
Closes the read end of the pipe. If the parsing thread is still running, the next write to the pipe will fail and cause the thread to stop, so there is no need to terminate the thread explicitly.

Specified by:
close in interface java.io.Closeable
Specified by:
close in class java.io.Reader
Throws:
java.io.IOException - if the pipe cannot be closed
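
The effect of closing the read end can be reproduced with the JDK pipe classes. This is a single-threaded sketch with a hypothetical class name, CloseSketch, and it assumes the pipe behaves like a connected java.io.PipedReader and java.io.PipedWriter pair: once the reader is closed, the next write throws an IOException, which is what lets a producer thread unwind and stop.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Tika.
class CloseSketch {

    // Returns true if a write fails after the read end has been closed.
    static boolean writeFailsAfterReaderClose() {
        try {
            PipedReader reader = new PipedReader();
            PipedWriter writer = new PipedWriter(reader);
            writer.write("first chunk"); // succeeds: the read end is still open
            reader.close();              // the consumer gives up early
            try {
                writer.write("second chunk");
                return false;
            } catch (IOException expected) {
                return true; // a producer thread would unwind and stop here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeFailsAfterReaderClose());
    }
}
```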


Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.