Class FileResourceConsumer

    • Field Detail

      • LOG

        protected static final org.slf4j.Logger LOG
      • TIMED_OUT

        public static String TIMED_OUT
      • OOM

        public static String OOM
      • IO_IS

        public static String IO_IS
      • IO_OS

        public static String IO_OS
      • PARSE_ERR

        public static String PARSE_ERR
      • PARSE_EX

        public static String PARSE_EX
      • ELAPSED_MILLIS

        public static String ELAPSED_MILLIS
    • Method Detail

      • processFileResource

        public abstract boolean processFileResource(FileResource fileResource)
        Main piece of code that needs to be implemented. Clients are responsible for closing streams and handling the exceptions that they'd like to handle.

        Unchecked throwables can propagate past this method. When one is thrown, the consumer logs the error and then rethrows it. Clients/subclasses should make sure to catch and handle everything they can.

        The design goal is that the whole process should close up and shut down soon after an unchecked exception or error is thrown.

        Make sure to call incrementHandledExceptions() appropriately in your implementation of this method.

        Parameters:
        fileResource - resource to process
        Returns:
        whether or not a file was successfully processed
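        The contract above can be illustrated with a minimal, self-contained sketch. SketchConsumer and ByteCountingConsumer are hypothetical stand-ins invented for this example, and the resource is modeled as a plain InputStream rather than a FileResource; this is not the library's implementation. The implementation closes its own stream, counts the exception it chose to handle, and leaves unchecked throwables alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for the abstract consumer; only the members
// discussed above are modeled.
abstract class SketchConsumer {
    private int handledExceptions = 0;

    abstract boolean processFileResource(InputStream resource);

    protected void incrementHandledExceptions() {
        handledExceptions++;
    }

    int getNumHandledExceptions() {
        return handledExceptions;
    }
}

// Hypothetical subclass: counts the bytes in each resource.
class ByteCountingConsumer extends SketchConsumer {
    long bytesSeen = 0;

    @Override
    boolean processFileResource(InputStream resource) {
        // Unchecked throwables are deliberately not caught here, so they
        // propagate to the caller. try-with-resources closes the stream.
        try (InputStream is = resource) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = is.read(buf)) != -1) {
                bytesSeen += n;
            }
            return true;
        } catch (IOException e) {
            // A checked exception this implementation chooses to handle:
            // count it and report that the file was not processed.
            incrementHandledExceptions();
            return false;
        }
    }
}
```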
      • incrementHandledExceptions

        protected void incrementHandledExceptions()
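        Increments the count of handled exceptions. Call this from your processFileResource implementation whenever it catches and handles an exception.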
      • isStillActive

        public boolean isStillActive()
        Returns whether the consumer could still process a file or is still processing a file (ACTIVELY_CONSUMING or ASKED_TO_SHUTDOWN).
        Returns:
        whether this consumer is still active
      • pleaseShutdown

        public void pleaseShutdown()
        This politely asks the consumer to shut down. Before processing another file, the consumer will check to see if it has been asked to terminate.

        This offers another method for politely requesting that a FileResourceConsumer stop processing besides passing it PoisonFileResource.
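        The two polite-stop mechanisms described above can be sketched as follows. PoliteConsumer, its queue of String resources, and the POISON sentinel are all assumptions made for this example; the real class works with FileResource objects and its own state machine.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical consumer that stops either on request or on a sentinel.
class PoliteConsumer implements Runnable {
    // Hypothetical sentinel standing in for a poison resource.
    static final String POISON = "__POISON__";

    private final BlockingQueue<String> queue;
    // volatile so a request made from another thread is seen promptly.
    private volatile boolean askedToShutdown = false;
    private volatile int consumed = 0;

    PoliteConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    public void pleaseShutdown() {
        askedToShutdown = true;
    }

    int getConsumed() {
        return consumed;
    }

    @Override
    public void run() {
        try {
            // The flag is checked before each attempt to take another item.
            while (!askedToShutdown) {
                String next = queue.poll(100, TimeUnit.MILLISECONDS);
                if (next == null) {
                    continue; // nothing available yet; re-check the flag
                }
                if (POISON.equals(next)) {
                    break; // the sentinel also ends processing
                }
                consumed++; // "process" the resource
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

        In this sketch a shutdown request takes effect only between items, so an item already being processed is always finished first.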

      • getCurrentFile

        public org.apache.tika.batch.FileStarted getCurrentFile()
        Returns the name and start time of a file that is currently being processed. If no file is currently being processed, this will return null.
        Returns:
        FileStarted or null
      • getNumResourcesConsumed

        public int getNumResourcesConsumed()
      • getNumHandledExceptions

        public int getNumHandledExceptions()
      • checkForTimedOutMillis

        public org.apache.tika.batch.FileStarted checkForTimedOutMillis(long staleThresholdMillis)
        Checks to see if the currentFile being processed (if there is one) should be timed out (still being worked on after staleThresholdMillis).

        If the consumer should be timed out, this will return the currentFile and set the state to TIMED_OUT.

        If the consumer was already timed out earlier, is not processing a file, or has been working on a file for less than staleThresholdMillis, then this will return null.

        Parameters:
        staleThresholdMillis - threshold to determine whether the consumer has gone stale.
        Returns:
        null or the file started that triggered the stale condition
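        The three null cases and the single non-null case can be sketched as below. StaleCheckSketch, its nested Started type, and the explicit nowMillis parameter are hypothetical and exist only to make the example deterministic. The sketch also assumes that an elapsed time exactly equal to the threshold does not count as stale, which the description leaves open.

```java
// Hypothetical model of the staleness check; not the library's code.
class StaleCheckSketch {
    enum State { ACTIVELY_CONSUMING, TIMED_OUT }

    // Hypothetical stand-in for FileStarted: a name plus a start time.
    static final class Started {
        final String resourceId;
        final long startedMillis;

        Started(String resourceId, long startedMillis) {
            this.resourceId = resourceId;
            this.startedMillis = startedMillis;
        }
    }

    private State state = State.ACTIVELY_CONSUMING;
    private Started current; // null when no file is being processed

    void start(String resourceId, long nowMillis) {
        current = new Started(resourceId, nowMillis);
    }

    // nowMillis is passed in (an assumption of this sketch) instead of
    // being read from the system clock.
    Started checkForTimedOutMillis(long staleThresholdMillis, long nowMillis) {
        if (state == State.TIMED_OUT || current == null) {
            return null; // already timed out, or nothing in progress
        }
        long elapsed = nowMillis - current.startedMillis;
        if (elapsed > staleThresholdMillis) {
            state = State.TIMED_OUT; // later calls return null
            return current;
        }
        return null; // still within the threshold
    }
}
```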
      • getXMLifiedLogMsg

        protected String getXMLifiedLogMsg(String type,
                                           String resourceId,
                                           Throwable t,
                                           String... attrs)
        Use this for structured output that captures resourceId and other attributes.
        Parameters:
        type - entity name for exception
        resourceId - resourceId string
        t - throwable; can be null
        attrs - (array of key0, value0, key1, value1, etc.)
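        The attrs parameter is a flat list of alternating keys and values. The sketch below only illustrates that calling convention by pairing the entries into an ordered map. AttrPairs is a hypothetical helper, rejecting an odd number of entries is an assumption of this example, and the actual log message format is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing how key0, value0, key1, value1, ... pair up.
class AttrPairs {
    static Map<String, String> toMap(String... attrs) {
        // Assumption of this sketch: an unpaired trailing key is an error.
        if (attrs.length % 2 != 0) {
            throw new IllegalArgumentException("attrs must be key/value pairs");
        }
        // LinkedHashMap keeps the attributes in the order they were passed.
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < attrs.length; i += 2) {
            pairs.put(attrs[i], attrs[i + 1]);
        }
        return pairs;
    }
}
```

        A call such as AttrPairs.toMap("elapsedMS", "1200", "parser", "pdf") yields two entries, in that order.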
      • close

        protected void close(Closeable closeable)
      • flushAndClose

        protected void flushAndClose(Closeable closeable)
      • parse

        protected void parse(String resourceId,
                             Parser parser,
                             InputStream is,
                             ContentHandler handler,
                             Metadata m,
                             ParseContext parseContext)
                      throws Throwable
        Utility method to handle logging equivalently among all implementing classes. Use, override or avoid as desired.
        Parameters:
        resourceId - resourceId
        parser - parser to use
        is - inputStream (will be closed by this method!)
        handler - handler for the content
        m - metadata
        parseContext - parse context
        Throws:
        Throwable - logs and then rethrows whatever was thrown (if anything)
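        The log-then-rethrow behaviour, together with closing the input stream, can be sketched generically as below. LogAndRethrow, its Work interface, and the in-memory LOGGED list (a stand-in for a real logger) are hypothetical; the sketch does not call any parser and is not the library's implementation.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: run some work, log any failure, rethrow it,
// and always close the stream that was handed in.
class LogAndRethrow {
    // Stand-in for a real logger so the example is self-contained.
    static final List<String> LOGGED = new ArrayList<>();

    interface Work {
        void run() throws Throwable;
    }

    static void runLogged(String resourceId, Closeable stream, Work work)
            throws Throwable {
        try {
            work.run();
        } catch (Throwable t) {
            // Log uniformly, then rethrow the same throwable unchanged.
            LOGGED.add(resourceId + ": " + t);
            throw t;
        } finally {
            // The stream is closed whether or not the work succeeded.
            try {
                stream.close();
            } catch (IOException e) {
                LOGGED.add(resourceId + ": close failed: " + e);
            }
        }
    }
}
```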