Class AbstractDBParser

java.lang.Object
org.apache.tika.parser.jdbc.AbstractDBParser
All Implemented Interfaces:
Serializable, Parser
Direct Known Subclasses:
SQLite3DBParser

public abstract class AbstractDBParser extends Object implements Parser
Abstract class that handles iterating through tables within a database.
  • Constructor Details

    • AbstractDBParser

      public AbstractDBParser()
  • Method Details

    • getSupportedTypes

      public Set<MediaType> getSupportedTypes(ParseContext context)
      Description copied from interface: Parser
      Returns the set of media types supported by this parser when used with the given parse context.
      Specified by:
      getSupportedTypes in interface Parser
      Parameters:
      context - parse context
      Returns:
      immutable set of media types
    • parse

      public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
      Description copied from interface: Parser
      Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.

      The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.

      Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.

      Specified by:
      parse in interface Parser
      Parameters:
      stream - the document stream (input)
      handler - handler for the XHTML SAX events (output)
      metadata - document metadata (input and output)
      context - parse context
      Throws:
      IOException - if the document stream could not be read
      SAXException - if the SAX events could not be processed
      TikaException - if the document could not be parsed
    • extractMetadata

      protected void extractMetadata(Connection connection, Metadata metadata)
      Called before the tables are parsed to extract database-level metadata, if any. Override this to extract database-specific metadata. This implementation is a no-op.
      Parameters:
      connection - connection to the database
      metadata - metadata object to populate
    • close

      protected void close() throws SQLException, IOException
      Override this for any special handling of closing the connection.
      Throws:
      SQLException
      IOException
    • getConnection

      protected Connection getConnection(InputStream stream, Metadata metadata, ParseContext context) throws IOException, TikaException
      Override this for special configuration of the connection, such as limiting the number of rows to be held in memory.
      Parameters:
      stream - stream to use
      metadata - metadata that could be used in parameterizing the connection
      context - parse context that could be used in parameterizing the connection
      Returns:
      connection
      Throws:
      IOException
      TikaException
    • getConnectionString

      protected abstract String getConnectionString(InputStream stream, Metadata metadata, ParseContext parseContext) throws IOException
      Implement this to supply database-specific connection information, e.g. "jdbc:sqlite:/docs/mydb.db"

      Include any optimization settings, user name, password, etc.

      Parameters:
      stream - stream for processing
      metadata - metadata might be useful in determining connection info
      parseContext - context to use to help create the connection string
      Returns:
      connection string to be used by getConnection(java.io.InputStream, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext).
      Throws:
      IOException
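A minimal sketch of how a subclass might assemble such a connection string for a file-based database, assuming the input stream is first copied to a temporary file. The class `ConnectionStringSketch` and both helper methods are hypothetical and not part of Tika; a real subclass would put equivalent logic inside getConnectionString and would also need to clean up the temporary file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, for illustration only.
final class ConnectionStringSketch {

    // Copies the stream to a temporary file so that a file-based JDBC
    // driver can open it. The stream is read but not closed here.
    static Path spoolToTempFile(InputStream stream) throws IOException {
        Path tmp = Files.createTempFile("db-sketch-", ".db");
        Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Builds a SQLite-style JDBC URL of the form "jdbc:sqlite:<path>".
    static String sqliteConnectionString(Path dbFile) {
        return "jdbc:sqlite:" + dbFile.toAbsolutePath();
    }
}
```

Optimization settings or credentials, where a driver supports them, would be appended to the returned string in the driver's own syntax.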
    • getJDBCClassName

      protected abstract String getJDBCClassName()
      Returns the JDBC driver class name, e.g. org.sqlite.JDBC
      Returns:
      JDBC driver class name
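The returned name is presumably used to load the driver class before a connection is opened. As a hedged illustration, the sketch below checks whether a class with a given name can be loaded at all; `DriverCheckSketch` and `isDriverAvailable` are hypothetical names, not part of Tika.

```java
// Hypothetical helper class, for illustration only.
final class DriverCheckSketch {

    // Returns true if a class with the given fully qualified name
    // can be loaded from the current classpath.
    static boolean isDriverAvailable(String jdbcClassName) {
        try {
            Class.forName(jdbcClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

A check like this can give a clearer error message when the driver jar (for example, the one providing org.sqlite.JDBC) is missing from the classpath.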
    • getTableNames

      protected abstract List<String> getTableNames(Connection connection, Metadata metadata, ParseContext context) throws SQLException
      Returns the names of the tables to process.
      Parameters:
      connection - Connection to use to make the SQL call(s) to get the names of the tables
      metadata - Metadata to use (potentially) in decision about which tables to extract
      context - ParseContext to use (potentially) in decision about which tables to extract
      Returns:
      list of the names of the tables to process
      Throws:
      SQLException
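One generic way to obtain table names is through standard JDBC metadata. The sketch below is an assumption about how an implementation could look, not the approach taken by any particular Tika subclass; a database-specific parser might instead query its own catalog tables. `TableNamesSketch` and `listTableNames` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TableNamesSketch {

    // Lists ordinary tables (type "TABLE") visible through the connection,
    // using the standard DatabaseMetaData.getTables call.
    static List<String> listTableNames(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData meta = connection.getMetaData();
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }
}
```

Filtering on the metadata or parse context, as the parameter descriptions suggest, could be applied to the returned list before it is handed back.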
    • getTableReader

      @Deprecated protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, ParseContext parseContext)
      Given a connection and a table name, return the JDBCTableReader for this db.
      Parameters:
      connection - connection to the database
      tableName - name of the table to read
      parseContext - parse context
      Returns:
      a reader
    • getTableReader

      protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, EmbeddedDocumentUtil embeddedDocumentUtil)
      Given a connection and a table name, return the JDBCTableReader for this db.
      Parameters:
      connection - connection to the database
      tableName - name of the table to read
      embeddedDocumentUtil - embedded doc util
      Returns: