Package org.apache.tika.parser.jdbc
Class AbstractDBParser
java.lang.Object
org.apache.tika.parser.jdbc.AbstractDBParser
- All Implemented Interfaces:
Serializable, Parser
- Direct Known Subclasses:
SQLite3DBParser
Abstract class that handles iterating through tables within a database.
Constructor Summary
AbstractDBParser()
Method Summary
Modifier and Type, Method, Description:
protected void close()
    Override this for any special handling of closing the connection.
protected void extractMetadata(Connection connection, Metadata metadata)
    This is called before parsing the tables to extract metadata from the db, if any.
protected Connection getConnection(InputStream stream, Metadata metadata, ParseContext context)
    Override this for special configuration of the connection, such as limiting the number of rows to be held in memory.
protected abstract String getConnectionString(InputStream stream, Metadata metadata, ParseContext parseContext)
    Implement for db specific connection information, e.g. "jdbc:sqlite:/docs/mydb.db".
protected abstract String getJDBCClassName()
    JDBC class name, e.g. org.sqlite.JDBC
Set<MediaType> getSupportedTypes(ParseContext context)
    Returns the set of media types supported by this parser when used with the given parse context.
protected abstract List<String> getTableNames(Connection connection, Metadata metadata, ParseContext context)
    Returns the names of the tables to process.
protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, EmbeddedDocumentUtil embeddedDocumentUtil)
    Given a connection and a table name, return the JDBCTableReader for this db.
protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, ParseContext parseContext)
    Deprecated.
void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
    Parses a document stream into a sequence of XHTML SAX events.
-
Constructor Details
-
AbstractDBParser
public AbstractDBParser()
-
-
Method Details
-
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Specified by:
getSupportedTypes in interface Parser
- Parameters:
context - parse context
- Returns:
immutable set of media types
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from interface: Parser
Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
- Specified by:
parse in interface Parser
- Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
- Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
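The stream-ownership rule above (the parser consumes the stream, the caller closes it) can be sketched with plain JDK types. This is only an illustration under stated assumptions: `StreamOwnershipSketch`, `TrackingStream`, `consume`, and `callerClosesStream` are hypothetical names, and `consume` merely plays the role of `parse`; none of this is Tika code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in used only to illustrate the stream-ownership contract. */
class StreamOwnershipSketch {

    /** Records whether close() was called on the stream. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Plays the role of parse(...): reads the stream to the end and never closes it. */
    static int consume(InputStream stream) throws IOException {
        int total = 0;
        byte[] buffer = new byte[4096];
        int n;
        while ((n = stream.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /** Caller side: try-with-resources closes the stream once the consumer is done. */
    static boolean callerClosesStream(byte[] data) {
        TrackingStream tracked = new TrackingStream(data);
        try (InputStream stream = tracked) {
            consume(stream);
            if (tracked.closed) {
                throw new IllegalStateException("consumer must not close the stream");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Reports whether the stream ended up closed after the try block.
        return tracked.closed;
    }

    public static void main(String[] args) {
        byte[] data = "not a real database".getBytes(StandardCharsets.UTF_8);
        System.out.println("closed by caller: " + callerClosesStream(data));
    }
}
```

With a real parser the shape is the same: open the stream in a try-with-resources block and pass it to `parse` inside that block.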
-
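extractMetadata
protected void extractMetadata(Connection connection, Metadata metadata)
This is called before parsing the tables to extract metadata from the db, if any. Override this for db-specific metadata. This implementation is a no-op.
- Parameters:
connection - connection to the db
metadata - metadata object to add db metadata to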
-
close
protected void close() throws SQLException, IOException
Override this for any special handling of closing the connection.
- Throws:
SQLException
IOException
-
getConnection
protected Connection getConnection(InputStream stream, Metadata metadata, ParseContext context) throws IOException, TikaException
Override this for special configuration of the connection, such as limiting the number of rows to be held in memory.
- Parameters:
stream - stream to use
metadata - metadata that could be used in parameterizing the connection
context - parse context that could be used in parameterizing the connection
- Returns:
connection
- Throws:
IOException
TikaException
-
getConnectionString
protected abstract String getConnectionString(InputStream stream, Metadata metadata, ParseContext parseContext) throws IOException
Implement for db specific connection information, e.g. "jdbc:sqlite:/docs/mydb.db". Include any optimization settings, user name, password, etc.
- Parameters:
stream - stream for processing
metadata - metadata might be useful in determining connection info
parseContext - context to use to help create the connection string
- Returns:
connection string to be used by getConnection(java.io.InputStream, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext).
- Throws:
IOException
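A JDBC connection string such as the "jdbc:sqlite:..." example above points at a file, while the method receives a stream. One way to bridge that gap, sketched here with JDK classes only, is to spool the stream to a temporary file and build the string from its path. The class and helper names (`ConnectionStringSketch`, `spoolToTempFile`, `sqliteConnectionString`) are hypothetical, and this is not necessarily how any Tika subclass does it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: turns a stream into a file-based JDBC connection string. */
class ConnectionStringSketch {

    /**
     * Copies the stream to a temp file; the caller should delete the file when done.
     * A real getConnectionString override could let the IOException propagate,
     * since that method declares it; it is wrapped here to keep the sketch simple.
     */
    static Path spoolToTempFile(InputStream stream) {
        try {
            Path tmp = Files.createTempFile("db-sketch-", ".db");
            Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a connection string in the "jdbc:sqlite:<path>" form shown in the docs. */
    static String sqliteConnectionString(Path dbFile) {
        return "jdbc:sqlite:" + dbFile.toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = spoolToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            System.out.println(sqliteConnectionString(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If a temp file is created this way, an overridden close() is a natural place to delete it once the connection has been closed.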
-
getJDBCClassName
protected abstract String getJDBCClassName()
JDBC class name, e.g. org.sqlite.JDBC
- Returns:
jdbc class name
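The JDBC class name and the connection string are typically used together: the driver class is loaded by name, then the connection is opened from the string. The sketch below shows that pairing with standard `java.sql` calls. `JdbcOpenSketch` and `open` are hypothetical names, and whether the base class opens its connection exactly this way is an assumption.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper: loads a JDBC driver by class name and opens a connection. */
class JdbcOpenSketch {

    static Connection open(String jdbcClassName, String connectionString) throws SQLException {
        try {
            // Loading the class lets the driver register itself with DriverManager.
            Class.forName(jdbcClassName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("JDBC driver not on classpath: " + jdbcClassName, e);
        }
        return DriverManager.getConnection(connectionString);
    }

    public static void main(String[] args) {
        // Values taken from the examples in this page; the driver jar must be present.
        try (Connection c = open("org.sqlite.JDBC", "jdbc:sqlite:/docs/mydb.db")) {
            System.out.println("opened: " + !c.isClosed());
        } catch (IllegalStateException | SQLException e) {
            System.out.println("could not open: " + e.getMessage());
        }
    }
}
```

Failing fast on a missing driver class keeps the error distinct from a bad connection string, which surfaces as an SQLException instead.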
-
getTableNames
protected abstract List<String> getTableNames(Connection connection, Metadata metadata, ParseContext context) throws SQLException
Returns the names of the tables to process.
- Parameters:
connection - Connection to use to make the sql call(s) to get the names of the tables
metadata - Metadata to use (potentially) in decision about which tables to extract
context - ParseContext to use (potentially) in decision about which tables to extract
- Returns:
the names of the tables to process
- Throws:
SQLException
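For databases whose driver supports the standard JDBC metadata calls, the table names can be read through `DatabaseMetaData.getTables`. The following is a minimal sketch under that assumption; `TableNamesSketch` and `listTableNames` are hypothetical names, and a concrete subclass may well query a db-specific catalog table instead.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists table names through the generic JDBC metadata API. */
class TableNamesSketch {

    static List<String> listTableNames(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData meta = connection.getMetaData();
        // null catalog and schema mean "do not filter"; "%" matches any table name;
        // the types array restricts the result to ordinary tables.
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }
}
```

A real override would also receive the Metadata and ParseContext arguments and could use them to filter the returned list.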
-
getTableReader
@Deprecated protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, ParseContext parseContext)
Deprecated.
Given a connection and a table name, return the JDBCTableReader for this db.
- Parameters:
connection - connection to the db
tableName - name of the table to read
- Returns:
a reader
-
getTableReader
protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, EmbeddedDocumentUtil embeddedDocumentUtil)
Given a connection and a table name, return the JDBCTableReader for this db.
- Parameters:
connection - connection to the db
tableName - name of the table to read
embeddedDocumentUtil - embedded doc util
- Returns:
-
getTableReader(Connection, String, EmbeddedDocumentUtil)