Package org.apache.tika.parser.jdbc
Class AbstractDBParser
- java.lang.Object
  - org.apache.tika.parser.jdbc.AbstractDBParser
- All Implemented Interfaces: Serializable, Parser
- Direct Known Subclasses: SQLite3DBParser
public abstract class AbstractDBParser extends Object implements Parser
Abstract class that handles iterating through tables within a database.
- See Also: Serialized Form
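The class description suggests a template-method design: the base class owns the loop over tables and subclasses supply the database-specific pieces. The sketch below is a deliberately simplified, hypothetical model of that control flow using only the JDK. `TableWalkerSketch`, `tableNames`, `readRows`, `beforeTables` and `walk` are invented names for illustration and are not Tika's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the pattern described above.
// It is NOT Tika's AbstractDBParser: no JDBC and no SAX events, only the control flow.
abstract class TableWalkerSketch {

    // Subclass hook: which tables to process (plays the role of getTableNames).
    protected abstract List<String> tableNames();

    // Subclass hook: the rows of one table (stands in for a JDBCTableReader).
    protected abstract List<String> readRows(String tableName);

    // Optional hook with a no-op default (plays the role of extractMetadata).
    protected void beforeTables(List<String> out) {
    }

    // The fixed algorithm: metadata hook first, then every table in order.
    public final List<String> walk() {
        List<String> out = new ArrayList<>();
        beforeTables(out);
        for (String table : tableNames()) {
            out.add("table:" + table);
            for (String row : readRows(table)) {
                out.add("row:" + row);
            }
        }
        return out;
    }

    // In-memory stand-in for a database-specific subclass.
    static TableWalkerSketch demo() {
        return new TableWalkerSketch() {
            @Override
            protected List<String> tableNames() {
                return List.of("people", "orders");
            }

            @Override
            protected List<String> readRows(String tableName) {
                return tableName.equals("people")
                        ? List.of("alice", "bob")
                        : List.of("order-1");
            }
        };
    }

    public static void main(String[] args) {
        demo().walk().forEach(System.out::println);
    }
}
```

A real subclass would implement the abstract methods listed in the Method Summary instead of these stand-ins.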
Constructor Summary
- AbstractDBParser()
-
Method Summary
- protected void close()
  Override this for any special handling of closing the connection.
- protected void extractMetadata(Connection connection, Metadata metadata)
  This is called before parsing the tables to extract metadata from the db, if any.
- protected Connection getConnection(InputStream stream, Metadata metadata, ParseContext context)
  Override this for special configuration of the connection, such as limiting the number of rows to be held in memory.
- protected abstract String getConnectionString(InputStream stream, Metadata metadata, ParseContext parseContext)
  Implement for db specific connection information, e.g. "jdbc:sqlite:/docs/mydb.db".
- protected abstract String getJDBCClassName()
  JDBC class name, e.g. org.sqlite.JDBC
- Set<MediaType> getSupportedTypes(ParseContext context)
  Returns the set of media types supported by this parser when used with the given parse context.
- protected abstract List<String> getTableNames(Connection connection, Metadata metadata, ParseContext context)
  Returns the names of the tables to process.
- protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, EmbeddedDocumentUtil embeddedDocumentUtil)
  Given a connection and a table name, return the JDBCTableReader for this db.
- protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, ParseContext parseContext)
  Deprecated.
- void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)
  Parses a document stream into a sequence of XHTML SAX events.
Method Detail
-
getSupportedTypes
public Set<MediaType> getSupportedTypes(ParseContext context)
Description copied from interface: Parser
Returns the set of media types supported by this parser when used with the given parse context.
- Specified by: getSupportedTypes in interface Parser
- Parameters:
  context - parse context
- Returns: immutable set of media types
-
parse
public void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) throws IOException, SAXException, TikaException
Description copied from interface: Parser
Parses a document stream into a sequence of XHTML SAX events. Fills in related document metadata in the given metadata object.
The given document stream is consumed but not closed by this method. The responsibility to close the stream remains on the caller.
Information about the parsing context can be passed in the context parameter. See the parser implementations for the kinds of context information they expect.
- Specified by: parse in interface Parser
- Parameters:
  stream - the document stream (input)
  handler - handler for the XHTML SAX events (output)
  metadata - document metadata (input and output)
  context - parse context
- Throws:
  IOException - if the document stream could not be read
  SAXException - if the SAX events could not be processed
  TikaException - if the document could not be parsed
-
extractMetadata
protected void extractMetadata(Connection connection, Metadata metadata)
This is called before parsing the tables to extract metadata from the db, if any. Override this for db-specific metadata. This implementation is a no-op.
- Parameters:
  connection
  metadata
-
close
protected void close() throws SQLException, IOException
Override this for any special handling of closing the connection.
- Throws:
  SQLException
  IOException
-
getConnection
protected Connection getConnection(InputStream stream, Metadata metadata, ParseContext context) throws IOException, TikaException
Override this for special configuration of the connection, such as limiting the number of rows to be held in memory.
- Parameters:
  stream - stream to use
  metadata - metadata that could be used in parameterizing the connection
  context - ParseContext that could be used in parameterizing the connection
- Returns: connection
- Throws:
  IOException
  TikaException
-
getConnectionString
protected abstract String getConnectionString(InputStream stream, Metadata metadata, ParseContext parseContext) throws IOException
Implement for db specific connection information, e.g. "jdbc:sqlite:/docs/mydb.db". Include any optimization settings, user name, password, etc.
- Parameters:
  stream - stream for processing
  metadata - metadata might be useful in determining connection info
  parseContext - context to use to help create connectionString
- Returns: connection string to be used by getConnection(java.io.InputStream, org.apache.tika.metadata.Metadata, org.apache.tika.parser.ParseContext).
- Throws:
  IOException
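A file-based JDBC driver generally needs a path on disk, not a stream, so one plausible way to build a connection string like "jdbc:sqlite:/docs/mydb.db" is to copy the stream to a temporary file first. The sketch below shows that idea with JDK classes only. `ConnectionStringSketch`, `spoolToTempFile` and `sqliteUrl` are hypothetical names, and this is an assumption about how an implementation might work, not a description of any existing subclass.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Tika.
final class ConnectionStringSketch {

    // Copies the stream to a temp file so a file-based JDBC driver could open it.
    static Path spoolToTempFile(InputStream stream) throws IOException {
        Path tmp = Files.createTempFile("dbparser-sketch-", ".db");
        // createTempFile already created the file, so overwrite it.
        Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Builds a SQLite-style JDBC URL of the form "jdbc:sqlite:<absolute path>".
    static String sqliteUrl(Path dbFile) {
        return "jdbc:sqlite:" + dbFile.toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = spoolToTempFile(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            System.out.println(sqliteUrl(tmp));
        } finally {
            // The caller is responsible for removing the temp file.
            Files.deleteIfExists(tmp);
        }
    }
}
```

An implementation that spools to disk like this would also need to delete the temporary file, which is one reason to override close().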
-
getJDBCClassName
protected abstract String getJDBCClassName()
JDBC class name, e.g. org.sqlite.JDBC
- Returns: jdbc class name
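A common JDBC idiom is to load the driver class by name with Class.forName before opening a connection. How this class uses the returned name is not stated here, so the following is only a sketch of that idiom. `DriverCheckSketch` and `isLoadable` are hypothetical names.

```java
// Hypothetical helper for illustration; not part of Tika.
final class DriverCheckSketch {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.sqlite.JDBC only loads if a SQLite JDBC driver jar is on the classpath.
        System.out.println("org.sqlite.JDBC loadable: " + isLoadable("org.sqlite.JDBC"));
    }
}
```

A check like this can turn a missing driver jar into a clear error message before any connection is attempted.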
-
getTableNames
protected abstract List<String> getTableNames(Connection connection, Metadata metadata, ParseContext context) throws SQLException
Returns the names of the tables to process.
- Parameters:
  connection - Connection to use to make the sql call(s) to get the names of the tables
  metadata - Metadata to use (potentially) in decision about which tables to extract
  context - ParseContext to use (potentially) in decision about which tables to extract
- Returns: the names of the tables to process
- Throws:
  SQLException
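One database-neutral way to obtain table names is the standard java.sql.DatabaseMetaData API. The sketch below is an assumption about how an implementation could look, not the approach of any particular subclass. `TableNamesSketch`, `listTables` and `withoutPrefix` are hypothetical names, and the "sqlite_" prefix in the usage note is only an illustrative choice.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Tika.
final class TableNamesSketch {

    // Lists table names through standard JDBC metadata.
    static List<String> listTables(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        DatabaseMetaData meta = connection.getMetaData();
        // null catalog and schema mean "do not filter"; "%" matches every table name.
        try (ResultSet rs = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    // Drops names starting with the given prefix, e.g. a driver's internal tables.
    static List<String> withoutPrefix(List<String> names, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!name.startsWith(prefix)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For example, `withoutPrefix(listTables(connection), "sqlite_")` would skip tables whose names begin with "sqlite_". Whether such filtering is wanted depends on the database and on the metadata and context passed in.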
-
getTableReader
@Deprecated protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, ParseContext parseContext)
Deprecated.
Given a connection and a table name, return the JDBCTableReader for this db.
- Parameters:
  connection
  tableName
- Returns: a reader
-
getTableReader
protected abstract JDBCTableReader getTableReader(Connection connection, String tableName, EmbeddedDocumentUtil embeddedDocumentUtil)
Given a connection and a table name, return the JDBCTableReader for this db.
- Parameters:
  connection
  tableName
  embeddedDocumentUtil - embedded doc util
- Returns: the JDBCTableReader for this db