Class JDBCPipesReporter

All Implemented Interfaces:
Closeable, AutoCloseable, Initializable

public class JDBCPipesReporter extends PipesReporterBase implements Initializable
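This is an initial draft of a JDBCPipesReporter. By default, it drops and recreates the tika_status table on each run; set createTable to false (see setCreateTable(boolean)) to keep an existing table. If you'd like different behavior, please open a ticket on our JIRA!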
  • Constructor Details

    • JDBCPipesReporter

      public JDBCPipesReporter()
  • Method Details

    • initialize

      public void initialize(Map<String,Param> params) throws TikaConfigException
      Specified by:
      initialize in interface Initializable
      Overrides:
      initialize in class PipesReporterBase
      Parameters:
      params - params to use for initialization
      Throws:
      TikaConfigException
    • checkInitialization

      public void checkInitialization(InitializableProblemHandler problemHandler) throws TikaConfigException
      Specified by:
      checkInitialization in interface Initializable
      Overrides:
      checkInitialization in class PipesReporterBase
      Parameters:
      problemHandler - if there is a problem and no custom initializableProblemHandler has been configured via Initializable parameters, this is called to respond.
      Throws:
      TikaConfigException
    • setConnection

      @Field public void setConnection(String connection)
    • setCacheSize

      @Field public void setCacheSize(int cacheSize)
      Commit the reports if the cache is greater than or equal to this size.

      Default is DEFAULT_CACHE_SIZE.

      The reports are committed when either trigger fires: the cache reaches this size, or the time since the last commit exceeds the limit set by setReportWithinMs(long).

      Parameters:
      cacheSize - number of cached reports at which a commit is triggered
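
      The two commit triggers described above can be sketched as follows. This is a minimal illustration, not the Tika implementation: the class CommitTrigger, its method names, and the threshold values used in main are all hypothetical.

```java
// Hypothetical sketch of the two commit triggers; not taken from the Tika source.
final class CommitTrigger {
    private final int cacheSize;        // commit once this many reports are cached
    private final long reportWithinMs;  // or once this much time has passed since the last commit

    CommitTrigger(int cacheSize, long reportWithinMs) {
        this.cacheSize = cacheSize;
        this.reportWithinMs = reportWithinMs;
    }

    /** Returns true if the cached reports should be committed now. */
    boolean shouldCommit(int cachedReports, long lastCommitMs, long nowMs) {
        boolean cacheFull = cachedReports >= cacheSize;            // "greater than or equal to"
        boolean tooOld = (nowMs - lastCommitMs) > reportWithinMs;  // "exceeds this value"
        return cacheFull || tooOld;
    }

    public static void main(String[] args) {
        // Threshold values are placeholders, not the library defaults.
        CommitTrigger trigger = new CommitTrigger(100, 10_000L);
        System.out.println(trigger.shouldCommit(100, 0L, 5_000L));  // true: cache threshold reached
        System.out.println(trigger.shouldCommit(3, 0L, 10_001L));   // true: time limit exceeded
        System.out.println(trigger.shouldCommit(3, 0L, 5_000L));    // false: neither trigger fired
    }
}
```

      Either condition alone is enough, so a slow trickle of documents is still reported within the time limit, and a burst is flushed as soon as the cache fills.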
    • setCreateTable

      @Field public void setCreateTable(boolean createTable)
      The default is true. In a distributed setting with multiple servers, set this to false and create the table yourself.

      NOTE: The default behavior is to drop the table if it exists and then create it. Set this to false if you do not want the table dropped.

      Parameters:
      createTable - whether to drop the table (if it exists) and create it at startup
    • setTableName

      @Field public void setTableName(String tableName)
      The default is TABLE_NAME.
      Parameters:
      tableName - name of the table that stores the report records
    • setReportSql

      @Field public void setReportSql(String reportSql)
      This is the SQL for the prepared statement that is executed to store each report record. The default is: insert into tika_status (id, status, timestamp) values (?,?,?). This can be modified for a specific SQL dialect, or to run an upsert, merge, or update instead of the default insert. The placeholders must be coordinated with setReportVariables(List).
      Parameters:
      reportSql - SQL for the prepared statement that stores a report record
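
      Because the report SQL and the report variables have to agree, a quick consistency check before configuring the reporter can catch mismatches early. The following is a minimal sketch under stated assumptions: ReportSqlCheck and its methods are hypothetical helpers, not part of Tika, and the placeholder count assumes that no '?' appears inside a SQL string literal.

```java
import java.util.List;

// Hypothetical helper; not part of Tika.
final class ReportSqlCheck {

    /** Counts '?' placeholders. Assumes no '?' occurs inside SQL string literals. */
    static int countPlaceholders(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }

    /** True if there is exactly one report variable per placeholder. */
    static boolean isConsistent(String reportSql, List<String> reportVariables) {
        return countPlaceholders(reportSql) == reportVariables.size();
    }

    public static void main(String[] args) {
        // The default insert statement quoted in the documentation above.
        String sql = "insert into tika_status (id, status, timestamp) values (?,?,?)";
        System.out.println(isConsistent(sql, List.of("id", "status", "timestamp"))); // true
        System.out.println(isConsistent(sql, List.of("id", "status")));              // false
    }
}
```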
    • getTableName

      public String getTableName()
    • getReportVariables

      public List<String> getReportVariables()
    • getReportSql

      public String getReportSql()
    • isCreateTable

      public boolean isCreateTable()
    • setReportVariables

      @Field public void setReportVariables(List<String> variables)
      ADVANCED: This sets the variables that are bound, in order, to the placeholders of the prepared statement for the report. It must be coordinated with setReportSql(String). The available variables are "id", "status", and "timestamp". If you change the statement to an update such as "update tika_status set status=?, timestamp=? where id = ?", then the value for this would be ["status", "timestamp", "id"].

      The default for the insert is ["id", "status", "timestamp"].

      Parameters:
      variables - variable names, in the order of the placeholders in the report SQL
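
      To illustrate why the order of the variables matters, the sketch below arranges the values of one report record in placeholder order for an update statement that binds status, then timestamp, then id. It is a hedged illustration only: ReportBinding, the record map, and the sample values (including the status string and the numeric timestamp) are hypothetical and do not reflect how Tika represents these values internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of binding order; not part of Tika.
final class ReportBinding {

    /** Returns the record's values in the order given by reportVariables. */
    static List<Object> orderedValues(List<String> reportVariables, Map<String, Object> record) {
        List<Object> values = new ArrayList<>();
        for (String name : reportVariables) {
            if (!record.containsKey(name)) {
                throw new IllegalArgumentException("Unknown report variable: " + name);
            }
            values.add(record.get(name));
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample values are placeholders chosen for this example.
        Map<String, Object> record = Map.of(
                "id", "doc-1",
                "status", "EXAMPLE_STATUS",
                "timestamp", 1700000000000L);

        // Order for: update tika_status set status=?, timestamp=? where id = ?
        List<String> updateOrder = List.of("status", "timestamp", "id");
        System.out.println(orderedValues(updateOrder, record));
        // prints [EXAMPLE_STATUS, 1700000000000, doc-1]
    }
}
```

      With a real java.sql.PreparedStatement, the value at index i of this list would be bound to parameter i + 1, since JDBC parameter indexes start at 1.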
    • setReportWithinMs

      @Field public void setReportWithinMs(long reportWithinMs)
      Commit the reports if the time elapsed since the last report commit exceeds this value, in milliseconds.

      Default is DEFAULT_REPORT_WITHIN_MS.

      The reports are committed when either trigger fires: the cache reaches the size set by setCacheSize(int), or the time since the last commit exceeds this value.

      Parameters:
      reportWithinMs - elapsed time, in milliseconds, after which cached reports are committed
    • setPostConnection

      @Field public void setPostConnection(String postConnection)
      This SQL is executed immediately after the connection is made. It was initially added for setting pragmas on SQLite, but it may be used for other connection configuration in other databases. Note: This is executed before the table is created, if the table needs to be created.
      Parameters:
      postConnection - SQL to execute immediately after the connection is made
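
      The startup order described above (post-connection SQL first, then the optional drop and create of the table) can be sketched as a list of statements. This is an assumption-laden illustration, not the Tika implementation: StartupPlan is a hypothetical class, the column types in the create statement are invented for the example, and "drop table if exists" is not supported by every SQL dialect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the startup order; not taken from the Tika source.
final class StartupPlan {

    /** Builds the statements to run at startup, in execution order. */
    static List<String> statements(String postConnection, boolean createTable, String tableName) {
        List<String> plan = new ArrayList<>();
        // 1. Connection configuration runs first, before any table is created.
        if (postConnection != null && !postConnection.trim().isEmpty()) {
            plan.add(postConnection);
        }
        // 2. With createTable=true, any existing table is dropped and then recreated.
        if (createTable) {
            plan.add("drop table if exists " + tableName);
            // Column types below are assumptions made for this example only.
            plan.add("create table " + tableName
                    + " (id varchar(1024), status varchar(32), timestamp timestamp)");
        }
        return plan;
    }

    public static void main(String[] args) {
        // A SQLite pragma as an example of post-connection configuration.
        for (String sql : statements("PRAGMA journal_mode=WAL", true, "tika_status")) {
            System.out.println(sql);
        }
    }
}
```

      With createTable set to false the plan contains only the post-connection SQL, which matches the distributed setup in which the table is created externally.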
    • report

      public void report(FetchEmitTuple t, PipesResult result, long elapsed)
      Specified by:
      report in class PipesReporter
    • error

      public void error(Throwable t)
      Description copied from class: PipesReporter
      This is called if the process has crashed. Implementers should not rely on close() to be called after this.
      Specified by:
      error in class PipesReporter
    • error

      public void error(String msg)
      Description copied from class: PipesReporter
      This is called if the process has crashed. Implementers should not rely on close() to be called after this.
      Specified by:
      error in class PipesReporter
    • close

      public void close() throws IOException
      Description copied from class: PipesReporter
      No-op implementation. Override for custom behavior.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class PipesReporter
      Throws:
      IOException