Class SpoolingStrategy

java.lang.Object
org.apache.tika.io.SpoolingStrategy

public class SpoolingStrategy extends Object
Strategy for determining when to spool a TikaInputStream to disk.

Components (detectors, parsers) can consult this strategy before calling TikaInputStream.getFile() to decide whether spooling is appropriate for the given media type.

Default behavior: when no strategy is present in the ParseContext, components spool whenever they need a file. Supplying a strategy gives finer-grained control over which streams are spooled.
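The default-versus-strategy rule above can be sketched as a small self-contained model. The Strategy and Context classes and the maySpool helper are hypothetical stand-ins written for this illustration. They are not the Tika classes, and the sketch uses plain strings instead of MediaType. In real code a component would presumably look the strategy up with ParseContext.get(SpoolingStrategy.class), which is an assumption based on the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class SpoolDecisionSketch {

    /** Hypothetical stand-in for SpoolingStrategy, keyed on a plain media type string. */
    static final class Strategy {
        private final Set<String> spoolTypes;

        Strategy(Set<String> spoolTypes) {
            this.spoolTypes = Set.copyOf(spoolTypes);
        }

        boolean shouldSpool(String mediaType) {
            return spoolTypes.contains(mediaType);
        }
    }

    /** Hypothetical stand-in for ParseContext: a lookup keyed by class. */
    static final class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            return key.cast(entries.get(key)); // null when nothing was registered
        }
    }

    /** The check a component would make before asking the stream for a file. */
    static boolean maySpool(Context context, String mediaType) {
        Strategy strategy = context.get(Strategy.class);
        if (strategy == null) {
            return true; // no strategy in the context: spool when needed
        }
        return strategy.shouldSpool(mediaType);
    }

    public static void main(String[] args) {
        Context context = new Context();
        System.out.println(maySpool(context, "text/plain"));      // true (no strategy)

        context.set(Strategy.class, new Strategy(Set.of("application/zip")));
        System.out.println(maySpool(context, "text/plain"));      // false
        System.out.println(maySpool(context, "application/zip")); // true
    }
}
```

Treating a missing strategy as "spool" keeps existing components working unchanged, so a strategy only needs to be supplied when spooling should be limited.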

Configure via JSON:

 {
   "spooling-strategy": {
     "spoolTypes": ["application/zip", "application/x-tika-msoffice", "application/pdf"]
   }
 }
 
  • Constructor Details

    • SpoolingStrategy

      public SpoolingStrategy()
  • Method Details

    • shouldSpool

      public boolean shouldSpool(TikaInputStream tis, Metadata metadata, MediaType mediaType)
      Determines whether the stream should be spooled to disk.
      Parameters:
      tis - the TikaInputStream; implementations can consult hasFile() and getLength()
      metadata - the metadata; implementations can consult content-type hints and the filename
      mediaType - the detected or declared media type
      Returns:
      true if the stream should be spooled to disk
    • setSpoolTypes

      public void setSpoolTypes(Set<MediaType> spoolTypes)
      Sets the media types that should be spooled to disk. Specializations (subtypes) of these types are spooled as well.
      Parameters:
      spoolTypes - set of media types to spool
    • getSpoolTypes

      public Set<MediaType> getSpoolTypes()
      Returns the media types that should be spooled to disk.
      Returns:
      set of media types to spool
    • setMediaTypeRegistry

      public void setMediaTypeRegistry(MediaTypeRegistry registry)
      Sets the media type registry used for checking type specializations.
      Parameters:
      registry - the media type registry
    • getMediaTypeRegistry

      public MediaTypeRegistry getMediaTypeRegistry()
      Returns the media type registry.
      Returns:
      the media type registry, or null if not set
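
The setSpoolTypes note that specializations are included, together with the registry set through setMediaTypeRegistry, suggests a match that walks up the supertype chain. The following is a minimal sketch of that idea only. The SUPERTYPES table is hand-written for illustration and stands in for a media type registry, the matches helper is hypothetical, and whether SpoolingStrategy performs the check this way is an assumption.

```java
import java.util.Map;
import java.util.Set;

final class SpecializationSketch {

    static final String DOCX =
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    // Hand-written child -> parent table (illustrative, not read from Tika).
    static final Map<String, String> SUPERTYPES = Map.of(
            DOCX, "application/x-tika-ooxml",
            "application/x-tika-ooxml", "application/zip");

    /** True if mediaType, or any of its supertypes in the table, is a spool type. */
    static boolean matches(Set<String> spoolTypes, String mediaType) {
        for (String type = mediaType; type != null; type = SUPERTYPES.get(type)) {
            if (spoolTypes.contains(type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> spoolTypes = Set.of("application/zip");
        System.out.println(matches(spoolTypes, DOCX));              // true (via the chain)
        System.out.println(matches(spoolTypes, "application/pdf")); // false
    }
}
```

With a check of this shape, listing a container type such as application/zip would also cover the formats built on top of it, which keeps the configured list short.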