Class TranscribeTranslateExample

java.lang.Object
org.apache.tika.example.TranscribeTranslateExample

public class TranscribeTranslateExample extends Object
This example demonstrates basic logic for chaining Tika API calls. Here, translation is treated as a downstream step of transcription: the output of a call to Tika.parseToString(Path) is passed directly into Translator.translate(String, String). The GoogleTranslator is configured with a target language of "en-US".
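The chaining described above can be sketched without any Tika dependency. In this minimal sketch, ChainSketch, transcribeThenTranslate, and the two stubs are hypothetical names introduced for illustration; the functional interfaces merely stand in for Tika.parseToString(Path) and Translator.translate(String, String), and nothing here is the example's actual implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.BiFunction;
import java.util.function.Function;

public class ChainSketch {

    // Runs the extraction step, then hands its output to the translation step.
    // Both steps are passed in as functions (an assumption made so the sketch
    // runs without Tika or any cloud credentials).
    public static String transcribeThenTranslate(Function<Path, String> extractor,
                                                 BiFunction<String, String, String> translator,
                                                 Path file,
                                                 String targetLanguage) {
        String text = extractor.apply(file);
        return translator.apply(text, targetLanguage);
    }

    public static void main(String[] args) {
        // Hypothetical stubs standing in for the real transcription and translation calls.
        Function<Path, String> stubExtractor = path -> "hola";
        BiFunction<String, String, String> stubTranslator =
                (text, lang) -> "[" + lang + "] " + text;

        String result = transcribeThenTranslate(
                stubExtractor, stubTranslator, Paths.get("audio.mp3"), "en-US");
        System.out.println(result);
    }
}
```

In real code the stubs would be replaced by wrappers around the Tika calls; since those calls declare checked exceptions, the wrappers would need to catch or rethrow them.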
Author:
lewismc
  • Constructor Details

    • TranscribeTranslateExample

      public TranscribeTranslateExample()
  • Method Details

    • googleTranslateToEnglish

      public static String googleTranslateToEnglish(String text)
      Use GoogleTranslator to execute translation on input data. This implementation needs to be configured as explained in the Javadoc. In this implementation, Google will try to guess the input language. The target language is "en-US".
      Parameters:
      text - input text to translate.
      Returns:
      translated text String.
    • amazonTranscribe

      public static String amazonTranscribe(Path tikaConfig, Path file) throws Exception
      Use AmazonTranscribe to execute transcription on input data. This implementation needs to be configured as explained in the Javadoc.
      Parameters:
      tikaConfig - the path of the Tika configuration file.
      file - the path of the file (which needs to be on the Java Classpath) to transcribe.
      Returns:
      transcribed text.
      Throws:
      Exception
    • main

      public static void main(String[] args) throws Exception
      Main method to run this example. This program can be invoked as follows:
      1. transcribe-translate ${tika-config.xml} ${file}; which executes transcription followed by translation on the given resource, or
      2. transcribe ${tika-config.xml} ${file}; which executes only transcription
      Parameters:
      args - either of the commands described above, followed by the Tika configuration file and the input file (which needs to be on the Java Classpath).

      ${tika-config.xml} must include credentials for AWS and a temporary storage bucket:

                   
                    <properties>
                     <parsers>
                       <parser class="org.apache.tika.parser.DefaultParser"/>
                       <parser class="org.apache.tika.parser.transcribe.aws.AmazonTranscribe">
                         <params>
                           <param name="bucket" type="string">bucket</param>
                           <param name="clientId" type="string">clientId</param>
                           <param name="clientSecret" type="string">clientSecret</param>
                         </params>
                       </parser>
                     </parsers>
                   </properties>
                   
                   
      Throws:
      Exception
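
The two invocation forms listed for main can be told apart with a small argument parser. The following is a minimal sketch that assumes the argument order given in the description (command, then configuration file, then input file); DispatchSketch, Mode, Invocation, and parse are hypothetical names, and the real main method may handle its arguments differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DispatchSketch {

    /** The two commands named in the description of main. */
    public enum Mode { TRANSCRIBE, TRANSCRIBE_TRANSLATE }

    /** Hypothetical holder for the parsed command line. */
    public static final class Invocation {
        public final Mode mode;
        public final Path tikaConfig;
        public final Path file;

        Invocation(Mode mode, Path tikaConfig, Path file) {
            this.mode = mode;
            this.tikaConfig = tikaConfig;
            this.file = file;
        }
    }

    // Assumed layout: args[0] = command, args[1] = tika-config.xml, args[2] = input file.
    public static Invocation parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: (transcribe|transcribe-translate) <tika-config.xml> <file>");
        }
        Mode mode;
        switch (args[0]) {
            case "transcribe":
                mode = Mode.TRANSCRIBE;
                break;
            case "transcribe-translate":
                mode = Mode.TRANSCRIBE_TRANSLATE;
                break;
            default:
                throw new IllegalArgumentException("unknown command: " + args[0]);
        }
        return new Invocation(mode, Paths.get(args[1]), Paths.get(args[2]));
    }

    public static void main(String[] args) {
        // Fall back to a sample command line so the sketch runs without arguments.
        String[] effective = args.length == 0
                ? new String[] {"transcribe", "tika-config.xml", "clip.mp3"}
                : args;
        Invocation inv = parse(effective);
        System.out.println(inv.mode + " " + inv.tikaConfig + " " + inv.file);
    }
}
```

Rejecting unknown commands and wrong argument counts up front keeps a mistyped command from silently triggering a billable transcription request.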