Saturday, November 2, 2013

Transcription Controller in an Afternoon

I have some audio files that I need to transcribe. I figured it would be easy to just load them into the Audacity audio editor and type away. Not so easy. People talk much faster than I can type, and it's hard to control Audacity while typing in a word processor. Fortunately, Audacity has keyboard short-cuts. I just needed a way to connect a foot pedal to Audacity.

An old PS/2 mouse makes a decent foot pedal. I gutted the unit, removed the scroll wheel, and wired the mouse buttons to the I/O cable.


I then cut off the PS/2 connector and wired the cable to pin 2 of an Arduino, adding a 10K pull-up resistor. Here's what it looks like assembled and connected.


The Arduino was programmed to send the following text strings:

15 seconds after boot: "g"
mouse down: "0"
mouse up: "1"

Here's the code (adapted from Arduino Playground):

// digital pin 2 has a pushbutton (the mouse button) attached to it.
// With the 10K pull-up, pressed reads 0 and released reads 1.
int pushButton = 2;
// the last reading, used to detect changes:
int lastButtonState;

// the setup routine runs once when you press reset:
void setup() {
  // initialize serial communication at 9600 bits per second:
  Serial.begin(9600);
  // make the pushbutton's pin an input:
  pinMode(pushButton, INPUT);
  // give the user 15 seconds to get Audacity ready, then announce it:
  delay(15000);
  Serial.println("g");
  lastButtonState = digitalRead(pushButton);
}

// the loop routine runs over and over again forever:
void loop() {
  // read the input pin:
  int buttonState = digitalRead(pushButton);
  if (buttonState != lastButtonState) {
    // print out the state of the button, but only when it changes:
    Serial.println(buttonState);
    // debounce
    delay(5);
  }
  lastButtonState = buttonState;
}
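The firmware only reports changes, not the steady state of the button. As a rough illustration, here is the same compare-with-previous logic modelled in Python on a list of sampled pin readings. This is a model for explanation only, not code that runs on the Arduino, and `report_changes` is a made-up name.

```python
def report_changes(samples):
    """Return the readings that differ from the one before them.

    Mirrors the firmware's comparison of the current reading with the
    previous one; the first sample only sets the starting state.
    """
    if not samples:
        return []
    previous = samples[0]
    messages = []
    for reading in samples[1:]:
        if reading != previous:
            # The firmware would send this value over the serial link.
            messages.append(str(reading))
        previous = reading
    return messages


# Pedal idle (1), pressed (0) for three samples, then released (1).
print(report_changes([1, 1, 0, 0, 0, 1, 1]))  # prints ['0', '1']
```

However long the pedal is held, the PC sees exactly one "0" on the way down and one "1" on the way up.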



I decided to write the PC side in Python because it's a pretty fun and easy language with lots of libraries. The first thing I needed was X-windows automation, and there seem to be a lot of choices. Even though it's been replaced by Xaut, I found Xautomation worked for me. I got Python and Xautomation from the Linux Mint Software Library, but I could have got them just as easily using apt-get.

For each of the received characters I used Xautomation to send the following key strokes.


g = space p (start playback and pause)
0 = p (un-pause)
1 = comma comma comma comma comma p (back up a little, then pause)
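The mapping above can be written down as a small lookup table. This is only a sketch of the idea, not the script I actually ran: `KEYS_FOR_MESSAGE` and `xte_commands` are hypothetical names, and the key names (`space`, `P`, `comma`) are the ones Xautomation's xte tool is given later in this post.

```python
# Key names taken from the list above, one entry per received character.
KEYS_FOR_MESSAGE = {
    "g": ["space", "P"],         # start playback, then pause
    "0": ["P"],                  # un-pause
    "1": ["comma"] * 5 + ["P"],  # back up a little, then pause
}


def xte_commands(message):
    """Turn one received line into a list of xte 'key' commands."""
    commands = []
    # strip() drops the line ending that Serial.println adds.
    for char in message.strip():
        for key in KEYS_FOR_MESSAGE.get(char, []):
            commands.append("key " + key)
    return commands


print(xte_commands("g\r\n"))  # prints ['key space', 'key P']
```

Unknown characters simply produce no commands, so stray noise on the serial line is ignored.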


The last piece was the serial link connecting the Arduino to the Python code. pySerial looked like a good library, and to get it I would need python-pip.

sudo apt-get install python-pip
pip install pySerial

pySerial didn't work at first. I found I had to execute the following commands.

sudo usermod -a -G dialout tester
sudo chmod 777 /dev/ttyACM0

The first command adds the user (here, tester) to the dialout group, which is allowed to access serial I/O. That change is permanent, but it only takes effect after you log out and back in. The second gives everyone permission to use the particular USB device. It does not survive unplugging the Arduino or rebooting, so unless you put it in a bash script, you'll have to run it again each time. Also, depending on your hardware, your USB device may have a different name (like ttyUSB0). The short-cut to getting pySerial working would be to run Python as root, but that is a very bad idea.
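A quick way to tell whether the permissions are in place before starting the script is to ask the operating system directly. This is a minimal sketch using only the standard library; `can_use_port` is a name invented for the example, and the device path is the one from this post, which may differ on your machine.

```python
import os


def can_use_port(path):
    """Return True if the current user may read and write the device node."""
    return os.access(path, os.R_OK | os.W_OK)


port = "/dev/ttyACM0"  # may be /dev/ttyUSB0 or similar on other hardware
if os.path.exists(port):
    print(port, "usable:", can_use_port(port))
else:
    print(port, "not found; is the Arduino plugged in?")
```

If this reports that the port is not usable, pySerial will most likely fail to open it as well.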

Here's the code in all its ugliness:

# serial_read_keys.py
import time
import serial
from subprocess import Popen, PIPE

control_f4_sequence = '''keydown Control_L
key F4
keyup Control_L
'''

shift_a_sequence = '''keydown Shift_L
key A
keyup Shift_L
'''


initialize_sequence = '''key space
key P
'''


play_sequence = '''key space
'''

unpause_sequence = '''key P
'''

pause_sequence = '''key P
'''

backup_sequence = '''key comma
'''

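def keypress(sequence):
    # Feed the command text to xte on its standard input.
    p = Popen(['xte'], stdin=PIPE)
    # communicate() wants bytes on Python 3; encoding is harmless on Python 2.
    p.communicate(input=sequence.encode('ascii'))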

ser = serial.Serial('/dev/ttyACM0',9600)

while True:
    # print('reading line')
    # Each message from the Arduino is one line: "g", "0" or "1".
    rcvChar = ser.readline().decode('ascii', 'ignore')
    # print(rcvChar)
    if 'g' in rcvChar:
        print('initialize - play and pause')
        keypress(play_sequence)
        time.sleep(0.1)
        keypress(pause_sequence)
    if '0' in rcvChar:
        print('unpause')
        keypress(unpause_sequence)
    if '1' in rcvChar:
        print('backup a little then pause')
        # Five comma presses, 0.1 s apart, then pause.
        for _ in range(5):
            keypress(backup_sequence)
            time.sleep(0.1)
        keypress(pause_sequence)


I had to do some experimentation, and I left all of that in there so I could document what I had learned.

To do transcription, first open your audio file with Audacity. You may want to use the Effect, Change Tempo menu item to slow down the play-back. Now start the Python script. You have 15 seconds to do the following: make sure the Audacity stop button is clicked, then click on the waveform you want to transcribe.

After 15 seconds, the script will press the play key and then immediately press pause. Don't touch anything on your screen again. If you do, Audacity will lose focus and the key-presses won't reach it. So, how are you supposed to type the transcription then? Use another computer! I neglected to tell you that, didn't I?

Go to the other computer, mash down on the mouse with your foot and the audio will begin to play. Release the mouse and the audio will back up about 5 seconds and then pause. Why does it back up before pausing? So you can more easily sync up your typing. If you want to back up more, double click the mouse.

One unexpectedly nice feature I found is that starting the script reboots the Arduino, so you don't have to reach down and press the reset button.
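Because the board restarts when the script starts, nothing useful arrives until the "g" line about 15 seconds later. A script could wait for that line explicitly before doing anything else. The sketch below is one way to do it, not what my script does: `wait_for_ready` and `max_lines` are hypothetical, and the function accepts any object with a `readline()` method that returns bytes, so it is shown here with an in-memory stand-in rather than a real serial port.

```python
import io


def wait_for_ready(port, max_lines=100):
    """Read lines until one contains b'g'; return True if it was seen."""
    for _ in range(max_lines):
        line = port.readline()
        if not line:
            # Empty read: a timeout on a real port, end of data on a fake one.
            return False
        if b"g" in line:
            return True
    return False


# Stand-in for the serial port so the sketch runs without hardware.
fake_port = io.BytesIO(b"\r\n\x00noise\r\ng\r\n0\r\n1\r\n")
print(wait_for_ready(fake_port))  # prints True
```

With a real port, any read timeout would need to be longer than the 15-second startup delay, otherwise the function gives up before the Arduino has spoken.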
