Speech Recognition Is in Your Back Pocket (or Wherever You Keep Your Mobile Phone)

Source: The ATA Chronicle

Speech recognition (n.): The process of identifying and interpreting or responding to the sounds produced in human speech.
—Oxford English Dictionary

Speech recognition is a software application best described as converting sound into words. While it’s far more detailed than that, the appeal for people whose work revolves around words is immediately apparent. Speech recognition can, depending on the circumstances, increase output dramatically—remember, you can speak much faster than you can type.

It also brings significant ergonomic advantages. For example, you aren’t chained to a desk and keyboard, and you trade the likelihood of orthopedic injuries for perhaps a sore throat. However, like all good things, speech recognition software also has its drawbacks. Some adaptation is required, the software can be finicky to set up, and while you’ll probably never have a typo using it, you’ll surely get “dictos”—misrecognized words that sound similar, but don’t have the same meaning.

One of the most difficult adaptations is learning to read out everything—and by that I mean everything, including punctuation. For example, you would have to read the sentence above as: “One of the most difficult adaptations is learning to read out everything [dash] and by that I mean everything [comma] including punctuation [period].”
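To make the idea of dictated punctuation concrete, here is a minimal sketch in Python of how a transcript containing spoken punctuation words could be turned into written text. This is purely illustrative and is not how any of the products mentioned in this article work internally; the function name and the small word list are assumptions made up for the example.

```python
# Hypothetical mapping of spoken words to punctuation marks (illustration only).
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "colon": ":",
    "semicolon": ";",
}

def apply_spoken_punctuation(transcript):
    """Replace spoken punctuation words with marks attached to the previous word."""
    result = ""
    for word in transcript.split():
        mark = SPOKEN_PUNCTUATION.get(word.lower())
        if mark is not None:
            result += mark            # no space before a punctuation mark
        elif result:
            result += " " + word      # ordinary word: separate with a space
        else:
            result = word             # first word of the transcript
    return result

dictated = "by that I mean everything comma including punctuation period"
written = apply_spoken_punctuation(dictated)
print(written)
```

Note the obvious limitation of such a naive approach: a sentence that genuinely contains the word “period” or “comma” would be mangled, which is one reason real engines rely on context rather than a simple lookup.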

Now that you have a clearer understanding of how speech recognition works, let’s explore how you can use a phone or tablet to unlock its potential for nothing, or nearly so.

A Bit of Background

A few years ago, I started experimenting with speech recognition at a fellow translator’s request. The limited number of languages (U.S. and British English, French, German, Italian, Spanish, Dutch, and Japanese) supported by Dragon NaturallySpeaking (DNS) left me using it mostly for administrative and academic writing, with no practical application to my work as a translator due to my language pairs.

I ended up experimenting with the speech recognition engine built into Apple’s Mac OS, and was extremely unimpressed with my early results, to the point of even recording my experience on video to send back to the aforementioned colleague. It turns out that I was actually making more work for myself because I had not downloaded the improved language packages. When I did, the difference was astounding, and for a long time my main workhorse was an Apple laptop running a virtual machine where my CAT tools could run while allowing me to benefit from high-quality speech recognition into Portuguese. I explored its advantages and limitations, and even circumvented the issue of adding custom vocabulary (but that’s an entirely new can of worms, and not the purpose of this piece).

Shortly after that first run, the issue of using mobile devices came along, as iOS (Apple’s operating system for its mobile devices) also provided multilingual speech recognition. Now, being a fan of Android, I also looked into that environment to see what could be done to enable using Android devices with Windows. So, without further ado…

What You Can Do On Your Mobile Device

If you’re using an iPhone or iPad, you may have noticed the little microphone button right next to the space bar when you bring up the keyboard. That button activates the speech recognition engine, which will do its best to convert the sounds the microphone picks up into the words of the selected language, placing said words wherever the text cursor may be at that particular time. Now, if you’re a user of a Mac computer, you: a) don’t really need this, and b) will find it somewhat underwhelming. Mac computers allow you to download the language packs for offline use (and to customize your vocabulary). With iOS, you’re stuck with an exclusively online feature with no customizing capabilities other than choosing the speech recognition language. Also, you can’t create custom verbal commands, which can be quite useful for certain CAT tool functions.

Now, if you’re using Android, that’s an entirely different set of circumstances. Android does support some degree of speech recognition (via Google voice), but to get the really good features, you need an app from Nuance, the creators of DNS: Swype (available on Google Play). With that downloaded and installed, you get that coveted microphone button on your keyboard. When you press it, you get a new screen with a big yellow button that you press whenever you want to use the speech recognition engine. Just as with iOS’s speech recognition features, you must be online to use it. The advantage that Swype brings is a fully customizable vocabulary. We’ll get to that in a bit. (See Figure 1.)

Figure 1: Swype on Google Play Store

Now that you’ve managed to get your mobile device to recognize what you say and convert it into words, comes the really techy bit. That text output needs to go where you need it, be that the current segment or translation unit on which you are working in your CAT tool of choice, word processor, or whatever. (See the table below for language support capabilities.)

Remember, the recognized text is output to the text cursor position on your phone, just as if you had just actually typed it using the phone’s keyboard. The trick here is to make sure that happens at the computer you’re using. And for that, you’ll need another app on your phone and on your computer.

Sampling of Speech Recognition Applications

Chrome Remote Desktop

Dragon NaturallySpeaking

MyEcho

Remote Keyboard Plus

Swype

TeamViewer

What To Do On Your Work Machine

Enter remote desktop and remote keyboard applications. The former type, which includes TeamViewer (www.teamviewer.com) and Chrome Remote Desktop (on Google Play), is quite well known, while the latter are applications designed to allow you to use your mobile device to input text on your computer. (See Figures 2 and 3.)

Figure 2: Enlarged view of TeamViewer on an Android phone using Swype. Note the text and mouse cursor positions, which are the actual positions on the computer.

Figure 3: Here’s what the output of that dictation from Figure 2 looks like on the computer screen.

When using remote desktop applications, the idea is not to control your computer using your mobile device. Rather, the idea is to merely use the remote application for the purpose of providing speech recognition capabilities with the output at the cursor position. (Only now, the cursor position is wherever you want it to be on your computer, because you are working on it, and the phone follows!) Choosing between TeamViewer and Chrome Remote Desktop is more a matter of personal taste than anything else. The mobile apps do have a character limit in the text buffer, but it’s quite hard to dictate such a long sentence in one go.

Remote keyboard apps are interesting in another way: they capture the “keystrokes” (i.e., the text output) from your phone and transfer them to the computer. There are quite a few apps from which to choose; I went with Remote Keyboard Plus (www.goonbee.com, available on iTunes for $1.99). In this instance, you open the text box in the mobile application, dictate into it, and have the content transferred to another app by the same developer running on your computer, thus bridging both devices. (See Figure 4.)
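For readers curious about what such a bridge involves, the following Python sketch imitates the idea under heavy simplification: one side sends a dictated utterance as text, the other receives it and inserts it at a cursor position. It is an illustration only and does not reflect how Remote Keyboard Plus, MyEcho, or any other app named here is actually built. All function names are hypothetical, and a local socket pair stands in for the phone-to-computer connection.

```python
import socket

def send_dictation(sock, text):
    """Phone side: send one dictated utterance as a UTF-8 line."""
    sock.sendall(text.encode("utf-8") + b"\n")

def receive_dictation(sock):
    """Computer side: read bytes until the newline that ends the utterance."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            break                      # connection closed
        chunks.append(data)
        if data.endswith(b"\n"):
            break                      # end of the utterance
    return b"".join(chunks).decode("utf-8").rstrip("\n")

def insert_at_cursor(document, cursor, text):
    """Insert text at the cursor; return the new document and cursor position."""
    return document[:cursor] + text + document[cursor:], cursor + len(text)

# A connected socket pair stands in for the link between phone and computer.
phone, computer = socket.socketpair()
send_dictation(phone, "translation")
received = receive_dictation(computer)
phone.close()
computer.close()

# The cursor sits between the two spaces of the segment being edited.
segment, cursor = insert_at_cursor("The  is ready.", 4, received)
print(segment)
```

A real application would also have to hand the text to whichever window has focus on the computer, which is operating-system specific and is left out of this sketch.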

A somewhat similar approach was used by the developers of MyEcho (www.myechoapp.com, $1.99), an application that allows you to export the text output of an iOS device’s speech recognition to a Windows machine, again at the cursor position. (See Figure 5.)

Figure 4: Remote Keyboard Plus running on a Windows machine. The window on the lower right corner is a live capture of the iPad screen while dictating.

For Android users, there are several apps available at the Google Play store that mimic a virtual keyboard, and due to the nature of that ecosystem, more keep coming up regularly. Any application that allows you to invoke the Swype keyboard to type on a remote machine (i.e., your computer) will work. In the end, it will be mostly a matter of personal preference, although I do tend to favor using TeamViewer or Chrome Remote Desktop, as you’ll always need to piggyback two applications to obtain the desired outcome.

Figure 5: MyEcho being used to output the recognized text into a segment in memoQ. The window on the lower right is a live iPad screen capture of the interface.

One important factor to consider here is, as always, where do you want your data to circulate? The audio utterances will be travelling to Nuance’s servers, then the text will be sent back to your device, from which it will be forwarded elsewhere depending on your method of choice.

Is Speech Recognition Right for You?

Speech recognition does require some fiddling and experimentation on the user’s end, as each application has its own mix of software packages and, particularly with Android devices, sometimes very different specifications. There is no one-formula-fits-all solution, but most applications are easy enough to figure out, and you have a good shot at improving your workflow.

For additional information, you can look up the blog section on my website for over an hour of videos on setting up these applications on several combinations of platforms. You’ll also find a few helpful tips regarding custom vocabulary.

Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail Jost Zetzsche at jzetzsche@internationalwriters.com.

Tiago Neto focuses on linguistic and regulatory compliance for companies working in the fields of veterinary medicine, pharmaceutical products, medical devices, and in vitro diagnostic devices. He has worked in clinical activities, public health, and epidemiology, dealing with both humans and animals. He has a degree in veterinary medicine and is currently a PhD candidate in biomedical sciences, working on novel therapeutic approaches in an animal model for cervical cancer in humans. He is also a freelance translator and consultant in these fields. Contact: tiago@tiagoneto.com.
