Speech recognition (n.): The process of identifying and interpreting or responding to the sounds produced in human speech.
(Oxford English Dictionary)
Speech recognition is a software application best described as converting sound into words. While it's far more detailed than that, the appeal for people whose work revolves around words is immediately apparent. Speech recognition can, depending on the circumstances, increase output dramatically. Remember, you can speak much faster than you can type.
It also brings significant ergonomic advantages. For example, you aren't chained to a desk and keyboard, and you trade the likelihood of orthopedic injuries for perhaps a sore throat. However, like all good things, speech recognition software also has its drawbacks. Some adaptation is required, the software can be finicky to set up, and while you'll probably never have a typo using it, you'll surely get "dictos": misrecognized words that sound similar but don't have the same meaning.
One of the most difficult adaptations is learning to read out everything – and by that I mean everything, including punctuation. For example, you would have to read the previous sentence as: "One of the most difficult adaptations is learning to read out everything [dash] and by that I mean everything [comma] including punctuation [period]"
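To make the bracket notation concrete, here is a minimal Python sketch that turns a transcript written with spoken-punctuation markers back into ordinary text. It is only an illustration of the idea: the `SPOKEN_PUNCTUATION` table and the `render_dictation` function are hypothetical names invented for this example, and real speech recognition engines handle punctuation commands internally rather than through bracketed markers.

```python
import re

# Hypothetical table of spoken punctuation names (an assumption for this
# example, not the vocabulary of any particular engine). Each value carries
# the spacing that should surround the symbol in running text.
SPOKEN_PUNCTUATION = {
    "dash": " - ",   # a spaced hyphen stands in for the dash
    "comma": ", ",
    "period": ". ",
}

def render_dictation(transcript: str) -> str:
    """Replace markers such as [comma] with the punctuation they name."""
    def substitute(match: re.Match) -> str:
        name = match.group(1).lower()
        # Unknown markers are left exactly as they were found.
        return SPOKEN_PUNCTUATION.get(name, match.group(0))

    # The pattern also absorbs whitespace around each marker, so the
    # spacing comes only from the table above.
    text = re.sub(r"\s*\[(\w+)\]\s*", substitute, transcript)
    return text.strip()

spoken = ("One of the most difficult adaptations is learning to read out "
          "everything [dash] and by that I mean everything [comma] "
          "including punctuation [period]")
print(render_dictation(spoken))
```

Extending the table with further markers (for example a question mark or a new paragraph) would follow the same pattern.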
Now that you have a clearer understanding of how speech recognition works, let's explore how you can use a phone or tablet to unlock its potential for nothing, or nearly so.
A Bit of Background
A few years ago, I started experimenting with speech recognition at a fellow translator's request. The limited number of languages (U.S. and British English, French, German, Italian, Spanish, Dutch, and Japanese) supported by Dragon NaturallySpeaking (DNS) left me using it mostly for administrative and academic writing, with no practical application to my work as a translator due to my language pairs.
I ended up experimenting with the speech recognition engine built into Apple's Mac OS, and was extremely unimpressed with my early results, to the point of even recording my experience on video to send back to the aforementioned colleague. It turns out that I was actually making more work for myself because I had not downloaded the improved language packages. When I did, the difference was astounding, and for a long time my main workhorse was an Apple laptop running a virtual machine where my CAT tools could run while allowing me to benefit from high-quality speech recognition into Portuguese. I explored its advantages and limitations, and even circumvented the issue of adding custom vocabulary (but that's an entirely new can of worms, and not the purpose of this piece).
Shortly after that first run, the issue of using mobile devices came along, as iOS (Apple's operating system for its mobile devices) also provided multilingual speech recognition. Now, being a fan of Android, I also looked into that environment to see what could be done to enable using Android devices with Windows. So, without further ado...
What You Can Do On Your Mobile Device
If you're using an iPhone or iPad, you may have noticed the little microphone button right next to the space bar when you bring up the keyboard. That button activates the speech recognition engine, which will do its best to convert the sounds the microphone picks up into the words of the selected language, placing said words wherever the text cursor may be at that particular time. Now, if you're a user of a Mac computer, you: a) don't really need this, and b) will find it somewhat underwhelming. Mac computers allow you to download the language packs for offline use (and to customize your vocabulary). With iOS, you're stuck with an exclusively online feature with no customizing capabilities other than choosing the speech recognition language. Also, you can't create custom verbal commands, which can be quite useful for certain CAT tool functions.
Now, if you're using Android, that's an entirely different set of circumstances. Android does support some degree of speech recognition (via Google voice), but to get the really good features, you need an app from Nuance, the creators of DNS. You'll need Swype (available on Google Play). With that downloaded and installed, you now get that coveted microphone button on your keyboard. When you press it, you get a new screen with a big yellow button that you press whenever you want to use the speech recognition engine. Just like iOS's speech recognition features, you must be online to use it. The advantage that Swype brings you is a fully customizable vocabulary. We'll get to that in a bit. (See Figure 1.)
Now that you've managed to get your mobile device to recognize what you say and convert it into words, here comes the really techy bit. That text output needs to go where you need it, be that the current segment or translation unit on which you are working in your CAT tool of choice, a word processor, or whatever. (See the table below for language support capabilities.)
Remember, the recognized text is output to the text cursor position on your phone, just as if you had typed it using the phone's keyboard. The trick here is to make sure that happens at the computer you're using. And for that, you'll need another app on your phone and on your computer.
Sampling of Speech Recognition Applications
Chrome Remote Desktop
Dragon NaturallySpeaking
MyEcho
Remote Keyboard Plus
Swype
TeamViewer
What To Do On Your Work Machine
Enter remote desktop and remote keyboard applications. The former type is quite well known and includes TeamViewer (www.teamviewer.com) and Chrome Remote Desktop (on Google Play), while the latter are applications designed to allow you to use your mobile device to input text on your computer. (See Figures 2 and 3.)

Figure 2: Enlarged view of TeamViewer on an Android phone using Swype. Note the text and mouse cursor positions, which are the actual positions on the computer.
When using remote desktop applications, the idea is not to control your computer using your mobile device. Rather, the idea is to merely use the remote application for the purpose of providing speech recognition capabilities with the output at the cursor position. (Only now, the cursor position is wherever you want it to be on your computer, because you are working on it, and the phone follows!) Choosing between TeamViewer and Chrome Remote Desktop is more a matter of personal taste than anything else. The mobile apps do have a character limit in the text buffer, but it's quite hard to dictate such a long sentence in one go.
Remote keyboard apps are interesting in another way: they capture the "keystrokes" (i.e., the text output) from your phone and transfer them to the computer. There are quite a few apps from which to choose, although I went with Remote Keyboard Plus (www.goonbee.com, available on iTunes for $1.99). In this instance, you open the text box in the mobile application, dictate into it, and have the content transferred to another app by the same developer running on your computer, thus bridging both devices. (See Figure 3.)
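The general idea of bridging two devices can be sketched in a few lines of Python. This is not how Remote Keyboard Plus or any other app mentioned here actually works; it is a simplified illustration under stated assumptions, with hypothetical function names (`send_text`, `receive_text`, `demo`), in which a "phone" side sends recognized text over a network connection and a "computer" side receives it. Both ends run on the same machine in this demo, and the receiver only returns the text rather than typing it at the cursor.

```python
import socket
import threading

def receive_text(server: socket.socket) -> str:
    """Computer side: accept one connection and read text until it closes."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:          # the sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")

def send_text(host: str, port: int, text: str) -> None:
    """Phone side: send the recognized text as UTF-8 bytes."""
    with socket.create_connection((host, port)) as client:
        client.sendall(text.encode("utf-8"))

def demo(text: str) -> str:
    """Run both ends locally (an assumption made to keep the demo small)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = {}
    listener = threading.Thread(
        target=lambda: received.update(text=receive_text(server))
    )
    listener.start()
    send_text("127.0.0.1", port, text)
    listener.join()
    server.close()
    return received["text"]

print(demo("Olá, mundo."))
```

A real remote keyboard app would go one step further and inject the received text as keystrokes at the cursor position, which requires operating system facilities outside the scope of this sketch.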
A somewhat similar approach was used by the developers of MyEcho (www.myechoapp.com, $1.99), an application that allows you to export the text output of an iOS device's speech recognition to a Windows machine, again at the cursor position. (See Figure 4.)

Figure 4: Remote Keyboard Plus running on a Windows machine. The window on the lower right corner is a live capture of the iPad screen while dictating.
For Android users, there are several apps available at the Google Play store that mimic a virtual keyboard, and due to the nature of that ecosystem, more keep coming up regularly. Any application that allows you to invoke the Swype keyboard to type on a remote machine (i.e., your computer) will work. In the end, it will be mostly a matter of personal preference, although I do tend to favor using TeamViewer or Chrome Remote Desktop, as you'll always need to piggyback two applications to obtain the desired outcome.

Figure 5: MyEcho being used to output the recognized text into a segment in memoQ. The window on the lower right is a live iPad screen capture of the interface.
One important factor to consider here is, as always, where do you want your data to circulate? The audio utterances will be travelling to Nuance's servers, then the text will be sent back to your device, from which it will be forwarded elsewhere depending on your method of choice.
Is Speech Recognition Right for You?
Speech recognition does require some fiddling and experimentation on the user's end, as each application has its own mix of software packages and, particularly with Android devices, sometimes very different specifications. There is no one-formula-fits-all solution, but most applications are easy enough to figure out, and you have a good shot at improving your workflow.
For additional information, you can look up the blog section on my website for over an hour of videos on setting up these applications on several combinations of platforms. You'll also find a few helpful tips regarding custom vocabulary.
Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail Jost Zetzsche at jzetzsche@internationalwriters.com.
Tiago Neto focuses on linguistic and regulatory compliance for companies working in the fields of veterinary medicine, pharmaceutical products, and medical devices and in vitro diagnostic devices. He has worked in clinical activities, public health, and epidemiology, dealing with both humans and animals. He has a degree in veterinary medicine and is currently a PhD candidate in biomedical sciences, working on novel therapeutic approaches in an animal model for cervical cancer in humans. He is also a freelance translator and consultant in these fields. Contact: tiago@tiagoneto.com.


