Dragos Ciobanu, from the translation department of the University of Leeds, wrote to me a few weeks ago:
The reason I’m writing is to ask whether you’ve had a chance to play with Sketch Engine (see sketchengine.co.uk). In Leeds, we’ve been using it in our corpus linguistics work a lot. It’s got brilliant features, from the terabytes of super useful multilingual data which it already comes with, to features for term extraction, specialized corpus building, thesaurus, collocations, and tons more! It’s really, really cool and I’m only writing to you because the translators I know who have been playing with it also like it a lot.
Not sure whether you could tell, but Dragos really likes Sketch Engine. And in a way, I could stop this column right here, because he already said it all, sort of.
After spending some time looking at Sketch Engine, I felt embarrassed that I hadn’t known more about it. As Dragos said, it’s really, really cool. It’s also a monster of a tool (size-wise) and it’s not particularly easy to navigate when you first encounter it. (According to Ondrej Matuška of the Sketch Engine team, one of the areas they’re trying to focus on in the immediate future is making the product more user-friendly.)
But first, what exactly is Sketch Engine and what does it do?
It’s a corpus tool developed by the Czech company Lexical Computing Limited. Lexical Computing was originally founded in 2003 by the late Brit Adam Kilgarriff and Pavel Rychlý, a professor at Masaryk University in Brno. The idea of corpus tools, and this corpus tool in particular, is to find out how language behaves based on large collections of data. For this purpose, Sketch Engine built corpora in more than 80 languages (as well as “time-stamped” corpora in a slightly different set of 18 languages for the purpose of comparing word usage over time). The sizes of the corpora differ widely (from just a few million words in Maori to more than 800 billion in English), and they are available for a number of analysis purposes to any paying or trial user. (The annual subscription price is 100 euros for non-academic users, with the trial period ending after 30 days.)
The analyses you can do on these corpora with Sketch Engine include the following:
Word Sketches: This is where the program got its name, and it’s what Kilgarriff brought to the table. A word sketch is a summary of a word’s grammatical and collocational behavior (collocational refers to the analysis of how often a word co-occurs with other words or phrases). (See Figure 1 below.) Since the data in the corpora is lemmatized (i.e., words are analyzed so they can be brought back to their base or dictionary form), the results are a lot more meaningful than what most of our translation environment tools provide when they’re unable to relate different forms of one word to each other. Another word sketch option that Sketch Engine offers is the comparison of word sketches of similar words.
Thesaurus: The ability to retrieve a detailed list or a graphical word cloud with similar words, including links to create reports on word sketch differences for those terms to understand the exact differences in actual usage.
Concordance: Searches for single words, terms, or even longer phrases. Since the data in the supported languages is tagged, it’s also possible to search for specific classes of words or specific classes of words that surround the word in question.
Parallel Corpus: Retrieval of bilingual or multilingual sets of words or phrases in context. (See Figure 2 below.) Presently this is available only for on-screen data viewing, but it will soon be offered as downloadable data. This is especially helpful when uploading your own translation memories (see below).
Word Lists: The possibility of creating lists of words and the number of occurrences, either as lemmas (the base form of each word) or in each word form.
Creating Your Own Corpus: This is likely the most exciting feature for translators. You can either upload your own translation memories or use the tool’s own search engine mechanism (which relies on Microsoft Bing) to create a list of bilingual websites that contain the terms that are relevant to your field, have them automatically aligned, and form a corpus. I don’t need to explain to you the possibilities this offers to translators who don’t have the privilege of having high-quality translation memories or termbases for a particular subject matter that they need to translate. As a logical extension of this feature, not only can you perform any of the functions mentioned earlier, but it’s also possible to run a keyword search on the user-created corpus, identify the terms that are relevant, and download them as an Excel or TBX file. This feature is currently available for Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, and Spanish. The bilingual version of this is just around the corner.
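For readers who like to see ideas in code, here is a rough sketch of three of the analyses described above (a lemma-based word list, collocation counts, and a concordance) run on a three-sentence toy text. It is only an illustration in plain Python and has nothing to do with how Sketch Engine is actually implemented or accessed; the tiny `LEMMAS` table, the sample text, and all function names are assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical toy lemma table. A real corpus tool relies on a full
# lemmatizer and part-of-speech tagger for each language.
LEMMAS = {
    "translates": "translate", "translated": "translate",
    "translating": "translate", "texts": "text", "tools": "tool",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word forms."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(tokens):
    """Map each word form to its base form (unknown words stay as-is)."""
    return [LEMMAS.get(t, t) for t in tokens]

def word_list(items):
    """Count occurrences; works for word forms and for lemmas alike."""
    return Counter(items)

def collocates(lemmas, node, window=2):
    """Count lemmas found within `window` words of each hit for `node`."""
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma == node:
            start = max(0, i - window)
            counts.update(lemmas[start:i] + lemmas[i + 1:i + 1 + window])
    return counts

def concordance(tokens, lemmas, node, width=3):
    """Return keyword-in-context lines for every form of the lemma `node`."""
    lines = []
    for i, lemma in enumerate(lemmas):
        if lemma == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tokens[i]}] {right}")
    return lines

SAMPLE = ("She translates legal texts. He translated a legal text yesterday. "
          "They are translating texts with new tools.")

tokens = tokenize(SAMPLE)
lemmas = lemmatize(tokens)

if __name__ == "__main__":
    print(word_list(lemmas).most_common(3))
    print(collocates(lemmas, "translate").most_common(3))
    for line in concordance(tokens, lemmas, "translate"):
        print(line)
```

The point to notice is that searching by lemma finds "translates," "translated," and "translating" in one go, which is exactly what a tool that only matches surface forms cannot do.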
By the way, you can find an example of the up-and-coming increased user-friendliness of Sketch Engine in OneClick Terms (terms.sketchengine.co.uk) that allows you to extract terms from TMX, XLIFF, PDF, DOC, DOCX, HTML, or TXT files in essentially one or two clicks. (See Figure 3 below.)
Translators have been one of the primary target groups for the makers of Sketch Engine. One immediate result of that focus is the availability of a plug-in for SDL Trados Studio. The plug-in itself is free, but it requires a trial or paid registration to be usable. It allows you to perform collocation, thesaurus, and concordance searches and will soon offer term extraction. According to Ondrej Matuška at Sketch Engine, talks with makers of other translation environment tools are under way to offer plug-ins or add-ons for those tools as well.
Can you believe you’ve never heard about this tool before? Well, maybe you were quicker than I to find this, but the good thing is that now we all know.
Jost Zetzsche is chair of ATA’s Translation and Interpreting Resources Committee. He writes the “Geekspeak” column for The ATA Chronicle. He is also the co-author of Found in Translation: How Language Shapes Our Lives and Transforms the World, a robust source for replenishing your arsenal of information about how human translation and machine translation each play an important part in the broader world of translation. Contact: jzetzsche@internationalwriters.com.
Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail me at jzetzsche@internationalwriters.com.


