Readings, Tools, and Useful Links for Corpus Analysis
This post originally appeared on the blog and it is republished with permission.
The following list is the result of a collaboration among participants in Lancaster’s recent MOOC on Corpus Linguistics. It is a selection of the links that I considered most relevant for those who want to start exploring this field. If you want to share other links, feel free to add a comment or send me a message and I will add them here. I will keep you posted on the next CL course by Lancaster University. This post complements previous posts on lists, , and .
Readings
– G. Bennet
– D. Krieger
. Abstract book – F. Formato and A. Hardie (Lancaster: UCREL)
– R. Garside, G. Leech, T. McEnery
– L. Anthony
? – C. Gabrielatos
Books
– P. Baker
Google book: – S. Laviosa
Google book: : Research and Applications – A. Kruger, K. Wallmach, J. Munday
Tools
is a free online, stripped down version of the Sketch Engine corpus query software. It allows very simple searches for words which will produce a word sketch to show the grammatical and collocational behavior of the word. It also produces a list of similar words and the regular concordance lines. One of our tutors in Lancaster’s MOOC, Keith Barrs, (from page 6).
. Concordance the web in real-time
is a software tool for corpus analysis and comparison.
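For readers who have never seen concordance lines, here is a minimal sketch of the idea in Python. It is only an illustration and not how any of the tools listed above work internally: the function names (tokenize, concordance), the tiny sample text, and the context width are all my own assumptions. It finds every occurrence of a search word and shows it with a few words of context on each side (the "keyword in context" layout).

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Hypothetical helper: real corpus tools use far more careful tokenization.
    """
    return re.findall(r"[a-z0-9']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of tokens kept on each side of the keyword.
    """
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented sample text, standing in for a real corpus.
sample = "The cat sat on the mat. A cat is a small animal. The dog chased the cat away."
tokens = tokenize(sample)
hits = concordance(tokens, "cat", width=2)

# Right-align the left context so the keyword lines up in one column.
for left, node, right in hits:
    print(f"{left:>20}  [{node}]  {right}")
```

Lining the keyword up in a single column is what makes recurring patterns to its left and right easy to spot, which is the starting point for looking at collocations.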
Corpora
is a collection of essays by students at SILS, the School of International Liberal Studies at Waseda University.
(TEC) is a corpus of contemporary translational English: it consists of written texts translated into English from a variety of source languages, European and non-European.
is an analytical database of English with over 4.5 billion words. It contains written material from websites, newspapers, magazines and books published around the world, and spoken material from radio, TV and everyday conversations.
. The Open Parallel Corpus is a growing collection of translated texts from the web.
. is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active .
Online is an online corpus service offering you the chance to tap into the unique resources of the Collins Word Web, on which the highly successful range of Collins dictionaries is based.
is a digitized collection of project work produced by children aged between 9 and 11.
For corpora in other languages visit of Humboldt-Universität zu Berlin, and lemmatization lists in several languages at .com
Other useful and interesting links
.
by Corpora4Learning
. This is published once a year (in the spring) with articles, conference reports, reviews and notices related to corpus linguistics. Each issue is about 150 pages and there have been 36 issues published.
The by the Economic & Social Research Council
by Mura Nava using TimeMapper
Articles in Spanish
– G. Parodi
– Guillermo Rojo
– M. Alcántara Plá
– J. R. Firth, a I.A. Mel’cuk, by M. Alonso Ramos
– G. Rojo
– G. Corpas
– E. Alonso
For corpora in Spanish, visit my page (Corpora EN+ES section)
Author bio
Patricia Brenes is the owner of the blog . Originally from Costa Rica, she moved to Washington in 2000 to work for an international organization. She obtained her Master’s Degree in Specialized Translation at the Universitat de Vic in Barcelona and is a Certified Terminology Manager (ECQA-TermNet). Her blog collects useful information on theory and practice, as well as infographics, biographies, interviews, tools, and much more.