OK, so this probably isn’t a question that comes to mind a lot, if ever. A dictionary is there to look up a spelling or check a definition – where the spelling or definition comes from, and who wrote it, is not usually a cause for concern.

However, I’ve been reading an article that mentioned the Oxford English Corpus, a language project at Oxford University in England that provides the basis for the Oxford English Dictionary (OED), one of the most influential and authoritative texts on the English language. But what is the Corpus? Oxford describes it like this:

A corpus is a collection of texts of written (or spoken) language presented in electronic form. It provides the evidence of how language is used in real situations, from which lexicographers can write accurate and meaningful dictionary entries. The Oxford English Corpus is at the heart of dictionary-making in Oxford in the 21st century and ensures that we can track and record the very latest developments in language today. By analysing the corpus and using special software, we can see words in context and find out how new words and senses are emerging, as well as spotting other trends in usage, spelling, world English, and so on.

The corpus currently contains over 2 billion words (as of Spring 2006), drawn from all over the English-speaking world, not just the UK. Two billion seems like a huge number, but they note that it is not 2 billion different words: the count includes every occurrence of every word, and the word ‘the’ on its own accounts for about 100 million of them.
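To make the difference between "words counted" and "different words" concrete, here is a rough Python sketch. It is only an illustration, not how Oxford's own software works: the function name, the simple tokenising rule and the sample sentence are all made up for this example. By the figures above, ‘the’ would be roughly 100 million out of 2 billion, or about 5% of the corpus.

```python
import re
from collections import Counter


def token_and_type_counts(text):
    """Return (total word count, number of different words, per-word counts).

    Hypothetical helper for illustration; the tokenising rule (lowercase
    runs of letters and apostrophes) is a simplifying assumption.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


# A tiny made-up "corpus" standing in for the real two billion words.
sample = "The cat sat on the mat because the mat was warm."
total, distinct, counts = token_and_type_counts(sample)

print("words counted:", total)
print("different words:", distinct)
print("share taken up by 'the':", counts["the"] / total)
```

Even in one short sentence the two numbers drift apart, because common words like ‘the’ get counted every time they appear.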

The OED is compiled from these two billion words, and the last comprehensive edition was published in 1989. This second edition ran to twenty volumes, with supplements printed in the 1990s. Since 1990, the dictionary researchers have been reviewing the whole dictionary rather than just adding to it – no word yet on when the final version will be published.

Feeling a little overwhelmed by all those words? Well, the Compact OED contains a mere 145,000 words, phrases and definitions.

