Uncharted Territory






This corpus contains 100 million words in more than 22,000 transcripts of ten American soap operas from 2001 to 2012. The corpus was created by Mark Davies (of Brigham Young University) and it is related to other corpora that we have created, including the 450 million word Corpus of Contemporary American English (COCA).

Even though the dialogue in the soap operas is scripted, we believe that it provides very useful insight into informal, colloquial American speech, and that it complements other similar corpora. For example, there are many informal phrases (see lists) and words (see lists) that are much more common in this corpus of soap operas than in the spoken portion of COCA and the BNC.

Conversely, there are many formal and more technical words in the spoken part of the BNC (see lists) and COCA (see lists) that are quite uncommon in this corpus of soap operas. This is because this corpus of soap operas deals more with everyday life and personal interaction than parts of COCA Spoken and BNC Spoken.

Since this corpus is still quite new (it was released in July 2012), we welcome any comments that you might have, and especially searches that you've done that show the highly informal nature of this corpus of soap operas.

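The list of the 10 dramas is below. So Days of Our Lives, the show that Joey from Friends was supposed to be appearing in, is a real program that actually airs. I only just found that out (embarrassing).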

All My Children
As the World Turns
Bold and Beautiful
Days of Our Lives
General Hospital
Guiding Light
One Life to Live
Passions
Port Charles
Young and Restless


One might be suspicious of soap opera dialogue. After all, it is written by a scriptwriter. How well does it really represent authentic, "spoken" language? Let's take a look at this in some detail. In each case, we'll compare the soap opera scripts with the spoken portion of COCA, and the spoken portion of the BNC. We'll see that in most cases, the soap opera language in SOAP is actually much more informal than these other two corpora.

The following table shows the frequency per million words, and you can click on any of the entries to see the actual examples from the three corpora. (If you click on bars in the chart display to see Keyword in Context entries in this lower frame, you'll want to then click on the BACK button in your browser to come back to this page.) For COCA and the BNC, look at the SPOKEN column of the chart. For SOAP, look at the ALL column at the left.
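To make "frequency per million words" concrete, here is a minimal Python sketch of the normalization, assuming you already have a raw hit count and a corpus size. The function name `per_million` and every count below are hypothetical and for illustration only; they are not actual results from SOAP, COCA, or the BNC. The only figure taken from the description above is SOAP's size of roughly 100 million words.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical counts, NOT real corpus results.
soap_hits, soap_size = 250, 100_000_000    # SOAP: about 100 million words
other_hits, other_size = 40, 20_000_000    # made-up size for a smaller spoken corpus

soap_rate = per_million(soap_hits, soap_size)
other_rate = per_million(other_hits, other_size)

print(f"SOAP:  {soap_rate:.2f} per million words")
print(f"Other: {other_rate:.2f} per million words")
```

Normalizing this way is what lets corpora of very different sizes be compared side by side: a raw count of 250 in a large corpus may amount to a rate similar to a count of 40 in a much smaller one.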


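Another nice thing about this corpus is that you can compare results across corpora such as COCA and SOAP. For example, in an interview Jennifer Lawrence said, "I was deaf for like six days and never went to the doctor. Because I’m a genius." If you try looking up the expression "I’m a genius," you can see that it is used more often in SOAP.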