Uncharted Territory

I plan to write mainly about English-language articles that I found interesting.

The Power of Text Analysis

 


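An investigative report by the Sunday Times has revealed that The Cuckoo's Calling was published under a pseudonym by the author of Harry Potter. Articles are also reporting that one publisher turned the book down, and that people praised it as a debut novel without knowing it was her work.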

The Cuckoo's Calling
(2014/02/13)
Robert Galbraith


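Unfortunately, the Sunday Times article can only be read by subscribers, so I don't know exactly what kind of investigation was carried out. Apparently, though, text analysis was used: the new book was compared with Rowling's earlier works and with other detective novels, and the conclusion was that its phrasing is more similar to her earlier works than to the other detective novels. A person's individuality tends to show in their tweets and blog posts, and it seems the same is true of novels.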

'I turned down 'Robert Galbraith'': Editor admits passing on novel that turned out to be by JK Rowling
Celebrated crime writer Val McDermid, who wrote a positive “blurb” for the cover of The Cuckoo’s Calling unaware it was a Rowling work
NICK CLARK
SUNDAY 14 JULY 2013
Literary signature: Text analysis
One of those behind the unmasking of Robert Galbraith as J K Rowling was Peter Millican, a computer linguistic expert who has developed software to analyse and compare texts.

The Gilbert Ryle fellow in philosophy at Hertford College, Oxford, was brought in to analyse the text to discover whether there were tell-tale signs of Rowling’s penmanship in The Cuckoo’s Calling. His Signature stylometric system ran nine texts in just a matter of hours and discovered that the comparison between the crime thriller and two of the author’s other texts was “striking”. Professor Millican analysed the book against J K Rowling novels The Casual Vacancy and Harry Potter and the Deathly Hallows.

The analysis included comparison with two works each from crime authors Ruth Rendell, P D James and Val McDermid. “Nine texts is not a huge amount but it was the least I needed for the test to be robust,” he said.

JK Rowling revealed as detective novel author
By RORY REYNOLDS 
Published on 15/07/2013 00:19

Two independent computer linguistic experts, Peter Millican from Oxford University and Patrick Juola from Duquesne University in Pittsburgh, ran the last Harry Potter novel and The Casual Vacancy, plus The Cuckoo’s Calling, along with two other detective books, through their specialist programmes.
“It was striking that The Cuckoo’s Calling came out significantly closer to A Casual Vacancy and even Harry Potter and the Deathly Hallows than the other books,” Mr Millican said.
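To get a feel for what "came out closer" can mean, here is a minimal Python sketch of one very simple way to compare texts: count how often a few common function words appear in each text, then measure how similar the resulting profiles are. This is only my own illustration, not the method Millican or Juola used. The word list and the function names (`profile`, `cosine_similarity`, `rank_candidates`) are assumptions made up for this example.

```python
import math
import re
from collections import Counter

# A small hand-picked list of function words (an assumption for this sketch;
# real stylometric tools use much longer, data-driven lists).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in",
                  "that", "it", "was", "he", "she", "but"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(disputed, candidates):
    """Sort candidate texts from most to least similar to the disputed text."""
    target = profile(disputed)
    scores = {name: cosine_similarity(target, profile(text))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In practice you would pass the full text of the disputed novel as `disputed` and a dictionary of full-length comparison novels as `candidates`. With only a dozen words and short samples the ranking means very little, so treat this as a toy.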

When I visited Professor Peter Millican's website, I found that the analysis software is available there. It seems that we can use it too, as free software.

The Signature Stylometric System
A User-Friendly System for Textual Analysis
Welcome to the home page of Signature, a program designed to facilitate "stylometric" analysis and comparison of texts, with a particular emphasis on author identification. The collage below on the right illustrates the sorts of task for which Signature can be used: comparing the styles of Jane Austen and other novelists; examining the "authorial signature" of the plays written by (or controversially attributed to) Shakespeare; establishing the provenance of ancient manuscripts such as the shared books of Aristotle's Ethics; identifying the author of the unattributed Federalist Papers; and investigating the relationships between Biblical scriptures (e.g. Did "Luke" write Acts? Did Paul write Hebrews?).

Register Your Interest in Signature 2.00
At present (Summer 2013), Signature has been undergoing the most important enhancement since its initial development, which is now very close to completion (testing is in hand, and documentation is 95% completed). Version 2.00 will include a wide range of new facilities, including:
• More powerful file-handling and filtering tools
• Ability to specify relevant alphabets and punctuation etc. for different languages/genres
• Wordlist facilities extended to accommodate phrases of specified length(s)
• Similar facilities for bigrams/trigrams etc.
• Choice of keyness measures for key word/phrase identification
• Fully automatic creation of frequent word/phrase lists
• Automated monitoring of previously specified words
• Powerful concordancer, enabling also punctuation and proximity searches etc.
• Principal Component Analysis, applicable to all data types
• Burrows' Delta analysis, applicable to all data types
• Multiple chi-square analysis, applicable to all data types
• Main parameters of all facilities easily configurable
• Comprehensive help and theoretical documentation
Investigation is also under way to test the feasibility of incorporating grammatical analysis into the concordancer, so as to enable grammar-informed searching etc. If this proves feasible, the concordancer will also be further integrated with the graphing and data analysis facilities.
It may be some time before Signature 2.00 is fully tested and published here. In the meantime, if you are interested in acquiring it, please register your interest, so that you can be kept informed of progress and provided with the software at the first available opportunity. You might also be invited (on a purely optional basis, of course) to beta-test the software, assistance with which would be much appreciated.
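Burrows' Delta, one of the analyses named in the feature list above, can be sketched in a few lines of Python. The idea, as I understand it: take the most frequent words in a set of reference texts, turn each word's relative frequency into a z-score across those texts, and report the mean absolute difference in z-scores between the disputed text and each reference text (smaller means more similar). This is my own simplified sketch, not Signature's implementation. The function names and the `n_words` default are assumptions.

```python
import re
import statistics
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def burrows_delta(disputed, corpus, n_words=30):
    """Return {name: delta} for the disputed text against each corpus text.

    corpus maps a label to the full text of a reference work. At least two
    reference texts are needed so that a standard deviation exists.
    """
    if len(corpus) < 2:
        raise ValueError("need at least two reference texts")

    corpus_tokens = {name: tokenize(text) for name, text in corpus.items()}

    # Most frequent words across the pooled reference texts.
    pooled = Counter()
    for tokens in corpus_tokens.values():
        pooled.update(tokens)
    vocab = [word for word, _ in pooled.most_common(n_words)]

    def rel_freqs(tokens):
        counts = Counter(tokens)
        total = len(tokens) or 1
        return {w: counts[w] / total for w in vocab}

    corpus_freqs = {name: rel_freqs(t) for name, t in corpus_tokens.items()}
    disputed_freqs = rel_freqs(tokenize(disputed))

    # Per-word mean and standard deviation across the reference texts.
    means = {w: statistics.mean([f[w] for f in corpus_freqs.values()])
             for w in vocab}
    stdevs = {w: statistics.pstdev([f[w] for f in corpus_freqs.values()])
              for w in vocab}
    # Words used at exactly the same rate everywhere carry no signal.
    usable = [w for w in vocab if stdevs[w] > 0]
    if not usable:
        raise ValueError("reference texts do not differ on any frequent word")

    def z_scores(freqs):
        return {w: (freqs[w] - means[w]) / stdevs[w] for w in usable}

    disputed_z = z_scores(disputed_freqs)
    deltas = {}
    for name, freqs in corpus_freqs.items():
        ref_z = z_scores(freqs)
        deltas[name] = sum(abs(disputed_z[w] - ref_z[w])
                           for w in usable) / len(usable)
    return deltas
```

Calling `burrows_delta(new_book_text, {"Rowling": ..., "Rendell": ..., "James": ...})` would return one distance per reference author. The measure only becomes meaningful with long texts and a reasonably large reference set, so tiny samples will give noisy numbers.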

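Well, even if I downloaded it, I doubt I would be able to make full use of it... Still, it would be interesting to compare the official TOEIC practice books with other practice books and measure how similar they are.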
Author: Yuta