What is stylometry?

In the past, it was almost impossible to identify the anonymous author of a book. However, as we delve deeper into the branch of knowledge known as stylometry, we become capable of distinguishing writers by analysing certain traits of their works, such as the frequency and patterns of the words they use. We asked Jagiellonian University researchers Dr Michał Choiński and Dr Jan Rybicki from the Institute of English Studies to tell us more about this ‘mathematical intrusion’ into the world of language.

1Question is a series of articles by the University Marketing science communication unit, in which specialists and experts from various fields briefly discuss interesting issues related to the world, civilisation, culture, biology, history, and many more.

The foundations of stylometry were laid over one hundred years ago by the famous Polish philosopher Wincenty Lutosławski, who employed literary analysis in order to establish the chronology of Plato’s writings. These techniques are the main tools used by Dr Choiński and Dr Rybicki, and they provided the framework for the researchers’ recently completed grant project, The Language of Eighteenth-Century American Colonial Sermons. A Rhetorical and Stylometric Analysis.

It is worth keeping in mind that stylometry, as a highly reliable analytical tool for language research, is constantly being developed and improved around the world. The Jagiellonian University is an important stylometric centre as a member of the Digital Research Infrastructure for the Arts and Humanities (DARIAH-PL), the largest Polish consortium dedicated to the humanities.

In search of a lost author

Stylometry is a method that makes it possible to determine a text’s author or chronology through detailed computer-aided analysis. It rests on two main assumptions: firstly, that every person’s writing style is slightly different, and secondly, that works created shortly after one another are more similar than those written further apart. By calculating how frequently certain words are used, even those which carry very little meaning by themselves, and by studying the recurrence patterns of certain linguistic structures, we can identify the author of a text and the period in which it was written.

There are many questions concerning this method. It steadily garners more and more attention, not only because of the results it brings, but above all because of its impressively high accuracy. So let’s jump right into it.

‘At first it may seem very simple. When we make lists of the most frequently used words for a certain set of texts and then compare those lists with one another, it’ll turn out that the most similar ones were written by the same author’, said Dr Rybicki. Given that about 50% of every longer piece of communication is composed of the one hundred most common words in a language (like ‘and’, ‘but’, or ‘the’), such a comparison can point to individual differences very accurately and allow for easy identification. Despite their apparent triviality, these hundred common words are the most basic elements of every person’s writing style. The phenomenon is observable even though this kind of stylometry does not take into account things like context, syntax, grammar, and punctuation.
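The comparison of word lists described above can be sketched in a few lines of Python. This is only a minimal illustration, not the researchers’ actual tooling: the function names, the simple English-only tokeniser, and the use of Manhattan distance on raw relative frequencies are all assumptions made for the example (established stylometric measures typically normalise the frequencies first).

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII letters only)."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts combined."""
    totals = Counter()
    for text in texts:
        totals.update(tokenize(text))
    return [word for word, _ in totals.most_common(n)]


def profile(text, vocabulary):
    """Relative frequency of each vocabulary word within a single text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in vocabulary]


def distance(profile_a, profile_b):
    """Manhattan distance between two frequency profiles (an assumed measure)."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def closest_candidate(unknown, candidates, n=100):
    """Name of the candidate text whose profile is nearest to the unknown text."""
    vocabulary = most_frequent_words([unknown, *candidates.values()], n)
    target = profile(unknown, vocabulary)
    return min(
        candidates,
        key=lambda name: distance(target, profile(candidates[name], vocabulary)),
    )
```

Here `candidates` is a dictionary mapping a label (for example an author’s name) to a sample of text, and `closest_candidate` returns the label whose word-frequency profile is nearest to the unknown text. With real material the samples would need to be far longer than a few sentences for the frequencies to be meaningful.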

How do these inconspicuous words determine the uniqueness of a writing style? ‘That we still don’t know. We often operate on the basis of single-word frequencies, because it’s this aspect, not sentence length or punctuation, that allows us to pinpoint a text’s author, literary genre, or creation period. Grinding a literary work into dust made of words is, naturally, a bit unintuitive. We’re not absolutely sure why it works, but we have good evidence that it does’, explained Dr Choiński, who currently employs stylometry to study 18th-century American literature.

The potential of stylometry, reflected in the rapid and intense development of its tools, may well deepen our knowledge of language and of how we use it.

‘It would be very interesting to see how word frequency changed throughout history, but the further we go back in time, the more difficult it gets’, added Dr Choiński. The JU researchers have asked their colleagues from Yale University for help and received a sizeable text corpus comprising several thousand 18th-century literary works.

Exhumation, but no post-mortem

Today, stylometric analysis helps to verify all kinds of speculations. Since it is based on very basic features of language, rooted in our individual thought processes, its results may go beyond classical linguistic interpretation. ‘There are research projects conducted by teams of psychiatrists and literary scholars who analysed the language of various books in order to see if they foreshadowed their authors’ future suicides’, said Dr Rybicki.

Studies of this kind, which can be classified as sociolinguistics, allow researchers to identify mental illnesses such as schizophrenia or bipolar disorder in writers. There is also the very interesting case of Agatha Christie. ‘Researchers have isolated a set of changes in the language of a writer known to have suffered from Alzheimer’s. They observed a gradual decline in the diversity of words and an increase in the usage of terms with a very broad meaning. They applied this model to the works of Agatha Christie and, without removing her from her grave, confirmed that it was indeed Alzheimer’s that caused Christie’s death’, explained Dr Rybicki.
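The two signals mentioned in the quote, falling word diversity and a rising share of broad-meaning terms, can be approximated with very simple measures. The sketch below is an illustration only and does not reproduce the model used in the study: the type-token ratio as a stand-in for ‘diversity of words’, the fixed sample size, and the `VAGUE_WORDS` list are all assumptions made for the example.

```python
import re

# Hypothetical list of broad-meaning words, chosen only for illustration.
VAGUE_WORDS = {"thing", "things", "something", "anything", "stuff"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (ASCII letters only)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text, sample_size=50000):
    """Distinct words divided by total words, over the first sample_size tokens.

    Measuring a fixed-size sample keeps texts of different lengths comparable,
    because the ratio naturally falls as a text gets longer.
    """
    tokens = tokenize(text)[:sample_size]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def vague_word_share(text):
    """Fraction of all tokens that appear in the assumed VAGUE_WORDS list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in VAGUE_WORDS) / len(tokens)
```

Computed for each of an author’s books in order of writing, a steadily falling `type_token_ratio` together with a rising `vague_word_share` would correspond to the pattern described in the quote. On their own, such numbers are descriptive and do not amount to a diagnosis.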

Another interesting aspect of stylometry is that it can be used to study many more types of texts than just literature. Naturally, the types of questions we may ask may be different, but the answers may still prove fascinating. ‘Obviously, no stylometrist I know thinks that reading will become obsolete and we will interact with books only through computers’, said Dr Rybicki.

Original text: www.nauka.uj.edu.pl
