An interdisciplinary team of researchers has created a Google
Trends-like tool called Porngram that maps the
evolution of keywords in the titles of 800,000 porn videos.

Users can enter any number of keywords to see how often they
appeared in the titles of videos uploaded to porn site Xhamster
between 2007 and February 2013. The tool shows, for example, that
movies tagged with “footjob” have been on the rise while those
tagged “handjob” have been falling. You can compare your own
keywords here.
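
As a rough illustration of the kind of count behind such a trend line, the sketch below computes, for each year, the share of video titles containing a keyword. It is a minimal sketch and not the team's code: the function name `keyword_share_by_year`, the `(title, year)` input format, the whole-word matching and the per-year normalisation are all assumptions, and the sample titles are made up.

```python
import re
from collections import Counter


def keyword_share_by_year(videos, keyword):
    """Return {year: fraction of that year's titles containing `keyword`}.

    `videos` is an iterable of (title, upload_year) pairs. This input
    format is an assumption made for the sketch.
    """
    totals = Counter()  # number of titles seen per year
    hits = Counter()    # number of titles containing the keyword per year
    # Match the keyword as a whole word, case-insensitively.
    pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
    for title, year in videos:
        totals[year] += 1
        if pattern.search(title.lower()):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up sample data, for illustration only.
sample = [
    ("funny cat video", 2007),
    ("cat sleeping", 2007),
    ("funny dog", 2008),
    ("funny cat again", 2008),
    ("dog runs", 2008),
    ("Funny bird", 2008),
]
shares = keyword_share_by_year(sample, "funny")
```

Dividing by the number of titles uploaded each year, rather than plotting raw counts, keeps a keyword from appearing to rise simply because more videos were uploaded.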

The Porngram tool was built on the Sexualitics dataset, which was
scraped from Xhamster and Xvideos for a research paper on porn data
analysis. The Xvideos dataset (which covered 1,200,000 videos)
lacked upload dates, so it could not be used to show trends over
time.

The research team — made up of five individuals (Baptiste
Coulmont, Antoine Mazières, Mathieu Trachman, Jean-Philippe Cointet
and Christophe Prieur) with skills across computer science,
sociology, statistics, mathematics and gender studies — scraped the videos’
titles, tags, description, viewcount, comments, runtime, upload
date (if available) and uploader username using a custom-made
crawler. These were then analysed quantitatively in a bid to
understand how pornography is classified and to shed light, to a
certain degree, on human sexuality (at least from the supply side).

Tags were sorted into categories (capturing variations of terms
such as “blowjob”, “bj” etc) and then these were ranked in terms of
the frequency of occurrence — how many videos have that particular
tag. The most popular five percent of keywords (including amateur
and blowjob) covered around 90 percent of videos and were therefore
not particularly helpful in terms of categorising content.
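
The ranking and coverage figures described above can be sketched as follows. This is a hedged illustration only: `rank_tags` and `coverage_of_top` are hypothetical names, the input is assumed to be one list of tags per video with spelling variants already merged into categories, and the sample tags are placeholders.

```python
import math
from collections import Counter


def rank_tags(videos_tags):
    """Return (tag, number_of_videos) pairs, most frequent first."""
    # set() ensures a tag repeated on one video is counted only once.
    counts = Counter(tag for tags in videos_tags for tag in set(tags))
    return counts.most_common()


def coverage_of_top(videos_tags, fraction):
    """Share of videos carrying at least one of the top `fraction` of tags."""
    ranked = rank_tags(videos_tags)
    k = max(1, math.ceil(len(ranked) * fraction))  # how many top tags to keep
    top = {tag for tag, _ in ranked[:k]}
    covered = sum(1 for tags in videos_tags if top & set(tags))
    return covered / len(videos_tags)


# Placeholder data: tag "a" is on four of the five videos.
videos = [["a", "b"], ["a", "c"], ["a"], ["d"], ["a", "e"]]
ranking = rank_tags(videos)
coverage = coverage_of_top(videos, 0.2)  # top 20% of 5 tags = 1 tag
```

In this toy example a single tag covers four of the five videos, which is the same effect, on a small scale, that makes the most common keywords poor descriptors.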

The research — outlined in this paper — reveals not only the number of times
a word occurs in the titles of porn movies over time, but the
keywords that are most popular (based on views) and the ones that
attract the most comments/reactions.

In order to eliminate categories that were “empty” in terms of
descriptive power (those that were applied so frequently that they
were meaningless), the team developed a nicheness score, an ad-hoc
statistical model for ranking the descriptive power of categories.
This meant that the overused terms were given lower scores.

Having done this, the researchers started to draw semantic
connections between keywords. For example the word “midget” is a
low-frequency category in the Xhamster database, but is present ten
times more than statistically expected in videos that also have the
tag “funny”. This indicates a strong relation between the two.
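
One common way to express "more often than statistically expected" is the ratio of observed co-occurrences to the number expected if the two tags were independent. The sketch below assumes that reading; the article does not specify the team's exact statistic, and the function name `cooccurrence_ratio` and the sample tags are hypothetical.

```python
def cooccurrence_ratio(videos_tags, tag_a, tag_b):
    """Observed co-occurrences of two tags divided by the expected count.

    Under independence the expected count is n_a * n_b / n, so the
    ratio simplifies to n_ab * n / (n_a * n_b). A value of 10 would
    mean the pair appears together ten times more often than chance.
    """
    tag_sets = [set(tags) for tags in videos_tags]
    n = len(tag_sets)
    n_a = sum(tag_a in s for s in tag_sets)
    n_b = sum(tag_b in s for s in tag_sets)
    n_ab = sum(tag_a in s and tag_b in s for s in tag_sets)
    if n_a == 0 or n_b == 0:
        return 0.0  # ratio is undefined when a tag never occurs; report 0
    return n_ab * n / (n_a * n_b)


# Placeholder data: 20 videos, "x" on 2, "y" on 4, both on 2.
videos = [["x", "y"]] * 2 + [["y"]] * 2 + [["z"]] * 16
ratio = cooccurrence_ratio(videos, "x", "y")
```

Here the expected number of videos with both tags is 2 × 4 / 20 = 0.4, while 2 are observed, so the pair co-occurs five times more often than chance would suggest.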

Antoine Mazières, a PhD candidate at INRA-SenS and LIAFA, told us
that the biggest challenge was making sense of the data. “A bunch
of skills were required spanning from statistical physics to
cultural anthropology.”

The most significant finding, in Mazières' view, was how the data
highlighted the “huge diversity of sexual practices” while
“statistically relativising the overwhelmingness of mainstream

The data also revealed some surprises, including the fact that
37 out of the 100 most viewed videos on Xhamster have “mom” or
“mother” in the title. This category did not include the term MILF.
“I was really not expecting that one,” says Mazières.

The research was limited by the fact that most of the data
associated with porn videos is kept by the platforms themselves.
Mazières and colleagues would like to gain access to server-side
data, to analyse views of videos over time (rather than total views
of videos), for example, or to observe the “career” of a porn
consumer over time. “This could be done with anonymised data, of
course, using unique ID instead of IP, for example,” he told us.
“This would be a Kinsey 2.0.”

A better understanding of the demand side of porn could help
inform the production of adult entertainment, much in the way that
Netflix mines data on its users to inform the content that it
commissions.

The team has made its dataset publicly available
so that others can play around with it.