Fun fact: A sequence of symbols used to represent a profane word is called a ‘grawlix’.
It’s tough to admit, but sometimes at Sporcle we make mistakes. Yeah, who would have thought? Recently, though, we made a mistake so interesting and trivia-related that we had to share the details.
Here’s a little background:
Like many sites, Sporcle uses a word filter, which we dubbed the ‘naughty list’, to catch words we think are inappropriate for usernames and quiz content. Matt and Derek came up with it back when Sporcle HQ was whichever coffee shop would let the two of them stay the longest. That first naughty list was born of a conversation in which they debated each word out loud in their temporary ‘office’ before wondering if, perhaps, they were pushing the limits of common decency. Awkward!
As the site grew, we added more features (Comments! Clickable quizzes!), and updates to the naughty list didn’t always apply to every new feature. A couple of weeks back, we decided to change that and apply the same rules across the board. Unfortunately, we soon found ourselves in an epic battle with something known as ‘The Scunthorpe Problem’, along with a couple of unnamed problems we invented by ourselves.
The Scunthorpe Problem is an amazing computer science issue that we’d never heard of until this week, but it’s something we’d been dealing with since those early coffee shop days. It takes its name from a 1996 incident in which residents of Scunthorpe, UK, were prevented from creating accounts on AOL. It continues to pop up whenever an Internet filter flags an innocent word simply because it contains a string matching a word on the banned list. You might never notice that Scunthorpe contains a dirty word right in the middle, but when it appears on Sporcle as S****horpe, or is blocked from being entered at all, the filter actually highlights the banned word instead of discouraging its use.
Here are a few other fun examples from that Wikipedia article linked above:
• In the months leading up to Super Bowl XXX in January 1996, some web searches for the game were filtered, because the Roman numeral XXX is also used to identify pornography.
• The filter on the free wireless service of the town of Whakatane in New Zealand blocked searches involving the town’s own name, because the phonetic analysis used by the filter deemed “whak” to sound like the f-word. The town name is Maori, and in the Maori language “wh” is most commonly pronounced as “f”.
• Gareth Roelofse noted in 2004, “We found many library Net stations, school networks and Internet cafes block sites with the word ‘sex’ in the domain name. This was a challenge for RomansInSussex.co.uk because its target audience is school children.”
• In July 2011, web searches in China on the name Jiang were blocked following claims on the Sina Weibo microblogging site that former president Jiang Zemin had died. Since the word “Jiang” meaning “river” is written with the same Chinese character, searches related to rivers including the Yangtze (Cháng Jiāng) produced the message “According to the relevant laws, regulations and policies, the results of this search cannot be displayed.”
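To see how this happens, here’s a minimal Python sketch comparing a naive substring filter with a whole-word alternative. The banned list and the function names (`naive_flag`, `naive_mask`, `whole_word_flag`) are hypothetical and are not Sporcle’s actual code; the sample inputs come from the examples above.

```python
import re

# Hypothetical banned list for illustration only (not Sporcle's real naughty list).
BANNED = ("sex", "xxx")


def naive_flag(text: str) -> bool:
    """Scunthorpe-prone check: flags text if a banned word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BANNED)


def naive_mask(text: str) -> str:
    """Scunthorpe-prone masking: replaces every banned substring with asterisks, case-insensitively."""
    for word in BANNED:
        text = re.sub(re.escape(word), "*" * len(word), text, flags=re.IGNORECASE)
    return text


def whole_word_flag(text: str) -> bool:
    """Flags text only when a banned word appears as a standalone word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in BANNED for word in words)


for sample in ("RomansInSussex", "Super Bowl XXX"):
    print(sample, naive_flag(sample), naive_mask(sample), whole_word_flag(sample))
```

The naive version turns RomansInSussex into RomansInSus***, the same effect as S****horpe. Whole-word matching leaves it alone, but it still flags Super Bowl XXX and would miss deliberate misspellings, so it is a trade-off rather than a cure.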
In our own mini-Scunthorpe problem, we didn’t just censor the wrong words: our code for checking answers was also marking any answer that contained a naughty word as incorrect, which made the issue even worse. Not only were we filtering innocent words, but correct answers as well!
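Here is a hypothetical sketch of how that kind of answer-checking bug can arise: if the filter runs before the comparison with the answer key, any correct answer that happens to contain a banned substring gets rejected. This is an assumed reconstruction for illustration, not Sporcle’s production code; the banned list, the answer key, and the names `buggy_is_correct` and `is_correct` are all made up.

```python
# Hypothetical banned list and answer key, for illustration only.
BANNED = ("sex",)
ACCEPTED = {"essex", "kent", "surrey", "sussex"}  # e.g. a quiz on English counties


def contains_banned(text: str) -> bool:
    """Naive substring check, the same kind of filter behind the Scunthorpe problem."""
    lowered = text.lower()
    return any(word in lowered for word in BANNED)


def buggy_is_correct(guess: str) -> bool:
    """Mimics the bug: the filter runs first, so correct answers containing a banned substring are rejected."""
    if contains_banned(guess):
        return False
    return guess.strip().lower() in ACCEPTED


def is_correct(guess: str) -> bool:
    """Decides correctness only by comparing against the answer key; filtering is left to display and moderation."""
    return guess.strip().lower() in ACCEPTED
```

With this setup, `buggy_is_correct` rejects both Essex and Sussex even though they are in the answer key, while `is_correct` accepts them. Keeping the two concerns separate (one possible design, assumed here rather than taken from Sporcle’s fix) means a filter mistake can at worst affect how a word is displayed, not whether a player gets credit.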
In short, we screwed up in ways both strange and legendary, and we truly apologize to any user who was affected by any of these shenanigans in the last week.
At the end of the day, we learned a few things about handling these issues in the future, picked up some really interesting trivia involving a town in North Lincolnshire, England, and got a good lesson in the sometimes unsteady balance between censorship and community on the Internet. Not a terrible trade for a couple of silly $#@$! mistakes.