Glad you could all come back for Day 2 of Harry Potter Week on Sporcle. Today we have one of those quizzes that, like the original Harry Potter Top 200, contains trivia you likely can’t find anywhere outside of Sporcle. One of our editors, Mrchewypoo (yes, he does wish he had picked a better name when he signed up for an account), has gone through the painstaking task of counting every single word in each of the Harry Potter books and compiling the results into a list of the top 300 most-used words in Harry Potter. Here’s what he wrote about his research:
I have sacrificed a few weeks of my life for you, good reader. Meticulously extracted from digital versions of all seven books (they don’t make it easy, I assure you), I have snipped, sliced, studied, scrutinized and summarized all 1,083,673 words (by my humble count). Did you know that my version of Excel maxes out at 65,000 rows? I recently found that out. Here, I bring you the 300 most written words in the entire Harry Potter series by J.K. Rowling. I have the data for the remaining 25,096 unique words used in the books (by book, in case that’s interesting to anyone).
So here we are, friends: this is one of the most epic Harry Potter quizzes on the internet. A giant, Hungarian Horntail-sized behemoth that will likely take you the full 20 minutes to decipher. Are you brave enough to take it on?
If you are, you might want to take on a few of our other giant Harry Potter quizzes, including All About Harry Potter, Harry Potter Family Tree or even another one of Mrchewypoo’s creations: Tread Carefully: Harry Potter and the Philosophers Stone.
Finally, we have the full explanation of Mrchewypoo’s madness after the jump.
The text is taken directly from the quiz comments:
I apologize for the rather long comment. If you’re not interested in the inner makings of this quiz, nor the defining moments of insanity, rate the comment down and move along.
Now, for those who remain… Converting .epub format to text was a challenge in itself. I will spare you the explanation, which involved a little bit of API programming with a simply amazing OCR program I use. Oh, it helps to have a bit of a technical background. I understand there are programs out there which will do it for you, but, as with all good things, most look either illegal, dubious or expensive.
It turns out converting to text was the easy part. Given the sheer volume of words involved here, once I obtained the text, I discovered I wasn’t at all sure what to do with it. First was to remove all punctuation (save the single apostrophe and single hyphen):
( ) ! ? , . ; : ” [ ] — …
For that I simply used Notepad (Windows) which was surprisingly quick and effective in dealing with the data. The one thing it couldn’t do was line breaks (paragraphs). Aye, there was the rub.
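For readers who would rather script this step than fight with Notepad, here is a minimal Python sketch of the same idea. It is not what was actually used for the quiz. The function names are made up for illustration, and the opening curly quote and straight double quote are added to the list as an assumption, since the list above only shows the closing quote. Apostrophes and hyphens are left alone, and splitting on whitespace takes care of line breaks too.

```python
# Characters from the list in the post, plus the straight double quote and the
# opening curly quote (both assumptions; the post shows only the closing one).
PUNCTUATION = "()!?,.;:[]\"\u201c\u201d\u2014\u2026"


def strip_punctuation(text: str) -> str:
    """Replace each listed punctuation mark with a space, keeping ' and -."""
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    return text.translate(table)


def to_words(text: str) -> list[str]:
    """Split on any run of whitespace, so paragraph breaks are handled as well."""
    return strip_punctuation(text).split()


sample = "\u201cHarry!\u201d said Ron. \u201cYou-Know-Who\u2019s back\u2026\u201d\nHe wasn't sure."
words = to_words(sample)
print(words)
```

The result is one plain list of words, which sidesteps the row and column limits described next.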
I needed to plug it into Excel so all words existed in a single column. I tried copy-and-pasting into a cell then text-to-columns (space-delimited) with the hopes of transposing to rows, but it ran out of room. Excel doesn’t handle more than 256 columns. Problem.
So I pasted the whole thing into Word which just about killed my system (I know, I need a new system) but it worked. I replaced all spaces with paragraph breaks, resulting in a 9,000 page Word document.
I then tried to copy and paste that from Word into Excel which actually did kill my system.
I’m still apologizing to the poor thing.
I almost gave up at this point. Pipe dream, I said.
Remembering how nicely Notepad played with me, I figured I’d try and copy-and-paste from Word to Notepad then from Notepad to Excel.
It worked a dream!
Well, nightmare actually. Excel only accepts 65,000 rows.
I should explain at this point that my goal was just the first book. That’s it. One book. 76,000 words, give or take. Easy peasy. So I just created two sheets, pasted and… and now what? Future planning is not a strong point of mine. I had two spreadsheets with 76,000 random words (give or take) and no idea how to summarize and count them. Very long story short, after careful consideration, much Googling and several beers, I made a third sheet, created a macro to read the other sheets and count the words. Great! 12 hours of work and done! I had my quiz!!
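The counting step that the macro handled can be sketched in a few lines of Python. This is only an illustration, not the macro itself: the function name top_words is hypothetical, and folding everything to lowercase is an assumption, since the comment does not say whether "The" and "the" were counted together.

```python
from collections import Counter


def top_words(words, n=300):
    """Return the n most frequent words as (word, count) pairs.

    Lowercasing is an assumption; drop .lower() to count case-sensitively.
    """
    counts = Counter(word.lower() for word in words)
    return counts.most_common(n)


sample = "The boy who lived and the owl and the cat".split()
for word, count in top_words(sample, n=2):
    print(word, count)
```

Because a Counter is a dictionary of tallies, it holds one entry per unique word rather than one row per word, so a million-word input is not a problem for it.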
Something was missing.
I was empty inside.
It wasn’t right.
It felt cheap.
I needed ALL the books.
Crazy is a word reserved for the special. I do not consider myself special.
I am a masochist.
So, lather, rinse, repeat: I did the same for the remaining six books. Some of those books are rather large (250,000+ words). Excel refuses to talk to me now and Word has run away with my CPU. Over one million written words (twenty-five thousand+ unique ones), a few cases of beer, a pissed off wife, a decent quiz and a few hopefully happy Sporclers later, you have witnessed the result.
I hope you enjoyed. Personally, I consider it an epic quiz.