Pop Viz: Comparing Personality Traits on OkCupid

May 12, 2016

A couple of days ago, I was thinking about how it had been a while since I’d made a new viz, so I thought I’d head over to /r/datasets and see if I could find something interesting. What I ended up finding was the dataset of my dreams.

This dataset was compiled by some researchers from Denmark. It contains information on over 68k users and their question answers*. It’s pretty hefty and I’m still digging into it, but I wanted to throw something fun up here before I spend too much time falling down the rabbit hole. OkCupid is an incredibly rich source of data, as evidenced by their own data blog. Just to whet your appetite for what’s to come from this amazing dataset, I’ve made this exploratory viz to let you compare personality traits.

The main technology that drives OkCupid is its matching algorithm. It’s based on questions where you choose your own answer and how you’d like the other person to answer. These questions are all broken up into categories and are also used to generate scores for different “personality traits.” For those who are curious, here are most of mine, minus some less-safe-for-work ones. 😉

My OkCupid Personality traits

On that note, here’s the viz! More to come, I’m sure.

*Update: There’s been some controversy over the ethics of this dataset. The authors have since removed it from the linked website. I had already removed the user name column from the dataset because it was extraneous and I didn’t need it. I’ve now also updated my viz to not include as much potentially identifying information such as location. I don’t feel that looking at this data without that stuff is unethical, but if you have thoughts on the matter, I’d love to hear them.

Data Feed: 757 .csv Datasets

February 8, 2016

Make it rain (data)!

Click here to view the datasets.

Data Feed: Inside Airbnb

June 2, 2015

I stumbled upon a pretty fantastic group of Airbnb datasets for Amsterdam, Barcelona, London, NYC, Paris, Portland, San Francisco, and Sydney. You can find them here. It looks like things are spread across a few different tables, so some join/blend action will probably be necessary. But at first glance, they look pretty robust. Enjoy!
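If you want a feel for what that join/blend step might look like outside of Tableau, here’s a minimal Python sketch using only the standard library. Fair warning: the two tiny tables, the column names (listing_id, neighbourhood, price, review_date), and the helper functions are all made up for illustration, so treat this as a rough shape rather than a description of the actual files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for two of the tables. The real files and
# their column names may well differ.
LISTINGS_CSV = """listing_id,neighbourhood,price
1,Centrum,120
2,Jordaan,95
3,De Pijp,80
"""

REVIEWS_CSV = """listing_id,review_date
1,2015-03-01
1,2015-04-12
3,2015-05-20
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_review_counts(listings, reviews, key="listing_id"):
    """Left-join listings to a per-listing count of reviews."""
    counts = defaultdict(int)
    for review in reviews:
        counts[review[key]] += 1
    # Listings with no reviews are kept and get a count of 0.
    return [
        dict(listing, review_count=counts.get(listing[key], 0))
        for listing in listings
    ]


joined = join_review_counts(read_rows(LISTINGS_CSV), read_rows(REVIEWS_CSV))
for row in joined:
    print(row)
```

Aggregating the reviews before joining keeps one row per listing, which tends to be what you want when the tables are at different levels of detail.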

Data Feed: Aggressive Actions in Harry Potter

March 25, 2015

Accio data!

Someone on Reddit has compiled all the aggressive actions in the Harry Potter series into a handy spreadsheet and made some well-meaning but overall lackluster pie charts out of it.


Professor McGonagall

Professor McGonagall is unimpressed.


Here’s the data! Do something awesome!

Data Feed: Reliability data for 41000 hard drives

February 9, 2015

Backblaze.com has released a huge dataset of failure rates and other hard drive stats for over 41,000 hard drives. Check it out here!

Data Feed: New York Times APIs

September 17, 2014

If you were at Pimp My Viz, or at least read yesterday’s summary about it, then you may recall that I spoke a little bit about using the Spotify Web API to pull data into Google Spreadsheets. My partner (and author of this excellent tutorial about bringing JSON into Google Spreadsheets) alerted me to the existence of these New York Times APIs yesterday. They have APIs for a number of different things, but the ones I think you could get a LOT of interesting data out of are the Best Sellers API, Campaign Finance API, Congress API, Movie Reviews API, and Real Estate API.
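For anyone who’d rather go from JSON to something spreadsheet-friendly with a script, here’s a rough Python sketch of the general idea. The sample payload, its field names (results, title, author, rank), and the json_to_csv helper are invented for illustration. This is not the real response format of any New York Times API, so check their docs for the actual structure.

```python
import csv
import io
import json

# Made-up payload for illustration only. It is NOT the real response
# shape of any New York Times API.
SAMPLE_JSON = """
{"results": [
  {"title": "Book A", "author": "Author One", "rank": 1},
  {"title": "Book B", "author": "Author Two", "rank": 2}
]}
"""


def json_to_csv(text, fields):
    """Flatten a JSON list of records into CSV text with the given columns."""
    records = json.loads(text)["results"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: record.get(field, "") for field in fields})
    return out.getvalue()


csv_text = json_to_csv(SAMPLE_JSON, ["rank", "title", "author"])
print(csv_text)
```

The resulting text can be saved as a .csv file and opened in a spreadsheet or pulled into Tableau.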

Now go forth and do awesome things with data!

No, not that data!

Data Feed: Brewer’s Almanac

July 30, 2014

I found this link to the Beer Institute Brewer’s Almanac through a post of excellent vizzes from Data Knight Rises. There’s definitely a lot of info here, and while it may not all connect with each other directly, making dashboards difficult, it would probably do well with story points.

Get the data here.

Data Feed: Prescribing of Opioid Pain Relievers by State

July 1, 2014

Y’all know how much I love datasets about sex, drugs, and rock and roll. This dataset was just released by the CDC titled “Variation Among States in Prescribing of Opioid Pain Relievers and Benzodiazepines — United States, 2012”

I choo-choo-choose you, vicodin!

It could be interesting to mash up this data with figures on opiate drug abuse or the number of people on disability. Have at it, vizzers!

Data Feed: Effectiveness of popular diets

April 30, 2014

This dataset comes from Lift, an app that helps you succeed at everything. They ran an experiment to see which popular diets were the most effective. You can learn more about the study and download the data here.

BTW it's the 10th anniversary of Mean Girls today!

Data Feed: Cursing in Rap Lyrics

April 28, 2014

Here’s a fun dataset looking at swearing in rap lyrics from 1985-2013.

best rapper ever

It comes from a collection of datasets, mostly about sports, that are all formatted pretty nicely and look ready to drop into Tableau.
