May 12, 2016
A couple of days ago, I was thinking how it had been a while since I’d made a new viz, so I thought I’d head over to /r/datasets and see if I could find something interesting. What I ended up finding was the dataset of my dreams.
This dataset was compiled by some researchers from Denmark. It contains information on over 68k users and their question answers*. It’s pretty hefty and I’m still digging into it, but I wanted to throw something fun up here before I spend too much time falling down the rabbit hole. OkCupid is an incredibly rich source of data, as evidenced by their own data blog. Just to whet your appetite for things to come from this amazing dataset, I’ve made this exploratory viz to let you compare personality traits.
The main technology that drives OkCupid is its matching algorithm. It’s based on questions where you choose your own answer and how you’d like the other person to answer. These questions are all broken up into categories and also used to generate scores for different “personality traits.” For those who are curious, here’s most of mine, minus some less safe for work ones. 😉
On that note, here’s the viz! More to come, I’m sure.
*Update: There’s been some controversy over the ethics of this dataset. The authors have since removed it from the linked website. I had already removed the user name column from the dataset because it was extraneous and I didn’t need it. I’ve now also updated my viz to not include as much potentially identifying information such as location. I don’t feel that looking at this data without that stuff is unethical, but if you have thoughts on the matter, I’d love to hear them.
February 8, 2016
Make it rain (data)!
Click here to view the datasets.
June 2, 2015
Stumbled upon a pretty fantastic group of Airbnb datasets for Amsterdam, Barcelona, London, NYC, Paris, Portland, San Francisco, and Sydney. You can find them here. Looks like things are spread across a few different tables, so some join/blend action will probably be necessary. But at first glance, they look pretty robust. Enjoy!
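If you want a feel for what that join might look like outside of Tableau, here’s a minimal sketch in plain Python. Big caveat: the table names, column names, and sample rows below are all made up for illustration (I’m assuming a listings table and a reviews table that share a `listing_id` key), so check the real files before borrowing any of it.

```python
import csv
import io

# Hypothetical sample data standing in for two of the downloaded tables.
# The real column names may well be different.
LISTINGS_CSV = """listing_id,city,price
1,Amsterdam,120
2,Paris,95
"""

REVIEWS_CSV = """review_id,listing_id,rating
10,1,5
11,1,4
12,2,3
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, key):
    """Left join: keep every row of `left`, adding columns from matching `right` rows."""
    # Index the right-hand table by the join key so each lookup is fast.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        matches = index.get(row[key])
        if matches:
            for match in matches:
                joined.append({**row, **match})
        else:
            # No match on the right: keep the left row as-is.
            joined.append(dict(row))
    return joined


# Attach each review to the listing it belongs to.
joined = left_join(read_rows(REVIEWS_CSV), read_rows(LISTINGS_CSV), "listing_id")
for row in joined:
    print(row)
```

Reviews go on the left here because there’s one row per review and I’d want each one to pick up its listing’s city and price. Flip the arguments if you’d rather have one row per listing.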
March 25, 2015
Someone on Reddit has compiled all the aggressive actions in the Harry Potter series into a handy spreadsheet and made some well-meaning but overall lackluster pie charts out of it.
Professor McGonagall is unimpressed.
Here’s the data! Do something awesome!
February 9, 2015
Backblaze.com has released a huge dataset of failure rates and other hard drive stats for over 41,000 hard drives. Check it out here!
September 17, 2014
If you were at Pimp My Viz, or at least read yesterday’s summary about it, then you may recall that I spoke a little bit about using the Spotify Web API to pull data into Google Spreadsheets. My partner (and author of this excellent tutorial about bringing JSON into Google Spreadsheets) alerted me to the existence of these New York Times APIs yesterday. They have APIs for a number of different things, but the ones I think you could get a LOT of interesting data out of are the Best Sellers API, Campaign Finance API, Congress API, Movie Reviews API, and Real Estate API.
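If spreadsheets aren’t your thing, the same JSON-to-rows idea can be sketched in a few lines of Python. To be clear, this is not the tutorial’s method, and the payload below is a made-up stand-in: the `results` key and the field names are my assumptions, not the actual shape of any New York Times API response.

```python
import csv
import io
import json

# Hypothetical API response; real payloads will have different keys.
SAMPLE = """
{"results": [
  {"title": "Book A", "rank": 1, "author": {"name": "X"}},
  {"title": "Book B", "rank": 2, "author": {"name": "Y"}}
]}
"""


def flatten(obj, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys, e.g. author.name."""
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def json_to_csv(text, list_key):
    """Flatten each record under `list_key` and write them all out as CSV text."""
    records = [flatten(record) for record in json.loads(text)[list_key]]
    # Use the union of all keys as the header so ragged records still fit.
    fields = sorted({key for record in records for key in record})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


csv_text = json_to_csv(SAMPLE, "results")
print(csv_text)
```

The flattening step is the important bit: Tableau and spreadsheets both want one flat row per record, so nested objects get turned into dotted column names.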
Now go forth and do awesome things with data!
July 30, 2014
I found this link to the Beer Institute Brewer’s Almanac through a post of excellent vizzes from Data Knight Rises. There’s definitely a lot of info here, and while it may not all connect with each other directly, making dashboards difficult, it would probably do well with story points.
Get the data here.
July 1, 2014
Y’all know how much I love datasets about sex, drugs, and rock and roll. The CDC just released this dataset, titled “Variation Among States in Prescribing of Opioid Pain Relievers and Benzodiazepines — United States, 2012.”
It could be interesting to mash up this data with figures on opiate drug abuse or the number of people on disability. Have at it, vizzers!
April 30, 2014
This dataset comes from Lift, an app that helps you succeed at everything. They did an experiment to see which popular diets were the most effective. You can learn more about the study and download the data here.
April 28, 2014
Here’s a fun dataset looking at swearing in rap lyrics from 1985-2013.
It comes from a collection of datasets, mostly about sports, that are all formatted pretty nicely and look ready to go into Tableau.