I told you I’d be back with more Discover Weekly analysis and here I am! I was inspired to take another look after seeing this great story that Owen Wang, a brilliant intern on the Storytelling dev team this summer, created. He found a tool provided by the Echonest that analyzes your different Spotify playlists based on a number of attributes. He explains it beautifully below, so check it out (it’s just a little wide on my blog, so you might want to open it in a new tab):
I’ve been saving all my Spotify Discover Weekly playlists with the intent of eventually doing something with them, and this tool from Echonest seemed like a great excuse to do some vizzing! After seeing some mildly interesting results about what Spotify recommends I listen to, I thought it’d be fun to compare the playlists to my actual music listening to see how often they got it right. Check out my viz below to see what I found. I decided to do it as one long scrolling viz since interaction between the charts is minimal and I was really itching to try the device designer. Enjoy it! And if you want a soundtrack as you peruse it, here is a gigantic playlist of ALL my Discover Weekly playlists combined into one:
Go. When thinking through what the requirements should be I came up with this list:
Data should be able to be added via phone, since I will already have it in hand when playing Pokemon Go
Geolocation is a must-have. It’d be great if whatever app I use took it down automatically.
I should be able to customize what data I input.
Inputting data should be quick enough that doing it after catching every pokemon isn’t such a chore that I skip doing it because I’m lazy.
I chose to go with an app I’ve used before for Quantified Self purposes, Nicholas Felton’s Reporter app for iPhone. The app has some really handy features like automatically geo-tagging reports, as well as adding step, weather, and photo data. You can totally customize what questions the report asks and what kinds of answers you can give.
Initially, I set up questions to input which pokemon I caught and what their CP was. And, since it was just about time for my lunch break, I figured going on a little walk and testing out my data collection couldn’t hurt. I’m glad I did because testing out my data collection process in the field helped me iterate on it and figure out what questions mattered.
As I walked, I hit a couple of Pokestops with lure modules on them. I was able to catch quite a few pokemon around them. I realized that, looking at the data, you would see a cluster of catches in this time period and might wonder why there were so many. So, I decided to add a question for if there was a lure module close by. At the same time, I realized that incense would have a similar effect, so I added a question for that, too.
I lure my pokemon like I do my interns, with lots of dranks and hella noms.
I continued my walk and hit another Pokestop where a Slowpoke was hanging out. I caught him and apparently his slowness spread to my phone, because upon catching him, my game froze. I’d estimate that around 40% of the time, the game freezes on me after I catch a pokemon. I’d like to be able to know an accurate percentage for that number. So, I added another question for if it froze or not while catching the pokemon.
“Wait….so did I actually catch that or nah?”
Going out in the real world and testing my data collection process helped me iterate on it and improve it. I was able to catch missing data earlier and it’ll lead to a more accurate dataset. The Reporter app makes it pretty easy to add new questions, so this whole experience really verified that it’s a good tool for the job. Unfortunately, Reporter is only for iOS, so if you have a suggestion for an app Android users could use, I’d love to hear it!
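Here’s a minimal sketch of how the freeze question could turn that 40% guess into a real number. It assumes each report ends up as a dictionary with the hypothetical keys `pokemon`, `cp`, `lure`, `incense`, and `froze`. Those names and the sample values are made up for illustration and are not Reporter’s actual export format.

```python
def freeze_rate(reports):
    """Return the share of catches (0.0 to 1.0) where the game froze."""
    if not reports:
        return 0.0
    frozen = sum(1 for report in reports if report["froze"])
    return frozen / len(reports)


# Hypothetical sample reports; keys and values are illustrative only.
sample_reports = [
    {"pokemon": "Slowpoke", "cp": 100, "lure": True, "incense": False, "froze": True},
    {"pokemon": "Pidgey", "cp": 50, "lure": True, "incense": False, "froze": False},
    {"pokemon": "Rattata", "cp": 40, "lure": False, "incense": False, "froze": False},
    {"pokemon": "Zubat", "cp": 60, "lure": False, "incense": True, "froze": True},
    {"pokemon": "Weedle", "cp": 30, "lure": False, "incense": False, "froze": False},
]

print(f"Froze on {freeze_rate(sample_reports):.0%} of catches")
```

The same pattern would work for the lure and incense questions: filter the reports on that key first, then compare the counts.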
A couple days ago, I was thinking how it had been a while since I’ve made a new viz and I thought I’d head over to /r/datasets and see if I could find something interesting. What I ended up finding was the dataset of my dreams.
This dataset was compiled by some researchers from Denmark. It contains information on over 68k users and their question answers*. It’s pretty hefty and I’m still digging into it, but I wanted to throw something fun up here before I spend too much time falling down the rabbit hole. OkCupid is an incredibly rich source of data, as evidenced by their own data blog. Just to whet your appetite for things to come from this amazing dataset, I’ve made this exploratory viz to let you compare personality traits.
The main technology that drives OkCupid is its matching algorithm. It’s based on questions it asks you, in which you choose your answer and how you’d like the other person to answer. These questions are all broken up into categories and also used to generate scores for different “personality traits.” For those who are curious, here’s most of mine, minus some less safe for work ones. 😉
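To make the “your answer plus how you’d like them to answer” idea concrete, here is a simplified sketch of a two-way match score. It is an illustration, not OkCupid’s actual implementation: the importance weights, the data layout, and the function names are all assumptions, and the geometric mean is just one common way to combine two one-sided scores.

```python
import math

# Assumed importance weights; illustrative only.
IMPORTANCE = {"irrelevant": 0, "a little": 1, "somewhat": 10, "very": 50, "mandatory": 250}


def satisfaction(asker, answerer):
    """Fraction of the asker's importance points that the answerer earns."""
    earned = possible = 0
    for question, pref in asker.items():
        if question not in answerer:
            continue  # only questions both people answered count
        weight = IMPORTANCE[pref["importance"]]
        possible += weight
        if answerer[question]["answer"] in pref["accepts"]:
            earned += weight
    return earned / possible if possible else 0.0


def match_percent(a, b):
    """Geometric mean of how well each person satisfies the other, as a percent."""
    return 100 * math.sqrt(satisfaction(a, b) * satisfaction(b, a))


# Hypothetical users: each question stores their own answer, the answers
# they would accept from a match, and how much the question matters to them.
alice = {
    "q1": {"answer": "yes", "accepts": {"yes"}, "importance": "very"},
    "q2": {"answer": "no", "accepts": {"yes", "no"}, "importance": "a little"},
}
bob = {
    "q1": {"answer": "yes", "accepts": {"yes"}, "importance": "somewhat"},
    "q2": {"answer": "yes", "accepts": {"yes"}, "importance": "somewhat"},
}

print(round(match_percent(alice, bob), 1))
```

In this toy example Bob gives every answer Alice wants, but Alice only gives half of the answers Bob wants, so the combined score lands well below 100.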
On that note, here’s the viz! More to come, I’m sure.
*Update: There’s been some controversy over the ethics of this dataset. The authors have since removed it from the linked website. I had already removed the user name column from the dataset because it was extraneous and I didn’t need it. I’ve now also updated my viz to not include as much potentially identifying information such as location. I don’t feel that looking at this data without that stuff is unethical, but if you have thoughts on the matter, I’d love to hear them.
If you are a fan of my blog, I’m going to bet you are also a fan of Peter Gilks’ blog and saw this excellent viz looking at his Last.fm/Spotify data. He does a great job laying out how to keep/get this data for yourself. And I liked his viz so much that I decided to rip it off entirely, down to the little pic of me wearing headphones:
Peter had some other cool visualizations analyzing his taste. He had some nice ones about genre, which I would totally do if I didn’t have 3,520 distinct artists to categorize. I did have some success using import.io to scrape Allmusic.com to get that information for my Festival Finder viz last year, so maybe if I have time I’ll take a crack at doing that. One thing I did finally learn a little more about is LOD calculations, which I used to find the first and last dates that I played each artist. With that I was able to make this cool gantt chart of my Top 20 artists. Check out how one day of mourning David Bowie was enough to put him in my Top 20 for Q1.
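For anyone who wants the same first/last play dates outside of Tableau, here is a rough Python sketch of the idea behind a calc like `{ FIXED [Artist] : MIN([Date]) }` and its `MAX` counterpart. The field names in that expression, the `(artist, date)` tuple layout, and the sample plays are all assumptions for illustration, not my actual data.

```python
from datetime import date


def first_last_played(scrobbles):
    """Map each artist to a (first_play, last_play) pair of dates."""
    spans = {}
    for artist, played_on in scrobbles:
        first, last = spans.get(artist, (played_on, played_on))
        spans[artist] = (min(first, played_on), max(last, played_on))
    return spans


# Hypothetical listening history as (artist, play date) pairs.
sample_scrobbles = [
    ("Artist A", date(2016, 1, 11)),
    ("Artist B", date(2016, 2, 3)),
    ("Artist A", date(2016, 3, 20)),
    ("Artist B", date(2016, 1, 5)),
]

for artist, (first, last) in sorted(first_last_played(sample_scrobbles).items()):
    print(artist, first.isoformat(), last.isoformat())
```

Each pair is one bar in a gantt chart: start at the first play, end at the last.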
I’ve been using Last.fm since 2006, so I have nearly a decade’s worth of music listening data to look through. I could probably create a whole blog just on my personal music listening habits, but I doubt that anyone would be all that interested in that besides me. However, I did start to scratch the surface of something interesting. About a year ago, Spotify started making these weekly curated playlists for their users called “Discover Weekly.” Spotify uses all kinds of sophisticated recommendation engines to determine what to add to each user’s individualized playlists. After giving them a try for a while, I learned that Spotify’s robots KNOW THEIR SHIT. So, I made the Discover Weekly playlist a part of my music listening routine. As such, I’ve seen a bit of a jump in the number of new-to-me artists I listen to, especially on Mondays, when the playlist comes out. Check it out in the story below:
This is just scratching the surface of the kinds of analysis I want to do about Discover Weekly. Coming up, I’m going to see how often they get things right, if what I’m currently listening to has an effect on what Spotify recommends me, and how much of their playlists are actual new artists to me and not just tracks I don’t listen to as much from artists I already love.
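As a starting point for that kind of analysis, here is a minimal sketch of counting new-to-me artists by weekday: find the first date each artist was ever played, then tally those dates by day of the week. The `(artist, date)` layout and the sample plays are hypothetical.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def new_artists_by_weekday(scrobbles):
    """Count artists by the weekday on which they were first played."""
    first_play = {}
    for artist, played_on in scrobbles:
        if artist not in first_play or played_on < first_play[artist]:
            first_play[artist] = played_on
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return Counter(WEEKDAYS[day.weekday()] for day in first_play.values())


# Hypothetical listening history as (artist, play date) pairs.
sample_scrobbles = [
    ("Artist A", date(2016, 1, 11)),  # a Monday
    ("Artist B", date(2016, 1, 11)),
    ("Artist A", date(2016, 1, 13)),  # repeat play, not a new artist
    ("Artist C", date(2016, 1, 14)),  # a Thursday
]

print(new_artists_by_weekday(sample_scrobbles))
```

Repeat plays don’t count, so an artist only shows up on the day they first appeared in the history.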
It was over three years ago that I joined the Tableau Public team. I was just a fresh-faced kid fleeing a terrible, abusive first job, ready to take the data world by storm.
I didn’t know back then how much I would fall in love. With Tableau. With dataviz. With the wonderful community that we’ve built together. Together, you’ve helped take me from spending 2 days building this:
to spending 2 hours building this:
I feel honored and blessed to be a part of it all. I’ve worked with some amazingly talented, brilliant, creative, and just all around good people, both inside and outside of the Tableau Public team. I’ve learned so much from all of you. And while I’ve greatly enjoyed helping grow this community and evangelizing Tableau Public, the time has come for a new challenge. I’m leaving the Tableau Public team.
Luckily, I’m not moving far. I’m staying with Tableau and starting a new role in our development organization as a Product Manager! I will be helping our team develop and improve Tableau’s storytelling features. And while it’s hard to leave behind my wonderful team of brilliant Tableau Publicans, I’m incredibly excited to be able to help make the product that I’ve grown to care for so much even better.
As I make this exciting move, I just want to say thank you to the community, to my team, and especially to Ben Jones. He’s really been the most wonderful and supportive manager and mentor I could ever dream of. It’s been an honor being your right-hand woman for the last 3 years and I’m going to miss working with you every day immensely.
I just want to reassure you, my lovely readers, that this does NOT mean I will stop vizzing, blogging, tweeting, or participating in the community. I plan to stay just as active, even as I make the transition from Marketing to Development. And yes, I will still be running Iron Viz, because you will have to pry that bell pepper out of my cold, dead hands.
In case you haven’t heard, a great man and artist passed away last night, just days after his 69th birthday. That man was David Fucking Bowie. I’m devastated. And I’m mourning the only way I know how, with data. I threw together this visualization where you can track David Bowie’s tour history. I’ve also tried to hunt down a video from every tour on YouTube, so clicking on the tours in the bar chart below should open those up. I’m a little frustrated with the path map: it’s breaking up the paths by country. It’s also showing a “*” for a lot of the dates and venues where Mr. Bowie played a city multiple times, but adding those to the detail shelf messes up the path map even further. I’m guessing the solution is something having to do with LOD calcs, but I’m still experimenting. But, I wanted to get this out there so we can all start to heal.
UPDATE: My brilliant co-worker Richard Wesley had a super simple solution to my path map woes. Using Max([Country]) instead of [Country] still keeps all of the cities in place while also not using it as a dimension to divvy up the data. Thanks dude!!!!
Obviously, I like blogging about Tableau. I’ve held multiple panels on it, written articles, basically professed my love for it at every public opportunity I could get. However, there’s a lot not to love about it. Like:
Having to build a website
Having to fight spam
Feeling pressured to keep it updated
Spending hours editing blog posts
Tinkering with my WordPress theme only to get frustrated eventually
Trying to get new readers when no one really reads personal blogs anymore
Sure, I’ve had some success posting blog posts to Reddit every once in a while, but the way people use the internet is different now. People want all of their content coming to them through one source: aggregators like Reddit, social media platforms like Facebook or Twitter, or curated content platforms like Medium. Content creators don’t want to deal with the hassle of maintaining a blog. And having a dedicated blog is a big commitment if you only have a couple ideas of things you want to make and talk about.
I still get a healthy amount of blog traffic from the Tableau community. But a big goal of my Pop Viz work (and my work on Tableau Public, in general) is to improve data literacy among the general populace. When I’m doing a Pop Viz post on something like music or Pokemon or whatever, I want fans of that thing to consume the viz. But fans of those things are confused when they come to my Tableau blog and there’s all this other weird content about graphs and data and what have you. To reach them, it’d be better to have these vizzes as standalone projects. I could always just send them to the viz homepage, but then I can’t add all the context that I want to. So, what should I do?
I’m experimenting with a new platform to solve just this problem. I want to make beautiful, responsive, Medium-like articles with data visualizations. Medium doesn’t allow for Tableau Public, but I found a tool that does. It’s called Atavist. It has a drag-and-drop interface to add different “Blocks” of content. One of the blocks you can add is an Embed code, and Tableau Public vizzes work there.
I’ve tested out this platform by making a little article about an iPhone game that all the women in my family have been obsessed with called “Neko Atsume”. So here it is, my article “Data Atsume”. Check it out and let me know what you think about this format! I think I might use it for projects like this where I’m making a bunch of vizzes on a very particular dataset and I want people to explore it. If any of you out there have avoided starting a blog for your data projects because of any of the pain points I listed at the start of this post, I encourage you to try to make a data story with this tool. And be sure to tweet it to me @jeweloree!
For the past couple weeks, every time I’m making small talk with someone and they mention their excitement for the new Star Wars movie, I quickly segue into “OMG. HAVE YOU HEARD THE DARTH JAR JAR THEORY?” Judging by some data analysis of Reddit that I’ve done, I’m not the only nerd doing this.
For those of a slightly less geeky nature, Jar Jar Binks is an incredibly unpopular character from Star Wars Episode 1: The Phantom Menace. As the title implies, someone in the film is supposed to be an unexpected bad guy. But, we never really find out who that is. By the end, we assume it’s Emperor Palpatine. However, some Star Wars super fans have posited that the Phantom Menace is an even more unexpected character: Jar Jar Binks. Check out this video to see the whole theory:
This theory really rose to prominence in a Reddit post last October. In the dashboard below, I used import.io to scrape Reddit for any post containing “Jar Jar Binks”. You can see that prior to the theory becoming mainstream, the only people that routinely talked about Jar Jar were in the subreddit /r/whowouldwin, where people pit all kinds of characters against each other and debate who would win in a fight. After the theory took off, a whole subreddit devoted to it, /r/DarthJarJar, became a popular place for people to suggest and discuss evidence for the theory of Jar Jar being a Sith lord.
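Once the posts are scraped, the before/after comparison is just a tally per subreddit on either side of a cutoff date. Here is a minimal sketch of that step. The `(subreddit, date)` layout, the sample posts, and the cutoff date are placeholders for illustration, not the scraped data or the actual date of the theory post.

```python
from collections import Counter
from datetime import date


def subreddit_counts(posts, cutoff):
    """Tally posts per subreddit before the cutoff and on or after it."""
    before, after = Counter(), Counter()
    for subreddit, posted_on in posts:
        bucket = before if posted_on < cutoff else after
        bucket[subreddit] += 1
    return before, after


# Placeholder cutoff and hypothetical posts; not real scraped results.
cutoff = date(2015, 11, 1)
sample_posts = [
    ("whowouldwin", date(2015, 6, 2)),
    ("whowouldwin", date(2015, 8, 19)),
    ("movies", date(2015, 9, 7)),
    ("DarthJarJar", date(2015, 11, 3)),
    ("DarthJarJar", date(2015, 11, 20)),
    ("whowouldwin", date(2015, 12, 1)),
]

before, after = subreddit_counts(sample_posts, cutoff)
print("Before:", before.most_common())
print("After:", after.most_common())
```

Sorting each tally with `most_common()` shows which subreddit led the Jar Jar conversation in each period.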