Hi

I'm Chris.

Census Dataset Explorer

I heard about the US Census API and started looking around inside it. There are thousands of cool datasets behind a well-executed API, but the sheer scale makes it pretty hard to navigate. I thought it would be useful if there was a way to organize them and maybe provide a simple visualization for each dataset, so I created a thing called Census Explorer that attempts to do that. Hopefully it will make the datasets more accessible. Right now it covers the SF1 and ACS5 datasets from the 2010 Census.

I pretty much just dumped all the dataset descriptions into an Elasticsearch index and provided a simple API for getting at them. The web interface is intentionally minimal, as this is really just a weekend experiment. The source lives here.
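For flavor, here's a rough sketch of the indexing step. The index name, IDs, and description fields are all made up for illustration; only the bulk-payload shape matches what Elasticsearch actually expects.

```python
import json

def to_bulk_payload(datasets, index="census"):
    """Build an Elasticsearch bulk-index body (newline-delimited JSON)
    from a list of dataset description dicts."""
    lines = []
    for ds in datasets:
        lines.append(json.dumps({"index": {"_index": index, "_id": ds["id"]}}))
        lines.append(json.dumps(ds))
    return "\n".join(lines) + "\n"

# Hypothetical dataset descriptions, loosely modeled on Census API metadata:
datasets = [
    {"id": "sf1-P1", "title": "Total Population", "dataset": "sf1"},
    {"id": "acs5-B01003", "title": "Total Population (5-year estimates)", "dataset": "acs5"},
]
payload = to_bulk_payload(datasets)

# With the official Python client, that payload could then be sent with
# something like:
#   from elasticsearch import Elasticsearch
#   Elasticsearch().bulk(body=payload)
```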

Stuff used: flask, elasticsearch

Meaty links and Tunes

People have been sending a lot of good links over meatspace: tunes, sites, and so on. So I made a simple thing that listens to the meatspace chat and saves the links. The source is here.

Update: I recently added a radio feature. The bot pulls any SoundCloud links or YouTube music out of the conversation and indexes them. If you click the link in the top right of meatlinks, an embedded player shows up, picks a random song to play, and then moves on to the next one. So it's a little shuffle-based player for music that has previously been sent. Sometimes the category of a YouTube song isn't set to "music", so to explicitly add a YouTube link you can include "musicbot" in the meatspace message.
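The link extraction might look something like this hypothetical sketch; the real bot's matching rules likely differ:

```python
import re

# Hypothetical matcher for the kinds of links the bot indexes.
MUSIC_URL = re.compile(
    r"https?://(?:www\.)?"
    r"(soundcloud\.com/\S+|youtube\.com/watch\?\S+|youtu\.be/\S+)",
    re.IGNORECASE,
)

def extract_music_links(message):
    """Return music links found in a chat message. SoundCloud links are
    always kept; YouTube links are kept unconditionally here, with the
    understanding that the real bot also checks the video category or
    the "musicbot" keyword."""
    return [m.group(0) for m in MUSIC_URL.finditer(message)]

def force_add(message):
    """True if the sender explicitly tagged the message for the radio."""
    return "musicbot" in message.lower()
```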

Stuff used: flask

It didn't take Yahoo long to start messing up Tumblr after purchasing it. Yahoo allegedly adopted some fairly strict content filtering, so depending on the content of your blog, it could be blocked from the site's internal search as well as external (i.e. Google) searches. I built an application that crawls Tumblr, builds a list of the unindexed blogs, and makes it searchable. To be honest, I don't really understand Tumblr, and I can't tell the difference between spam and not-spam, since everyone is just reposting everyone else's (mostly NSFW) posts anyway. And Yahoo maintains that this is mostly a spam filtering strategy. Nevertheless, it was a policy change that got people worked up, and it was a fun Sunday evening project. Take a look at it here...mostly NSFW though.

Stuff used: scrapy, django, backbone.js, elasticsearch

The folks at the Chicago Tribune built a load testing utility called Bees with Machine Guns, a nice little tool that starts a number of EC2 instances, hits a URL from them a bunch of times, and shuts everything down after generating a report. It's a cheap, easy, and realistic way to load test your site. I've been using Digital Ocean for a few months now, mostly because it's cheaper, but it also provides a nice API, so I modified Bees with Machine Guns to run on Digital Ocean rather than AWS. The project is called Minnows with Machine Guns.
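The division of labor can be sketched roughly like this. This isn't the actual minnows code, just a sketch of the bees-style approach of splitting one big load test evenly across instances; the droplet creation via Digital Ocean's API is left out.

```python
def plan_attack(total_requests, total_concurrency, num_minnows):
    """Split a load test across minnows: each instance gets an equal share
    of the requests and the concurrency, with any request remainder going
    to the first instance. Each (requests, concurrency) pair would then be
    run on one droplet, e.g. via `ab -n REQS -c CONC URL` over SSH."""
    reqs = [total_requests // num_minnows] * num_minnows
    reqs[0] += total_requests % num_minnows
    conc = [max(1, total_concurrency // num_minnows)] * num_minnows
    return list(zip(reqs, conc))
```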

VW Bus lights etc

I wanted to put a bunch of LEDs in the ceiling of my '73 VW bus, and I finally put the Arduino to use in this project. The Arduino drives four TLC5940 LED drivers, and an Android app I wrote talks to the Arduino over a serial Bluetooth adapter. There are 64 LEDs, each individually addressable, and each can be faded on or off. It's pretty basic, but it was the most elegant way I could think of to light up the inside of the bus. The end result is something I'm pretty happy with...it works well and looks pretty cool.

The result is here

I also wrote some code to measure the cylinder head temperature on the engine via the stock fuel injection temperature sensor. The sensor is just a resistor that changes its resistance as the engine heats up. I found a resistance/temperature graph on the ratwell site and fit it to get a function that converts resistance to temperature. The temperature displays in the phone app. In the future I'm going to measure the main and aux battery voltages, as well as the RPM via a Hall effect sensor.
I might also ditch the Arduino for something faster like a Raspberry Pi, since there are some pretty annoying performance issues with generating tons of serial interrupts. However, the Arduino is neat because it's very low power.
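A thermistor-style sensor like this is often modeled with the Steinhart–Hart equation, 1/T = A + B·ln(R) + C·ln(R)³, which is linear in its coefficients, so three calibration points read off a graph pin it down exactly. Here's a stdlib-only sketch of that fit; the calibration values below are made up for illustration, not the real numbers from the ratwell graph.

```python
import math

def fit_steinhart_hart(points):
    """Fit 1/T = A + B*ln(R) + C*ln(R)^3 through exactly three
    (resistance_ohms, temp_celsius) points by solving the 3x3 linear
    system with Cramer's rule."""
    rows, rhs = [], []
    for r, t_c in points:
        x = math.log(r)
        rows.append([1.0, x, x ** 3])
        rhs.append(1.0 / (t_c + 273.15))  # Kelvin

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(rows)
    coeffs = []
    for col in range(3):
        m = [row[:] for row in rows]
        for i in range(3):
            m[i][col] = rhs[i]  # replace one column with the RHS
        coeffs.append(det3(m) / d)
    return coeffs  # A, B, C

def resistance_to_celsius(r, coeffs):
    a, b, c = coeffs
    x = math.log(r)
    return 1.0 / (a + b * x + c * x ** 3) - 273.15

# Made-up calibration points, as if eyeballed from a resistance/temp graph:
calibration = [(2500.0, 20.0), (300.0, 80.0), (80.0, 130.0)]
coeffs = fit_steinhart_hart(calibration)
```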

The stuff used:
Arduino
Android Phone or Tablet
Bluetooth module
TLC5940 (x4)
and some miscellaneous resistors, regulators, etc.

Resources I used:
TLC5940 Arduino Library (makes life so much easier)
On Android and Bluetooth

It's time to move again, which means I've been browsing the Craigslist apartment and housing section more than I'd really like. Looking for housing in Vancouver already blows, so I didn't think it could get much worse. But my girlfriend has a dog that she (and I) would love to have live with us, which meant checking the "allows dogs and cats" box on the Craigslist search form. As soon as you do this, you may notice that there are like 2 listings that meet the criteria. That sucks...but surely this can't be unique to Vancouver. After all, this is the place with an organic, free range, grass fed, fair trade dog food store on every block, so there's no way it can be unfriendly towards pets, right? People in Vancouver must really love dogs and cats.
Question: which cities are the friendliest towards pets?
I looked at the Seattle Craigslist, saw plenty of ads that allowed pets, and decided to take it a step further. I wrote a little script that looks at the Craigslist apartment and housing listings for major cities. It puts the listings into buckets by date, compares the number of postings that allow pets to the total number of postings for each date, then averages across the dates to get the percentage of postings that allow pets, by day. Initially I wasn't expecting any significant differences between cities, but the results showed something else (damnit). Each city had about 2500 listings taken into account, based on current (April 16, 2013) Craigslist ads.
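The bucketing and averaging step is simple enough to sketch as a hypothetical helper (not the actual script):

```python
from collections import defaultdict

def pet_friendly_percentage(listings):
    """Given (date, allows_pets) pairs for one city's listings, bucket
    by date, take the pets-allowed fraction per date, then average the
    daily fractions into a single percentage."""
    totals = defaultdict(int)
    allowed = defaultdict(int)
    for date, allows_pets in listings:
        totals[date] += 1
        if allows_pets:
            allowed[date] += 1
    daily = [allowed[d] / totals[d] for d in totals]
    return 100.0 * sum(daily) / len(daily)
```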
TL;DR : If you want to have a pet in Vancouver, then move to Seattle

New Site

I think my micro EC2 instance is going away soon, so after a bit of a migration, the site lives here and looks different. I still need to copy all my old photos over. Maybe I'll update it more frequently now that it's easy to deal with. We'll see.

CorpCrawl

As part of a project I'm working on in my free time, I needed to figure out corporate relationships. The SEC requires that all publicly held corporations file a list of their subsidiaries in their Form 10-K each year, so by scraping a section of the 10-K (called Exhibit 21.1), you can extract a list of subsidiaries for each registrant. The problem is that every company formats its 10-K differently, and the lack of uniformity makes scraping a lot harder. It also says nothing about privately held companies. Anyway, I did my best, and it manages to extract a lot of information.
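To give a feel for the scraping problem, here's a toy parser for one common Exhibit 21.1 shape; the sample text and the regex are illustrative only, and the real filings are far messier than this.

```python
import re

# One common Exhibit 21.1 layout, once the HTML is stripped, flattens to a
# two-column "subsidiary name / jurisdiction" table. This heuristic handles
# only that shape: a name, two-plus spaces, then a jurisdiction.
ROW = re.compile(r"^(?P<name>\S.*?\S)\s{2,}(?P<jurisdiction>[A-Z][A-Za-z .]+)$")

def parse_exhibit_21(text):
    """Extract (subsidiary, jurisdiction) pairs from a flattened exhibit."""
    out = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            out.append((m.group("name"), m.group("jurisdiction")))
    return out

# A made-up exhibit fragment for illustration:
sample = """Subsidiaries of the Registrant

Acme Holdings, Inc.      Delaware
Acme International B.V.      Netherlands
"""
```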

I made the project lightweight and independent of any storage backend, so I should be able to easily integrate it into the larger project at a later date. I was also hoping that others might find it useful. It's a little bit out there though, so who knows.

It's up on Github here

Built with python

Aircooled Rescue

The aircooled VW community is pretty chill. There used to be a site listing contact information for people willing to help out a travelling aircooled VW owner should there be a mishap on the road. The old site wasn't being maintained anymore, so I made my own: I scraped all the info from the previous site and made the new site accept registrations directly, rather than running each listing by the webmaster. All of that info gets plotted on a map.

You can check it out here

Built with django, backbone.js, and bootstrap

I figured it would be possible to map the posts on reddit's EarthPorn subreddit by geocoding the post titles. Then you can see where the posts are geographically, which is nice for discovering pretty places around you.
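The title cleanup before geocoding might look like this hypothetical helper (the geocoder call itself, e.g. via a library like geopy, isn't shown):

```python
import re

def title_to_query(title):
    """Reduce an /r/EarthPorn title to a geocodable place string by
    stripping bracketed tags like [1920x1080] or [OC], then tidying
    whitespace. The cleaned string would then be handed to a geocoder."""
    query = re.sub(r"\[[^\]]*\]", "", title)   # drop [WxH], [OC], etc.
    return re.sub(r"\s+", " ", query).strip(" ,.-")
```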

Explore it here

Built with python

A Graph of Hubski

I thought it would be interesting to visualize the connections on Hubski in a different way. Though the graph should really be directed, I wanted to keep it simple, so right now it's undirected. Node size represents the number of followers a user has. Let the layout settle for a couple of seconds, then click the "Stop Layout" button. You can zoom with your mouse wheel, and mousing over a user hides everyone who isn't directly following or followed by them.
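Building the graph data for sigma.js could be sketched like this; the field names follow sigma's usual nodes/edges JSON (id, label, size, source, target), but the site's actual format is an assumption here.

```python
import json

def hubski_to_sigma(followers):
    """Build a sigma.js-style graph from a {user: set_of_followers} map.
    Follow relationships are de-duplicated into undirected pairs, and
    node size is the follower count, as described above."""
    nodes = [{"id": u, "label": u, "size": len(f)}
             for u, f in sorted(followers.items())]
    pairs = set()
    for u, f in followers.items():
        for v in f:
            pairs.add(tuple(sorted((u, v))))
    edges = [{"id": f"e{i}", "source": a, "target": b}
             for i, (a, b) in enumerate(sorted(pairs))]
    return json.dumps({"nodes": nodes, "edges": edges})
```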

Take a look at it here

Built with python and sigma.js

While reading a paper for class, I felt compelled to try my hand at implementing the approach it takes. A lot of the time I read something and it makes some sense, but I don't really know how much I don't understand until I stop reading and try doing. The paper is called Finding and Evaluating Community Structure in Networks, from 2003.

I read the paper a couple of months ago, and the other day I started thinking about all the uses for a way of picking out communities within a larger network. Basically, the paper says to calculate the betweenness of each edge in a graph and then remove the edges with the highest betweenness. Since betweenness measures how often an edge is crossed on a shortest path between pairs of nodes in the graph, we're removing the edges most commonly crossed on a shortest path from node a to node b. Eventually, the original graph splits into smaller graphs whose nodes, from the paper's perspective, are more similar to one another.

Thinking about this in terms of a real community, I figured two subreddits a and b are connected if some user has two comments c1 and c2 that live in a and b respectively. That constitutes an edge in the graph, where a node is a subreddit. I used the Python reddit wrapper to pull some submissions and comments down, and then constructed a graph from them. Since I figured it would be neat to evaluate this on large data sets, I committed the graph to a redis instance. Once the data is downloaded, a python script loads the entire data set from redis and begins classifying communities (the fun part!). The following occurs:
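The edge-building step, under the definition above, could be sketched as (the redis persistence is omitted):

```python
from collections import defaultdict
from itertools import combinations

def build_subreddit_graph(comments):
    """comments: iterable of (user, subreddit) pairs. Two subreddits are
    connected whenever some user commented in both; returns the set of
    undirected edges, each as a sorted (a, b) tuple."""
    by_user = defaultdict(set)
    for user, subreddit in comments:
        by_user[user].add(subreddit)
    edges = set()
    for subs in by_user.values():
        edges.update(combinations(sorted(subs), 2))
    return edges
```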

  • Calculate every shortest path for every pair of nodes in the graph
  • For each node in the graph, find the fraction of shortest paths that pass through that node. This is betweenness (as per Wikipedia's definition)
  • Remove the edge with the highest betweenness (just the betweenness of its start and end nodes added together…maybe this assumption is flawed)
  • Repeat
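Here is a minimal stdlib sketch of that loop. It counts a single BFS shortest path per node pair, which is a simplification of true edge betweenness (and differs from the node-sum shortcut in the steps above), but it shows the remove-and-repeat structure:

```python
from collections import deque

def shortest_path(adj, src, dst):
    """One BFS shortest path from src to dst, or None if disconnected."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None

def most_between_edge(adj):
    """Count how often each edge lies on a canonical shortest path over
    all node pairs; return the most-used edge."""
    counts = {}
    nodes = sorted(adj)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            path = shortest_path(adj, a, b)
            if not path:
                continue
            for u, v in zip(path, path[1:]):
                e = tuple(sorted((u, v)))
                counts[e] = counts.get(e, 0) + 1
    return max(counts, key=counts.get)

def remove_most_between(adj):
    """One round of the loop above: drop the highest-betweenness edge."""
    u, v = most_between_edge(adj)
    adj[u].remove(v)
    adj[v].remove(u)
```

On two triangles joined by a bridge, the bridge edge carries every cross-triangle shortest path, so it is the first edge removed and the graph splits into its two communities.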

So then I can draw it all in arbor.js

You can view one of the results here. I'm running the classifier on a much larger data set now. It's slow, because betweenness has to be recalculated after every edge is removed, so a method like this might not work as well in production where you have speed requirements or massive data sets (or, you know, just profile the code that I left un-profiled). The visualization is a physicsy thing, so if you wait a bit and let it settle, the communities will begin to repel each other and you can see things better.

tl;dr: graphs and stuff. method of classifying subreddits based on users’ behavior.


Built with python and arbor.js