Hi

I'm Chris.

It's time to move again. That means I've been browsing the Craigslist apartment and housing section more than I'd really like. Looking for housing in Vancouver is already about as fun as an airplane full of crying babies, so I didn't think it could get much worse. But my girlfriend has a dog that she (and I) would love to have live with us, which meant checking the "allows dogs and cats" box on the Craigslist search form. As soon as you do this, you may notice that there are like 2 listings that meet those criteria. That sucks...but surely it can't just be limited to Vancouver. After all, this is the place with an organic, free range, grass fed, fair trade dog food store on every block, so there's no way it can be unfriendly towards pets, right? People in Vancouver must really love dogs and cats.
Question: which cities are the friendliest towards pets?
I looked on the Seattle Craigslist and saw that there were plenty of ads that allowed pets, so I decided to take it a step further. I wrote a little script to look at the Craigslist apartment and housing listings for major cities. It puts the listings into buckets based on posting date, then compares the number of postings that allow pets to the total number of postings for each date. Then it averages the dates together, and you get the average daily percentage of postings that allow pets. Initially I wasn't expecting any significant differences between cities, but the results showed something else (dammit). Each city had about 2500 listings taken into account, and these are based on current (April 16, 2013) Craigslist ads.
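
Here's a minimal sketch of the bucketing and averaging described above. It isn't the original script: the function name and the `(date, allows_pets)` tuple shape are assumptions made for illustration, and the scraping part is left out entirely.

```python
from collections import defaultdict


def pet_friendly_share(listings):
    """Return (share per day, average of the daily shares).

    `listings` is an iterable of (date_string, allows_pets) tuples.
    That shape is hypothetical; the real script's data format is not known.
    """
    totals = defaultdict(int)   # all postings seen for each date
    pet_ok = defaultdict(int)   # postings that allow pets for each date

    for day, allows_pets in listings:
        totals[day] += 1
        if allows_pets:
            pet_ok[day] += 1

    # Fraction of pet-friendly postings for each date bucket.
    daily = {day: pet_ok[day] / totals[day] for day in totals}

    # Average the per-day fractions, so every date counts equally.
    average = sum(daily.values()) / len(daily) if daily else 0.0
    return daily, average
```

Averaging the daily fractions (rather than dividing all pet-friendly postings by all postings) means a day with few listings counts as much as a busy one, which is how the description above reads.
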
TL;DR: If you want to have a pet in Vancouver, then move to Seattle

New Site

I think my micro EC2 instance is going away soon, so after a bit of a migration, the site lives here and looks different. Still need to copy all my old photos over. Maybe I'll update it more frequently, because now it's easy to deal with. We'll see.

CorpCrawl

As part of a project I'm working on in my free time, I needed to figure out corporate relationships. The SEC requires that all publicly held corporations file a list of their subsidiaries in their Form 10-K each year. So by scraping a section (called exhibit 21.1) of the 10-K document, you can extract a list of that registrant's subsidiaries. The issue is that every company files their 10-K in a different format, and the lack of uniformity makes scraping a lot harder. Moreover, it says nothing about privately held companies. Anyway, I did my best and it manages to extract a lot of information.

I made the project lightweight and separate from any storage backend, so I should be able to easily integrate it back into the larger project I'm doing at a later date. Also, I was hoping that others might find it useful. It's a little bit out there though, so who knows.

It's up on GitHub here

Built with python

Aircooled Rescue

The aircooled VW community is pretty chill. There used to be a site that listed contact information of people willing to help out a travelling aircooled VW owner should there be a mishap on the road. The old site wasn't being maintained anymore, so I made my own. I scraped all the old info from the previous site, and made the new site accept registrations, rather than running each listing by the webmaster. Then all that info gets plotted on a map.

You can check it out here

Built with django, backbone.js, and bootstrap

I figured it would be possible to map the posts on Reddit's EarthPorn subreddit by geocoding the post titles. Then you can see where the posts are geographically, which is nice for discovering pretty stuff around you.

Explore it here

Built with python

A Graph of Hubski

Thought it would be interesting to visualize the connections of Hubski in a different way. Though the graph should be directed, I wanted to keep it simple, so right now it’s undirected. The size of a node represents the number of followers someone has. Let it settle for a couple seconds, then click the “Stop Layout” button. You can zoom with your mouse wheel. Mouse over a user to hide all users that aren’t directly following or followed by them.

Take a look at it here

Built with python and sigma.js

While reading a paper for class, I felt compelled to try my hand at implementing the approach they took. A lot of times I read things, they make some sense, but I don’t really know how much I don’t know about them until I stop reading and try doing. The paper is called “Finding and evaluating community structure in networks” (Newman and Girvan), from 2003.

I read the paper a couple months ago, and the other day started thinking about all the uses for a means of picking out communities within a larger network. Basically, their paper says we should calculate the betweenness of each edge in a graph, and then remove the edges with the highest betweenness. Betweenness here is a measure of how often an edge is crossed on the shortest paths between every pair of nodes in the graph, so we’ll be removing the edges that are most commonly crossed on a shortest path from node a to node b. Eventually, the original graph is split up into smaller graphs, whose nodes, from their perspective, are more similar to each other.
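
For reference, edge betweenness is usually written something like this. The notation is the common textbook one, not copied from the paper: $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(e)$ is how many of those cross edge $e$.

$$
B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}
$$

So an edge that sits on every shortest path between two big clusters racks up a large score, which is why removing it first tends to pull those clusters apart.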

So thinking about this in terms of a real community, I figured two subreddits, a and b, are connected if a user has two comments, c1 and c2, that live in a and b respectively. This constitutes an edge in the graph, where a node is a subreddit. I used the python reddit wrapper to pull some submissions and comments down, and constructed a graph from them. I figured it would be neat to evaluate this on large sets of data, so I committed the graph to a redis instance. When the data is downloaded, a python script loads the entire data set from redis and begins classifying (the fun part!) communities. The following occurs:
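
Roughly, the graph construction could look like the sketch below. It is not the original code: `build_subreddit_graph` and the `(user, subreddit)` pair format are assumptions for illustration, and the reddit API and redis parts are left out.

```python
from collections import defaultdict
from itertools import combinations


def build_subreddit_graph(comments):
    """Build an undirected graph as {subreddit: set of neighbouring subreddits}.

    `comments` is an iterable of (user, subreddit) pairs, one per comment.
    That input shape is hypothetical, chosen to keep the example small.
    """
    subreddits_by_user = defaultdict(set)
    graph = defaultdict(set)

    for user, subreddit in comments:
        subreddits_by_user[user].add(subreddit)
        graph[subreddit]  # touch the key so isolated subreddits still become nodes

    # Two subreddits share an edge if at least one user commented in both.
    for subreddits in subreddits_by_user.values():
        for a, b in combinations(sorted(subreddits), 2):
            graph[a].add(b)
            graph[b].add(a)

    return dict(graph)
```

This ignores how many users two subreddits share; a weighted version would count them instead of just recording that an edge exists.
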

  • Calculate every shortest path for every pair of nodes in the graph
  • For each node in the graph, find the fraction of shortest paths that contain that node vs how many don’t. This is betweenness (as per Wikipedia’s definition)
  • Remove the edge with the highest betweenness (just the betweenness of its start and end nodes added together…maybe this assumption is flawed)
  • Repeat
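
The steps above can be sketched in plain Python like this. It's a from-scratch illustration, not the original script: all function names are made up, the graph is assumed to be an undirected `{node: set(neighbours)}` dict with sortable node names and no self-loops, and the edge score is the sum-of-endpoints shortcut from the list rather than true edge betweenness.

```python
from collections import deque
from itertools import combinations


def shortest_paths(graph, source, target):
    """Return every shortest path from source to target (lists of nodes)."""
    dist, preds, queue = {source: 0}, {source: []}, deque([source])
    while queue:  # BFS that remembers all predecessors on shortest paths
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                preds[nbr] = [node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                preds[nbr].append(node)
    if target not in dist:
        return []
    paths, stack = [], [[target]]
    while stack:  # walk back from the target along the predecessors
        path = stack.pop()
        if path[-1] == source:
            paths.append(path[::-1])
        else:
            stack.extend(path + [p] for p in preds[path[-1]])
    return paths


def node_betweenness(graph):
    """For each node, sum the fraction of shortest paths passing through it."""
    score = {node: 0.0 for node in graph}
    for s, t in combinations(graph, 2):
        paths = shortest_paths(graph, s, t)
        for path in paths:
            for node in path[1:-1]:  # endpoints don't count
                score[node] += 1 / len(paths)
    return score


def components(graph):
    """Return the connected components as a list of node sets."""
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(graph[node] - part)
        seen |= part
        parts.append(part)
    return parts


def split_communities(graph, target_parts=2):
    """Remove the highest-scoring edge until the graph has target_parts pieces."""
    graph = {node: set(nbrs) for node, nbrs in graph.items()}  # work on a copy
    while len(components(graph)) < target_parts:
        edges = sorted((u, v) for u in graph for v in graph[u] if u < v)
        if not edges:
            break
        score = node_betweenness(graph)  # recalculated after every removal
        u, v = max(edges, key=lambda e: score[e[0]] + score[e[1]])
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)
```

On two triangles joined by a single bridge edge, the bridge's endpoints collect all the cross-triangle paths, so the bridge goes first and the triangles come out as the two communities. Enumerating every shortest path like this gets expensive fast, which fits with the slowness mentioned below.
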

So then I can draw it all in arbor.js

You can view one of the results here. I’m running the classifier on a much larger data set. It’s slow, because betweenness has to be recalculated every time an edge is removed, so maybe a method like this wouldn’t work as well in production where you have either speed requirements or massive data sets (or, you know, just profile the code that I left un-profiled). The visualization is a physicsy thing, so if you wait a bit and let it settle, the communities will begin to repel each other, so you can see things better.

tl;dr: graphs and stuff. method of classifying subreddits based on users’ behavior.


Built with python and arbor.js