Sunday, September 30, 2012


Reading mathbabe's blog, I learnt about Kaggle. This is a site that provides data so that data analysts can put their techniques to work. The data is provided by institutions that are interested in having it analyzed. The analysis takes the form of a contest, and whoever gets the best results by the time the competition ends wins a prize paid by the data owner. This is best summarized by its Wikipedia page:
  1. The competition host prepares the data and a description of the problem. Kaggle offers a consulting service which can help the host do this, as well as frame the competition, anonymize the data, and integrate the winning model into their operations.
  2. Participants experiment with different techniques and compete against each other to produce the best models. For most competitions, submissions are scored immediately (based on their predictive accuracy relative to a hidden solution file) and summarized on a live leaderboard.
  3. After the deadline passes, the competition host pays the prize money in exchange for "a worldwide, perpetual, irrevocable and royalty free license [...] to use the winning Entry", i.e. the algorithm, software and related intellectual property developed, which is "non-exclusive unless otherwise specified".
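
As a rough illustration of step 2, scoring submissions against a hidden solution file and ranking them on a leaderboard could look something like the Python sketch below. This is only a sketch under assumptions: the function names, the participants and the data are all made up, and plain accuracy stands in for whatever metric a given competition actually uses. It is not Kaggle's own code.

```python
# Hypothetical sketch: none of these names or numbers come from Kaggle itself.

def accuracy(predictions, solution):
    """Fraction of predictions that match the hidden solution."""
    if len(predictions) != len(solution):
        raise ValueError("submission and solution must have the same length")
    matches = sum(p == s for p, s in zip(predictions, solution))
    return matches / len(solution)


def leaderboard(submissions, solution):
    """Return (participant, score) pairs, best score first."""
    scores = {name: accuracy(preds, solution) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up data for illustration only.
hidden_solution = [1, 0, 1, 1, 0]
submissions = {
    "alice": [1, 0, 1, 0, 0],
    "bob": [1, 1, 0, 0, 0],
}

board = leaderboard(submissions, hidden_solution)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

The participants never see `hidden_solution`; they only see their score, which is what makes the live leaderboard a fair comparison.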

I have seen some interesting competition designs and topics there. It is, no doubt, a very interesting resource, and from a practical point of view it might even provide a good topic for a thesis. I have yet to study it in more depth.

