The Web is about De-Commoditizing Data

If you are a developer of web content sites, then you must read “Data and the Future of the Web” by Scott Karp and “Database Gods Bitch About MapReduce” by Rich Skrenta. Scott provides the vision of where you need to go, and Rich provides an explanation of the new tools that are going to get you there.

Right now, most publishers provide commodity data (i.e., the same news that you can read on 1,000 other sites) without adding any value for either their users or their advertisers. As Scott notes, Google is the king of extracting commodity data, and that has given it the power to extract most of the revenue as well. But there is another kind of data, the personal data that is created by a community of users on sites like Digg and Twitter: “it’s the data that’s still in our heads, the data that we have not put in digital form.” As Scott sees it:

“The future of the web will be determined by companies that can overcome people challenges — to bring EVERYONE’S data online, and make it useful.”

This is the primary challenge content producers face: how to mine the data their users provide in order to produce a better content experience that, in turn, provides more value to their users and advertisers. The ability to do this will be the key to building a great content business in the Web 2.0 era. And it’s why I feel so strongly that content sites must embrace social media.

If Scott shows us the goal, Rich shows us the technical means to get there. Right now, most content producers have a database-driven content management system (CMS), combined with a traffic reporting tool like Google Analytics. While this combination is perfectly good for serving content and measuring traffic, it will not allow you to do the kind of data analysis that will be needed in the future. The data is going to grow exponentially, and only a system based on technologies like MapReduce, HDFS, and Hypertable will allow your data analysis infrastructure to grow with it (at a cost you can afford).
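
For developers who have not seen the pattern before, here is a rough sketch of the MapReduce idea in plain Python. It is not Hadoop or Hypertable code, and it runs on a single machine. The event format, the sample records, and the function names (map_event, shuffle, reduce_counts, run_job) are all hypothetical, made up for illustration. The point is only to show the shape of the computation: a map step that emits key/value pairs, a grouping step, and a reduce step that combines the values for each key.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical user-activity records: (user_id, article_id, action).
events = [
    ("alice", "story-1", "vote"),
    ("bob", "story-1", "comment"),
    ("alice", "story-2", "vote"),
    ("carol", "story-1", "vote"),
]

def map_event(event):
    """Map step: emit one (article_id, 1) pair per user interaction."""
    _user_id, article_id, _action = event
    yield (article_id, 1)

def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_event(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())

# Interactions per article across all users.
engagement = run_job(events)
```

Because each map call looks at one record and each reduce call looks at one key, a real framework can spread both steps across many machines, which is what lets the analysis grow with the data.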

Gathering more data, and building the infrastructure that lets you analyze and act on it, is the future of large-scale content on the web. The only alternative is content at an individual scale targeted at a niche audience (e.g., a blog). At that personal level, the author can truly understand and respond to their audience. At any larger scale, you need more, and the most successful publishers will be the ones who have the necessary tools.

1 Comment »

  1. Andrew Mager Said,

    January 19, 2008 @ 1:55 pm

    I think Facebook understands this, that’s why they are so successful. I never really thought about the value to the advertiser. It’s so true though.
