News Visualizer - I Can Has Web

Published 2011-01-16, originally published at old blog

I’ve written about my master’s thesis in the previous post, and I’ve been thinking a lot about how to acquire the skill set required to understand the topic well. I’ve taken courses at IFI for three years now, so I do have some theoretical understanding of themes that are often taught in computer studies, such as algorithms, data structures, computer architecture, and discrete mathematics, among others. I believe that taking the courses INF3580, INF5120 and INF5100 (as well as a special curriculum) will give me another layer of the theoretical foundation that this thesis requires.

But there is a field in which the University of Oslo has no teaching to offer, and that is JavaScript (hereafter shortened to JS). To handle this, I must teach myself what I need to know. I do already have some experience with JS, through work and a personal interest in the field that has stayed with me for quite some time. But I feel that the level of insight the thesis requires means I must gain better knowledge of the matter, so I have decided to develop a JS-based application.

The concept

I’ve always been fascinated by data visualization – the art of presenting data in a way that makes use of our visual perception. I believe that rich data sets can be presented much more efficiently by visualizing them with concepts that users find easier to comprehend. After some thinking, I concluded that I want to try to visualize the amount of news presented in Google News in relation to its geographical metadata.

This is no easy undertaking. But luckily, there has been some work on this already, in the project newsmap by Marcos Weskamp and Dan Albritton. If I can understand how they dissect the data into regions, I believe the project to be possible. The next step would be to visualize the data in a proper manner.

Features

The main feature of this project will be to show a world map (perhaps enabling the user to zoom in on continents, revealing countries, and perhaps enabling them to zoom in even further). The map will be populated by figures whose size displays the distribution of news. For example, if there were more news from the US than from Europe, then the figure representing the US would be bigger than Europe’s.
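A minimal sketch of how the figure sizes could be derived from the news counts, assuming the figures are circles. The region names, the counts, and the function name radiusFor are made up for illustration. The sketch scales the radius by the square root of the count, so that a circle’s area (rather than its radius) is proportional to the amount of news; that is one possible design choice, not a settled one.

```javascript
// Hypothetical article counts per region; the numbers are invented for illustration.
var newsCounts = { US: 400, Europe: 100, Asia: 225 };

// Returns a radius such that the circle's AREA is proportional to `count`.
// `maxCount` is the largest count on the map, `maxRadius` the radius it should get.
function radiusFor(count, maxCount, maxRadius) {
  if (maxCount <= 0) {
    return 0; // nothing to show if there is no news at all
  }
  return maxRadius * Math.sqrt(count / maxCount);
}

// Find the largest count so that every other figure is sized relative to it.
var maxCount = Math.max.apply(null, Object.keys(newsCounts).map(function (region) {
  return newsCounts[region];
}));

// Map each region to its radius, with the biggest figure capped at 40 pixels.
var radii = {};
Object.keys(newsCounts).forEach(function (region) {
  radii[region] = radiusFor(newsCounts[region], maxCount, 40);
});
```

With these made-up numbers the US has four times as much news as Europe, so its circle gets four times the area, which means twice the radius.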

Also, I want to display the distribution by the categories featured in Google News, i.e. world, national, business, technology, sports, entertainment and health. I’m thinking that a pie chart can show this appropriately. The overall size of the pie would represent the total number of articles, and the size of each slice would represent the number of articles in that category.
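A rough sketch of how the slices of such a pie chart could be computed, assuming the counts per category are already known. The function name sliceAngles and the example counts are hypothetical. Each slice gets a share of the full circle (2π radians) equal to its category’s share of the articles.

```javascript
// Turns an object of { category: articleCount } into a list of slices,
// each with a start and end angle in radians.
function sliceAngles(categoryCounts) {
  var names = Object.keys(categoryCounts);
  var total = names.reduce(function (sum, name) {
    return sum + categoryCounts[name];
  }, 0);

  var start = 0;
  return names.map(function (name) {
    // A category's sweep is its fraction of all articles times the full circle.
    var sweep = total === 0 ? 0 : (categoryCounts[name] / total) * 2 * Math.PI;
    var slice = { category: name, start: start, end: start + sweep };
    start += sweep; // the next slice begins where this one ends
    return slice;
  });
}

// Invented counts for one region.
var slices = sliceAngles({ world: 50, business: 25, sports: 25 });
```

Here "world" would cover half the pie, and "business" and "sports" a quarter each. The resulting angles could then be handed to whichever drawing library ends up being used.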

Another feature that would be awesome is showing the dimension of time, e.g. as a time-lapse. I think watching the development over the past days, or perhaps even on chosen days when certain events happened, would be truly enlightening.

A personal note – Time

Before I go into the strategies, I must explain an important aspect of my life, which is time. If you don’t care, jump straight to design decisions.

Earlier I’ve written some thoughts on design decisions in the Aurora project. I received some feedback that I want to take into account when developing this project, but moreover I’ve gained the experience of trying to start something without having the time. You see, the Aurora project is still in a starting phase. And a fundamental problem for me in starting to lay the foundation is that I don’t feel I have the knowledge needed. Nor, to be honest, the time.

On another side note, I’ve been developing a concept with a couple of friends of mine for some time. Nine months into the project, there had still been no real development (we were still talking strategies and business concept). In the end I decided to end the collaboration, not because I didn’t have faith in the concept (I still believe the basic ideas to be really good), but because I didn’t want the project to take up time in the form of overhead and anticipation.

Time, I’ve concluded, is perhaps my most precious resource.

In relation to this, I’ve been trying to structure my time. This past fall, I tried to conceptualize this with the title 40/40 – 40% on my professional work and 40 points in courses (which actually amounts to a 133% course load, bringing my total workload to about 173% of what is normal). There were incentives that drove me into that workload, but having done that for the past 6 months, I’ve discovered that conceptualizing structures in this way is a really good thing for me. It expresses my choices to people, and lets them know how I wish to prioritize.

I haven’t conceptualized this spring yet, but what it should express is 40% on my professional work, 35 points in courses, and a 3-week vacation in February. It also includes involving myself in some big projects, like taking part in the celebration of the University turning 200 years, which includes a party on September 2 that will involve a LOT of people.

To give myself some time to research this project, I’ve decided to start developing (that is, writing code) after I return from Thailand on February 22.

Design decisions

To create a solid foundation, I’ve decided on some strategies for the project, which I’ll adhere to.

Documentation-driven design

I read about this concept in last year’s 24 ways, and was really fascinated by the motivations behind it. This requires me to learn JsDoc Toolkit, which I think will be really interesting.
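To illustrate the idea, here is a small sketch in which the documentation comment is written first and the function body is filled in afterwards to match it. The function countByRegion and the shape of the article objects are hypothetical; only the @param and @returns tags are standard JsDoc tags.

```javascript
/**
 * Counts how many articles belong to each region.
 *
 * @param {Array} articles List of article objects, each assumed to have
 *     a string property `region` (a made-up shape for this example).
 * @returns {Object} An object mapping each region name to its article count.
 */
function countByRegion(articles) {
  var counts = {};
  articles.forEach(function (article) {
    // Start at 0 the first time a region is seen, then add one.
    counts[article.region] = (counts[article.region] || 0) + 1;
  });
  return counts;
}

// Example input with invented data.
var regionCounts = countByRegion([
  { region: "US" },
  { region: "Europe" },
  { region: "US" }
]);
```

Writing the comment first forces a decision about the input and output before any code exists, which is the part of the approach that appeals to me.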

Test-driven development

In Aurora we decided not to use TDD, and that is the decision I’ve questioned the most. In this project I’m turning around, reading Christian Johansen’s Test-Driven JavaScript Development and using Google’s JsTestDriver.

JS required

Since this project is first and foremost about me learning JS, it will require support of JS. That means I will not adhere to the principles of progressive enhancement.

Don’t reinvent the wheel

There are many resources out there that probably have a place in this project. The first and foremost is the use of maps, which I will include through the Google Maps JavaScript API.

Also, as I’ve mentioned, the project newsmap has already done some of what I wish to accomplish (although in Flash), so I should send an email to the developers asking if they can offer me some tips.

Do Good Things!

This covers a lot of things, of course, but what I mean by this is following guidelines such as those described in JavaScript: The Good Parts.

Implementation

There is much yet to be decided, but I’ve been thinking about tools that would be appropriate for this project (besides the ones mentioned above).

Flot: An interesting library that produces graphical plots; it could be interesting to see how it works, and whether I can perhaps use or copy and modify some of its functionality.
jQuery-SPARQL: An interesting attempt to work with data using SPARQL – may have a use in this project.
Protovis: Perhaps not suitable for my needs in visualizing data, but I think it can be interesting to dissect it to see how it works.
Raphaël: This project will require working with canvas, a field in which I’m ignorant at the moment. I’ve heard good things about Raphaël, and will do more research into whether this is an appropriate tool.
rdfQuery: A tool that I believe can be used to work with the data from sources such as GeoNames.

Written by Arne Hassel