Defying Classification

by Malcolm Tredinnick

Sat 5 Aug 2006

Reusing Data: An Australian View

Posted at 21:00 +1000

In the background this week, I have been pulling together geographical and news data from a few sources for some experiments I want to try. If things work out, I'll probably show a few people what I am trying, and the web would be the natural way to do that. So I've been paying attention to republication restrictions and the like whenever I locate something that looks useful. Conclusion: the Australian government is not really leading the world in this sort of thing, and Australian companies seem to follow suit.

Here's a survey of the good and not so good (plus you can play the "what is he doing?" guessing game, if you like).

NASA's Visible Earth website provides no end of data and pretty pictures as part of their Blue Marble project. I can download enormous files showing the planet at various times of the year. The raw data files are 2 GB each and uncompress to about 11 GB. My internet connection has a traffic cap, beyond which I pay an excess charge, so I only grabbed two files, six months apart. This data is in an easily usable format that I can convince the ogr/gdal library to read, and hopefully it can then be passed on as a data source to MapServer as well. Terms of Use? Beautiful!

Other mapping data is similarly available, and not just for the US and Canada, which is what I had initially feared.

Australia, though, lags behind in its sharing of data. Have a look at this caveat in a collection of world elevation data. I particularly like the turn of phrase "AUSLIG does not give away copyright." It almost implies that organisations who do not require case-by-case approval for reuse do give away their copyright. Err... no. Sure, it's possible to argue quite validly that it doesn't say that. So they are only guilty of treating their readers like idiots, since it is perfectly normal not to give away copyright (although the US policy whereby a lot of government data is public domain is something I like).

To be fair, this is part of a general policy within Australia whereby republication permission has to be requested individually. I need to investigate at which point something becomes "news reporting", because that is one of the cases where general permission is already granted. In passing, weather data in Australia falls under the same restrictions. Historical weather data just starts to get silly: if it's not on the website, there's a charge for it. And almost nothing with historical details is on the Bureau of Meteorology's website. They provide summary material, but no details. Fortunately, I don't need climate data.

OK, so data about Australia that is somewhat paid for by my taxes needs a little work to collect. How does the media industry stack up, since they don't have a legislative right to my business and hence might be interested in playing nice? Again, I wanted a sample of data sources from around the world.

  • The local government broadcaster, the ABC: a nice collection of RSS feeds, but it doesn't look like you can redisplay headlines on a website (I would consider that going beyond personal use, since it's republishing).
  • One of my city papers: good collection, but only for use in news readers (hmm ... there's a poorly defined term, guys) and personal use again.
  • Similarly for one of the national papers (free clue: that URL is pretty non-memorable). Republishing of headlines: not so much. :-(

Off to sample from the overseas basket next. I'm going to have trouble dealing with non-English text, so my search isn't truly representative (remember I'm actually trying to accomplish something with this data, not doing a research project on terms and conditions).

  • The New York Times. Good selection. Clear link to terms and conditions, although it's only available as a JavaScript popup rather than a link I can easily open in another tab, and the conditions aren't too bad. They were written by somebody who doesn't understand the web: apparently I'm not allowed to edit the data at all, yet their XML feed won't display directly as is, so I don't think they really intend for me to display it unchanged (and they do acknowledge it can appear on a website). A little inconsistent, but definitely usable.
  • The Washington Post. No feed-specific permission information. The "Rights and Permissions" link at the bottom of the page leads to some pretty severe language, although a text link for headlines is allowed. Good enough for my purposes at the moment.
  • The BBC. This organisation has really stood out over the past couple of years as one that gets the internet and isn't afraid of sharing information, both content and explanations of how they do things (explore some of the links around this page for more war stories). Once again, the British national broadcaster comes through. The first paragraph of their terms and conditions for use makes it clear that republishing the information within the feeds is, within reason, encouraged.

I'd like to get a Canadian source as well, but I'm pretty clueless about their media, so it's taking a while and I keep ending up at provincial newspapers rather than places with large national coverage. Plus, in many cases, they don't seem to have discovered RSS or Atom yet.

Conclusion

I consider myself somebody who is fairly aware of copyright restrictions. I don't agree with a number of restrictions, but I try to work within the system. I don't have a problem with the fruits of labour not simply being given away, either. If something costs money to produce and is genuinely useful, getting a return on those costs is what keeps a business going.

However, I don't like fake restrictions that set up artificial and unequal barriers to access. If I can access your web page and point anybody in the world to it, why can't I repost links and the brief summary you give in your RSS feed? It doesn't replace your content. If it's genuinely useful information, it will drive more (or at least, no less) traffic to your site. Isn't that your goal? Similarly, if the Commonwealth government (in Australia) is claiming they will grant republication permission in almost all cases, then a more blanket grant with exceptions specified would seem to be in order. I'm a huge supporter of the "government works for me. I help fund them; I help choose them" point of view. Failing to share is just impolite in that case.
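To make concrete what "reposting links and the brief summary" amounts to, here is a minimal sketch, assuming an RSS 2.0 feed (no namespaces, so not Atom) that has already been downloaded into a string. The function name `headline_links` and the sample feed are hypothetical; the point is only that each item is reduced to its title, its link back to the publisher, and the publisher's own short description.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample feed: RSS 2.0, no namespaces, already fetched as a string.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example News</title>
<item><title>Rates &amp; charges</title><link>http://example.com/a</link>
<description>Short summary.</description></item>
</channel></rss>"""


def headline_links(rss_text: str) -> str:
    """Render each RSS 2.0 <item> as an HTML list entry that links back to the source."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        # Missing elements fall back to empty strings rather than raising.
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        summary = item.findtext("description", default="").strip()
        # Escape everything so feed text cannot inject markup into our page.
        entries.append(
            '<li><a href="{}">{}</a> {}</li>'.format(
                html.escape(link, quote=True),
                html.escape(title),
                html.escape(summary),
            )
        )
    return "<ul>\n" + "\n".join(entries) + "\n</ul>"


if __name__ == "__main__":
    print(headline_links(SAMPLE_FEED))
```

Every entry points the reader back at the original article, which is why this kind of reuse tends to send traffic to the publisher rather than replace it.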

Finally, one thing that doing this research brought home was the wide variation in understanding of the internet (and the web, in particular) shown in various terms and conditions. From how easy it was to find the information, to how it was written and the default positions adopted, the standards swung wildly. And the BBC just have this stuff nailed! If you're thinking of starting a small country and want a model for your national broadcaster's website, look over there. Sure, they have a television tax and crazy stuff like that, so don't just copy the whole thing, but their website: so nice, so reasonable. I really need to start looking at their "positions available" page more often.

Topics: technology/data