Ming the Mechanic:
Standard Data Sources

The NewsLog of Flemming Funch
2004-08-16 15:32
by Flemming Funch

I occasionally need to figure out which source is the most authoritative for some type of data. That usually isn't easy, and not much automation is available.

So, for example, I'm adding a list of languages to a program. I'd like it to be a standard list, using standard codes. OK, I quickly find out that there is an international standard for that, ISO 639, which provides two- or three-letter codes. And the authoritative site on it has a list, in an HTML table, which I got imported into a database after half an hour of work. The table was obviously written by hand, with the cells in a bunch of inconsistent formats. But why isn't this in a consistent XML format I can pick up automatically? What if the list changes, like when next week they decide there are really a couple more languages that need to be on it? It is doubtful I'll ever get around to importing it again the hard way, unless somebody has a problem. So sooner or later my data will be out of whack.
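To make the half hour of hand work a bit more concrete, here is a minimal sketch of pulling a hand-written HTML table into a clean code-to-name mapping, using only Python's standard library. The names `TableRows`, `parse_language_table` and `SAMPLE` are made up for illustration, and the sample table is invented, not a copy of the real ISO 639 page, which may well be laid out differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # cells of the row being read, or None
        self._cell = None    # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse stray whitespace and line breaks inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_language_table(html):
    """Return {code: name} from a two-column (code, name) HTML table."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    languages = {}
    for row in parser.rows:
        if len(row) < 2:
            continue
        code, name = row[0].lower(), row[1]
        # Keep only rows whose first cell looks like a 2- or 3-letter code;
        # this also drops the header row.
        if code.isalpha() and code.isascii() and len(code) in (2, 3):
            languages[code] = name
    return languages


# Invented sample in the spirit of a hand-written table (assumption, not real data).
SAMPLE = """
<table>
  <tr><th>Code</th><th>Language</th></tr>
  <tr><td> EN </td><td>English</td></tr>
  <tr><td>fr</td><td><b>French</b>
  </td></tr>
  <tr><td>dan</td><td>Danish</td></tr>
</table>
"""

print(parse_language_table(SAMPLE))
```

Something like this could be re-run whenever the list changes, but it still depends on the page keeping roughly the same shape, which is exactly the fragility a proper machine-readable format would avoid.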

And then I need a thing for selecting timezones, so I can show time in people's local format. Where's the authority for that? I can find lots of places that list the different time zones, but no easy way of knowing when they have daylight saving time. The map of who uses what system across the world is surprisingly complex. Just look at Canada. But the whole thing would really be a few kilobytes of data. I just want the correct data. I can find companies selling that, for $399 per year, but that's kind of silly. ... Ah, a little more research shows that the Olson TZ database built into all Unix and Linux systems is a fine solution. It isn't authoritative, but it seems to be good, it gets updated once in a while, and it is already there. I kind of knew that, I had just forgotten. I'll use that. But, really, there should be one authoritative web service somewhere I could just call. Manned by one employee in the UN or something, who'll call somebody in each region a couple of times per year to ask whether they've changed their system, and who updates the database accordingly.
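As a rough illustration of using the Olson TZ database that is already on the system, here is a small sketch with Python's standard `zoneinfo` module (Python 3.9 or later), which reads the system's tz data where it is installed. The helper name `local_time` and the choice of zones are assumptions made for the example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_time(utc_moment, zone_name):
    """Convert an aware UTC datetime to the named Olson/tz zone."""
    return utc_moment.astimezone(ZoneInfo(zone_name))


# The same UTC clock time in winter and in summer.
winter = datetime(2004, 1, 16, 12, 0, tzinfo=timezone.utc)
summer = datetime(2004, 8, 16, 12, 0, tzinfo=timezone.utc)

# Two Canadian zones are included because they behave differently:
# America/Regina keeps the same offset all year, America/Toronto does not.
for zone in ("Europe/Paris", "America/Regina", "America/Toronto"):
    for moment in (winter, summer):
        local = local_time(moment, zone)
        print(zone, local.strftime("%Y-%m-%d %H:%M"), local.utcoffset(), local.dst())
```

The daylight saving rules never appear in the program itself. They come from the tz data, so they follow whatever updates the system receives.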

There are a lot of things one could do if more data were easily available as web services in authoritative, normalized versions. Population, environment, geographical, financial data. If it were all available in standard ways, I could make my own analysis of what seems to be going on in the world. As it is right now, one has to put up with questionable third-hand data, and it takes quite some financing to get somebody to normalize the data so one can do things with it.




16 Aug 2004 @ 16:04 by swanny : Standards
I wrote a little piece on standards in my newslog, ming.
It's a bit wordy and possibly confusing.....
I was trying to make the point that standardization should,
perhaps because of the internet, be a more democratic and global
enterprise. People should sit down together and hash out
a kind of comprehensive democratic global standard for whatever,
with the stipulation of continuing review and reevaluation
and/or restandardization. There are some organizations working on
it, but I think it has to be a collaborative and perhaps dynamic thing.


28 Aug 2004 @ 09:50 by Gregory Wright : Normalizing Data
Here's a proposal for normalizing monetary and fiscal info and the social/political/etc. information and ideas related thereto:
All references to amounts of money in articles about inflation, the price of gasoline, the cost of living, government budgets, at least for the next five or more decades, should refer to Year-2000 Dollars, Year-2000 Euros (this currency's first year of existence, after all!), and so on. The only appropriate way to refer to money over time is with a constant unit of money, and the value of the main currencies in the still very recent millennial year is made to order for that!
(For example: Maybe the media hand-wringing over the allegedly high cost of gasoline in the U.S. these days, if it was consistently couched in Year-2000 Dollars, would diminish when folks realized that 1980s gas cost about $80 compared to today's sub-$50 stuff!)  

3 Sep 2004 @ 13:01 by ming : Normalizing Money
That would help, I guess. But money is still such a fluffy abstract thing. I'd love to be able to normalize it with something more real. Preferably in some way that includes all the side-effects and environmental factors.

Like, what is the cost of a barrel of oil? Is it the cost of getting it out of the ground? Imagine we could note the real cost as the replacement cost. Like, what would it cost to put it back? Or, at least, what would it cost to manufacture a barrel of oil, if we couldn't just take it from where it was lying around?

And there's of course the cost of cleaning up the results. What's the cost of the damage that a barrel of oil produces? That should be part of the real cost.

Or, even within the realm of just dollars, how about if we normalized wages when looking at the numbers? A lot of things are really cheap because they're produced by people who earn $50 per month, and consumed by people who earn $5000 per month. Imagine you had to adjust it to man-hours, or, in that example, multiply the cost by 100 to make it match.
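The arithmetic behind that factor of 100 can be sketched in a few lines. The function names, the $10 price and the 160 working hours per month are all assumptions chosen for illustration. Only the $50 and $5000 monthly wages come from the example above.

```python
def wage_adjusted_price(price, producer_wage, consumer_wage):
    """Scale a price by the ratio of consumer to producer wages."""
    if producer_wage <= 0:
        raise ValueError("producer_wage must be positive")
    return price * consumer_wage / producer_wage


def price_in_hours(price, monthly_wage, hours_per_month=160):
    """Express a price as hours of work at the given monthly wage.

    hours_per_month=160 is an assumed figure, not taken from any source.
    """
    return price / (monthly_wage / hours_per_month)


# Hypothetical $10 item, with the wages from the example: $50 vs. $5000 per month.
print(wage_adjusted_price(10, producer_wage=50, consumer_wage=5000))  # 1000.0
print(price_in_hours(10, monthly_wage=50))     # hours of work for the producer
print(price_in_hours(10, monthly_wage=5000))   # hours of work for the consumer
```

Seen in hours of work rather than dollars, the same item costs the producer 100 times as much labor as it costs the consumer.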

Prevailing economics works a bit like auto theft. You know, a car is really cheap when you steal it. You don't have to worry about where you get the materials to manufacture it, nor about producing the value you can exchange for the car. You don't have to worry either about who cleans up the mess after you run it off a cliff when you're done with it. You only have to weigh the trouble, the difficulty and the risks in stealing it against the rewards you expect from "owning" it. That's unfortunately way too much like how big business works. Where can we get away with getting the thing we want in the easiest possible way, which will give us the greatest possible profit when we use it or sell it elsewhere? And where the stuff really comes from, and where it goes afterwards, is none of our problem, other than to the degree that some government might force us to deal with some of it.

If we were able to calculate what is going on in a more whole manner, including all factors, and, yes, normalizing the data so we can build it into a bigger picture - that could change a lot of things. Being able to see is a good start.  



Link to this article as: http://ming.tv/flemming2.php/__show_article/_a000010-001342.htm
Main Page: ming.tv