Flemming Funch's Weblog: Ming the Mechanic

Ming the Mechanic - Category: Programming
An old rigid civilization is reluctantly dying. Something new, open, free and exciting is waking up.

Tuesday, August 17, 2004

Microcontent

"Microcontent" seems to be one of the buzzwords now. So, what is that, really?

Jakob Nielsen, interface guru, used it (first?) in 1998 about stuff like titles, headlines and subject lines. The idea being that first you might see just a clickable title, or a subject line of an e-mail, that you then might or might not decide to open. So, that title needs to be representative of the full thing, or you might not click it, or you'll be disappointed when you do. Microcontent (the title) needs to match macrocontent (the page, e-mail, article).

Now, that doesn't quite seem to be how "microcontent" is used nowadays. OK, on to 2002, Anil Dash says this, talking about a client for microcontent:

Microcontent is information published in short form, with its length dictated by the constraint of a single main topic and by the physical and technical limitations of the software and devices that we use to view digital content today. We've discovered in the last few years that navigating the web in meme-sized chunks is the natural idiom of the Internet. So it's time to create a tool that's designed for the job of viewing, managing, and publishing microcontent. This tool is the microcontent client. For the purposes of this application, we're not talking about microcontent in the strict Jakob Nielsen definition that's now a few years old, which focused on making documents easy to skim.

Today, microcontent is being used as a more general term indicating content that conveys one primary idea or concept, is accessible through a single definitive URL or permalink, and is appropriately written and formatted for presentation in email clients, web browsers, or on handheld devices as needed. A day's weather forecast, the arrival and departure times for an airplane flight, an abstract from a long publication, or a single instant message can all be examples of microcontent.

Oh, and an absolutely excellent article it is. It calls for the building of a client, a program that will allow us to consume and create microcontent easily. Not just aggregate it, but allow us to use it in meaningful ways, i.e. seeing the information how we want to see it, without having to put up with different sites' user interface quirks. Good examples he gives from that time are Sherlock and Watson on the Mac. You can browse pictures, movies, flight schedules, eBay auctions and more, all from the same interface, and without having to go to the sites they actually come from. But we're still not quite talking open standards for all that.

What is needed is the semantic web, of course, where all content has a uniform format, and is flagged with pieces of meaning that can be accessed and collected by machines. It isn't there yet. Many smart people are playing with pieces of it, like Jon Udell, or Sam Ruby. Or look at Syncato. It is all mostly stuff for hardcore techies at this point. But the target is of course to eventually let regular people easily do what they find meaningful with any data that's available on the net.
[ Programming | 2004-08-17 12:53 | 5 comments | PermaLink ]

Monday, August 16, 2004

Standard Data Sources

I occasionally have the problem of trying to figure out which is the most authoritative source of some type of data, and that usually isn't easy, and not much automation is available.

So, for example, I'm adding a list of languages to a program. I'd like it to be a standard list, using standard codes. OK, I quickly find out that there is an international standard for that, ISO 639, which provides two- or three-letter codes. And the authoritative site on it has a list, in an HTML table, which after a half hour of work I got imported into a database. It was obviously written by a human, with the cells in a bunch of different, inconsistent formats. But why isn't this in a consistent XML format I can pick up automatically? What if the list changes, say next week they decide there are really a couple more languages that need to be on it? It is doubtful I'll ever get around to importing it again the hard way, unless somebody has a problem. So sooner or later my data will be out of whack.
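A hand-made HTML table only has to be scraped the hard way once if the scraping is scripted. The following is a minimal Python sketch that uses only the standard library's `html.parser`. It assumes a hypothetical column layout, with the code in column one and the English name in column two, and uses a made-up sample table. The real ISO 639 page is laid out differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row, tolerating missing close tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; into "\xa0"
        self.rows = []
        self._row = None
        self._cell = None

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # str.split() also splits on "\xa0", so stray spaces and &nbsp; vanish
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def language_records(html_text):
    """Turn table rows into {"code", "name"} records (assumed layout: code, then name)."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    records = []
    for row in parser.rows:
        if len(row) < 2:
            continue
        code = row[0].lower()
        # Keep only plausible 2- or 3-letter codes; this drops header and note rows.
        if code.isalpha() and len(code) in (2, 3):
            records.append({"code": code, "name": row[1]})
    return records


SAMPLE = """<table>
<tr><th>Code</th><th>Language</th></tr>
<tr><td> EN </td><td>English</td></tr>
<tr><td>da</td><td>Danish&nbsp;</td></tr>
</table>"""

print(language_records(SAMPLE))
```

Once the import is a script, rerunning it whenever the list changes costs nothing. That addresses the "out of whack" problem, even if it still isn't the clean XML feed one would want.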

And then I need a thing for selecting timezones, so I can show time in people's local format. Where's the authority for that? I can find lots of places that list the different time zones. But no easy way of knowing when they have daylight savings time. The map of who uses what system across the world is surprisingly complex. Just see Canada. But the whole thing would really be a few kilobytes of data. I just want the correct data. I can find companies selling that, for $399 per year, but that's kind of silly. .... ah, a little more research shows that the Olson TZ database built into all Unix and Linux systems is a fine solution. It isn't authoritative, but it seems to be good, and gets updated once in a while, and it is already there. I kind of knew that, I had just forgotten. I'll use that. But, really, there should be one authoritative webservice somewhere I could just call. Manned by one employee in the UN or something, who'll call somebody in each region a couple of times per year and hear if they've changed their system, and who updates the database accordingly.
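As an illustration of leaning on the Olson database instead of maintaining one's own table, Python's standard `zoneinfo` module (3.9+) reads that same database. It uses the system copy or the `tzdata` package. Canada makes a good test case. The helper names below are invented for this sketch.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+; reads the system Olson/IANA tz database


def dst_offset(zone_name, local_time):
    """Daylight-saving adjustment in effect in zone_name at a naive local wall-clock time."""
    return local_time.replace(tzinfo=ZoneInfo(zone_name)).dst()


def utc_offset(zone_name, local_time):
    """Total offset from UTC (standard offset plus any DST) at that local time."""
    return local_time.replace(tzinfo=ZoneInfo(zone_name)).utcoffset()


summer = datetime(2004, 8, 16, 12, 0)
winter = datetime(2004, 1, 16, 12, 0)

# Ontario observes DST; most of Saskatchewan stays on Central Standard Time all year.
for zone in ("America/Toronto", "America/Regina"):
    print(zone,
          utc_offset(zone, summer) / timedelta(hours=1),
          dst_offset(zone, summer),
          dst_offset(zone, winter))
```

When the database is updated, the answers change without any code changing. That is about as close to the "one authoritative service" as exists today.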

There are a lot of things one could do if more data were easily available as web services in authoritative, normalized versions: population, environmental, geographical, financial data. If it were all available in standard ways, I could do my own analysis of what seems to be going on in the world. As it is right now, one has to put up with third-hand, questionable data, and it takes quite some financing to get somebody to normalize the data so one can do things with it.
[ Programming | 2004-08-16 15:32 | 3 comments | PermaLink ]

Wednesday, July 21, 2004

Help Wanted

So, over the years I've written all these software modules for an assortment of online purposes. Like:

Weblogs
Bulletin Boards
Chat Rooms
Calendars
File Sharing
Membership Sites
Online Directories
Work Groups
Personal Information Management
Shopping Carts
Online Website Generation
Forms and Database Generation
Mailing List Management
DNS Administration
Server Monitoring
Content Management
News Feed Aggregation
Image Manipulation
... and Wikis

And probably some I'm forgetting right now. All of it is in use in one place or another. And some of it isn't half bad. For that matter, some of it was a bit ahead of its time. And the users of some of these things seem exceedingly happy with them.

But there's a considerable problem with spreading oneself that thin as a programmer. Most successful programmers will do one or two great things, or they'll have a team to work with.

Anyway, since I've done neither, the result is that all of my modules are somewhat unfinished. Or, rather, they work well in the particular setting they were made for, as long as I manage the server, fix things that go wrong, and tweak them for new purposes. But that doesn't mean they're easy to export.

I've been paid well for making some of these things, and some of them I made because I needed them myself, or to make nice places to hang out online. But generally I've never figured out how to make the jump to making a business out of any of them. And neither have I made the jump to package them as open source packages that people can just take and use, and others can contribute to.

And, well, Internet time moves quickly. So, while I can still enjoy that my weblog program does some things better than any other weblog program I've tried, other pieces are at the risk of slipping into obscurity, by being somewhat outdated and mediocre in how they look and what they can do. And across the board I've missed a lot of opportunities for doing something with these things at a higher level.

I can't count the number of times I've shown a selection of these programs to some business-wise person, who's told me that I could take any one of these and turn it into a thriving business. Usually accompanied by stories of people who've made it big with some fairly mediocre piece of software or other product, that they just managed to position well, and work hard on it, until it became a viable enterprise.

But which one should I pick? I'd lean towards almost all of them. That is, a membership site where the users can easily set up an assortment of different resources, by picking from a menu and doing a bit of configuration. OK, so you want a website, and it should have a weblog and newsfeeds and a shopping cart and an event calendar, and you want an intranet for your employees with spaces for different teams and wikis, etc. There's no great reason you shouldn't be able to have that up and running in a day or so, without needing to download any software or having to know any HTML.

I call that OrgSpace. That's a registered trademark. There's a corporation ready in England with that name. I've talked a good deal with Julie about launching something of that nature, starting back when we had a company in L.A. called Synchronicity. I've discussed pieces of it at great length with quite a few people.

But it doesn't work if I'm the sole programmer. And so far I'm not as much of an entrepreneur as I'd like to be. So it is getting a bit stuck, both at the level of finishing the software and at the level of doing the normal stuff one does to start, run and grow a business.

It could take all sorts of formats and directions. Like, a particular software piece might be a separate product in itself. Doesn't have to be an all or nothing proposition. It depends on what other people are inspired to be part of.

But I need programmers to collaborate with. We're talking about PHP. People who aren't as inclined to start from scratch as I am, but who'd feel inspired to do great things with pieces that already are 70% there, and to work as part of a team. This is in no way beginner's stuff, so some hardcore coding ability is needed.

A graphical design and layout person would be very helpful too. Most of my sites look like they were made in 1995, mostly because they actually were.

Some business help would be a good thing. I'm not ignorant of the basics, so it is maybe more a matter of coaching. Well, of course if one of you just wanted to finance the whole thing, that would certainly make everything easier.

But, barring that, we're talking about people who're interested in freely collaborating for the purpose of future business, or for making useful open source software, and useful online services. Or in making online communities and networks that work better. Whatever inspires you, and whatever format that is structured in. I can easily think of a variety of avenues for business or rewarding non-profit activities. I just need to get beyond wearing the programmer hat all the time. And I'm not going to give away all the secrets here.

My own problem is that I'm a perfectionist, so I'm not the right person to do everything myself. It doesn't mean I'm necessarily hard to work with, but it means that I'm usually not sufficiently happy with what I do to get it out the door. You know the wise 80/20 rule, which says that you go for making 80% of what needs doing, and you get it out the door. And in the next iteration you do 80% of what is left. The hard lesson for a perfectionist to learn is that people other than yourself are usually quite happy with the 80% solution, as long as you actually put it into their hands.

A few little anecdotes:

I gave my shopping cart code to somebody once, when I considered it just half-way done, even though it basically worked. Somebody who was a much more novice programmer than myself. He worked hard for a month and set up a flashy online shopping mall site, where quickly hundreds of customers had paid for having their own online stores.

I wasn't very satisfied with my online website design tool, even though it actually did much more than anything else available at the time. Unbeknownst to me at first, a big Beverly Hills newspaper used the beta test demo version to put their whole paper online every week, with all previous issues archived. I would have said it was impossible, as it wasn't really a content management system suited for that purpose, but they found ways of working it so that it did what they needed, and were quite happy with it.

I made this opt-in mailing list management system. It handled mailing lists with several million subscribers and daily mailings. One of the companies using it wanted a faster mailing engine. I knew very well what to do, and I only needed a C programmer to do a fairly minor piece for me. Instead I insisted on trying to do it myself, and procrastinated. So they ended up spending half a million dollars or so on somebody else's system, which was inferior on various other counts, but it mailed really quickly, and it was supported around the clock by a team of people, whereas I was just myself.

You catch the drift, I'm sure.

So, if you're the right kind of person, and any of this is of the slightest interest, let me know.
[ Programming | 2004-07-21 06:45 | 11 comments | PermaLink ]

Tuesday, July 20, 2004

Ziggy Wiki

So, I wrote a wiki program. Now, that is maybe stupid, as there are plenty of excellent wiki programs around. But the ones I first looked at didn't have the combination of features I was looking for. OK, looking closer, there really are some impressive choices out there. But, again, not necessarily how I'd want to do it.

And then I have the troublesome tendency to feel really compelled to make my own programs from scratch. Oh, not everything, but the moment I stare at somebody else's code that I need to make changes to, to make it do what I want, and it is likely to take more than an hour or two, it is inescapable. I instantly imagine that it is easier to just do my own, rather than work my way through the odd ways somebody else has done it. Oh, in reality, it is never faster or easier. But I do usually end up with something I'm comfortable with, and that I can easily add new features to.

Anyway, it works now, even though it only does the rather basic stuff that most wikis do. Being able to edit a page, keeping track of revisions, searches, recently modified list, user login, various kinds of markup, etc. There's a little page about it here. It is a PHP program, using MySQL database, attempting to be standard and use XHTML and CSS.
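Revision tracking is one of those basic features that is worth getting right early. Ziggy Wiki itself is PHP on MySQL, and its schema isn't described here. The following is therefore only a Python/SQLite sketch of one common approach, using a made-up table layout. Every save appends a row, and the current page is simply its highest revision.

```python
import sqlite3


def open_wiki(path=":memory:"):
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per saved revision, never updated in place.
    db.execute(
        """CREATE TABLE IF NOT EXISTS revisions (
               page   TEXT    NOT NULL,
               rev    INTEGER NOT NULL,
               author TEXT,
               body   TEXT,
               PRIMARY KEY (page, rev))"""
    )
    return db


def save_page(db, page, body, author):
    """Append a new revision instead of overwriting; returns the new revision number."""
    with db:  # commits on success, rolls back on error
        (last,) = db.execute(
            "SELECT COALESCE(MAX(rev), 0) FROM revisions WHERE page = ?", (page,)
        ).fetchone()
        db.execute(
            "INSERT INTO revisions (page, rev, author, body) VALUES (?, ?, ?, ?)",
            (page, last + 1, author, body),
        )
    return last + 1


def current_text(db, page):
    row = db.execute(
        "SELECT body FROM revisions WHERE page = ? ORDER BY rev DESC LIMIT 1", (page,)
    ).fetchone()
    return row[0] if row else None


def history(db, page):
    return db.execute(
        "SELECT rev, author FROM revisions WHERE page = ? ORDER BY rev", (page,)
    ).fetchall()


def recently_modified(db, limit=10):
    # rowid grows with each insert in this append-only table, so it doubles as save order.
    return [name for (name,) in db.execute(
        "SELECT page FROM revisions GROUP BY page ORDER BY MAX(rowid) DESC LIMIT ?", (limit,)
    )]
```

Because nothing is ever overwritten, the "recently modified" list and any diff between revisions fall out of the same table.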

I'm trying to force myself, for the first time, to go through the necessary steps to successfully give my software away. Oh, lots of people are using my software for free, that's not what I mean. I mean the open source way, where you can actually go to a site and download the latest version, and install it without too much trouble, if you're a bit technically minded, and you have a Linux server account. Now, that is not altogether trivial. There's quite a jump from being able to run something on my own server, to expecting that other people can go and do the same without running into a wall. I usually make it difficult to do that by adding a lot of features to my programs, and linking them all together, so that it is difficult to extract one of them and make it survive in the wild. So with this program I'll try to restrain myself a bit.

Now, if it is its own program, it needs a name, of course. Wiki programs have silly names that usually rhyme more or less badly with "wiki". So, how about Ziggy Wiki? Doesn't seem to be taken.

Some of the features I have in mind are things like:
- Being able to embed a number of different kinds of objects in a page, like RSS feeds, web service calls, calendars, etc.
- Using a wiki to design and generate a "real" website.
- Importing content from various non-wiki sources
- Organizational features, categories, etc.
- Image handling and storage
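Embedding objects in a page usually comes down to a placeholder syntax plus a table of handlers. The `{{kind argument}}` syntax and the handler names below are invented for this Python sketch and are not Ziggy Wiki's markup. The handlers only emit placeholder HTML, whereas real ones would fetch the feed or query the calendar.

```python
import html
import re

# Hypothetical embed syntax: {{kind argument}}, e.g. {{rss http://example.com/index.xml}}
EMBED = re.compile(r"\{\{(\w+)\s+([^}]*)\}\}")


def render_embeds(text, handlers):
    """Replace each {{kind arg}} with handlers[kind](arg); unknown kinds stay visible, escaped."""
    def replace(match):
        kind, arg = match.group(1), match.group(2).strip()
        handler = handlers.get(kind)
        if handler is None:
            return html.escape(match.group(0))
        return handler(arg)

    return EMBED.sub(replace, text)


# Placeholder handlers: a real one would fetch the feed or query the calendar.
HANDLERS = {
    "rss": lambda url: f'<div class="feed" data-src="{html.escape(url)}"></div>',
    "calendar": lambda month: f'<table class="calendar" data-month="{html.escape(month)}"></table>',
}

print(render_embeds("Latest: {{rss http://example.com/index.xml}}", HANDLERS))
```

Adding a new kind of object then means registering one more handler, without touching the page parser.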

And I guess I'll integrate it as a parallel dimension to my weblog program somehow. Probably starting by making my own public wiki and seeing how it works out.

To test my wiki, try the SandBox where it shouldn't do any damage.
[ Programming | 2004-07-20 15:01 | 0 comments | PermaLink ]

Tuesday, March 23, 2004

Why software stinks

Brief article in Salon.com (requires looking at an annoying commercial to read it) about some software luminaries' ideas about what programming is about, and how it can be done better. Hardware seems to keep following Moore's Law: it gets twice as fast, or half as expensive, every 18 months or so. But software at best improves linearly. So most of the increased potential in hardware is wasted by messy and inefficient programming. The software is the bottleneck.
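To put rough numbers on that gap: doubling every 18 months compounds, while linear improvement only adds. Take hardware capability $H$ after $t$ years, and an assumed linear rate for software $S$ of one original unit per year. That rate is picked purely for illustration. Over 15 years:

```latex
\[
  H(t) = 2^{t/1.5}, \qquad H(15) = 2^{15/1.5} = 2^{10} = 1024
\]
\[
  S(t) = 1 + t, \qquad S(15) = 16
\]
```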

Programming is still mostly an art, not a science. I guess that's part of what I like about it. You pretty much have free hands to implement a requirement any way you feel like, as long as it ends up working. But it also means it is extremely hard to manage the complexity and to make good predictions about how long a certain task will take. And most solutions are extremely wasteful and often sloppy.

[Charles] Simonyi believes the answer is to unshackle the design of software from the details of implementation in code. "There are two meanings to software design," he explained on Tuesday. "One is, designing the artifact we're trying to implement. The other is the sheer software engineering to make that artifact come into being. I believe these are two separate roles -- the subject matter expert and the software engineer."

Giving the former group tools to shape software will transform the landscape, according to Simonyi. Otherwise, you're stuck in the unsatisfactory present, where the people who know the most about what the software is supposed to accomplish can't directly shape the software itself: All they can do is "make a humble request to the programmer." Simonyi left Microsoft in 2002 to start a new company, Intentional Software, aimed at turning this vision into something concrete.

Well, it can certainly make sense to separate the two: defining the solution and actually implementing it. Although that hasn't worked very well when it has meant systems analysts deciding what to do and programmers doing it. But maybe it would if the second part could be more or less automatic and directly driven by the representation of the design.

But not everybody agrees that it just takes a better way of writing the software. Some think it takes a fundamental rethinking of many things. Yet even in the energetic open-source scene, where people really are free to do whatever they'd actually want, we aren't particularly seeing fundamental changes in how things are done.

"There's this wonderful outpouring of creativity in the open-source world," [Jaron] Lanier said. "So what do they make -- another version of Unix?"

Jef Raskin jumped in. "And what do they put on top of it? Another Windows!"

"What are they thinking?" Lanier continued. "Why is the idealism just about how the code is shared -- what about idealism about the code itself?"

Yeah, how about a more fundamental revolution? Why am I stuck with a desktop metaphor on my screen, when my own physical desktop already had too few dimensions to it? We could do so much more with computers. But it is not easy to invent something totally different. I haven't, even though I'd love to. Those smart folks in the article have invented amazing new things in their time, but even they know it doesn't really add up to much compared to what is possible. We're limited by a tendency to define the next thing based on what we already know. That can even be a rather viable business strategy, as this funny quote from Bill Gates from years ago illustrates: "The best way to prepare is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating systems." Yeah, and it still shows, Bill. Imagine if he had instead invented something fundamentally new.
[ Programming | 2004-03-23 06:08 | 6 comments | PermaLink ]

Monday, February 9, 2004

Clipper

I don't know why I suddenly thought of it, but I was wondering whatever happened to Clipper, a database programming compiler I used to do a lot of work in. I searched on the net and found this nice history. Wow, I recognize just about all of those boxes.

I learned "real" programming languages earlier: Fortran, Algol, C, Pascal. But when it came to doing stuff that a normal business needed, they didn't do me much good. dBase did. I was running dBase II on my first IBM PC in 1983. In 1985 I was asked by an insurance company (or, rather, a PPO, a network of doctors that was processing insurance claims from them) to do a simple application for keeping track of their insurance claims, which they were otherwise doing by hand. They asked me, and that was a quite meaningful question at the time, whether I thought it would be best to do it with macros in Lotus 1-2-3 or in dBase III. I was very fluent in both, and luckily I said that dBase made the most sense. Then I heard about Clipper, which was a faster, compiled version of dBase. (xBase became the generic term for dBase and its clones.) So we switched to that. And what was meant to be a little part-time project for me to tinker with for a couple of months mushroomed into a five-year project as the company grew dramatically, and became more efficient based on my, initially clumsy, Clipper program.

This was running in DOS on 286s. And networking was a bit primitive in those days. First it was a Corvus network, then 3Com. The Clipper version was at first Winter '85. And when multiple users needed to share one database, I had a bit of a problem. File and record locking weren't available yet, neither in the networking software nor in Clipper. A year or so later they were, but at first I had to jump through some major hoops to implement the functionality of a multi-user, file-locking, error-checking, commit-and-rollback database system with tools that didn't actually support it at all. But it worked. At the time the company would once in a while have Big Eight consulting firms come in and evaluate what they ought to do. And on more than one occasion they recommended rather forcefully that the company get a mainframe, or at least a minicomputer, and drop this silly PC stuff, which wasn't meant for this volume and kind of activity. Somehow, since I didn't know it was impossible, I convinced the management to trust that I could do it anyway, and they saved a few hundred grand in hardware costs when it turned out that I succeeded. Even Nantucket, the people who made Clipper, were shaking their heads when I explained my system of 50 workstations processing a million claims per year, stored in a database distributed over a half dozen servers. They had never heard of anybody doing anything that big with their software.
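For readers wondering what "jumping through hoops" can look like, here is one classic way to get mutual exclusion from a file system that offers no locking API. It is a lock file that only one process can create. This Python sketch shows that generic technique and is not a reconstruction of the author's Clipper code. Exclusive create is not guaranteed to be atomic on every network file system (older NFS versions, for example), and a crashed process leaves a stale lock behind.

```python
import os
import tempfile
import time


def acquire_lock(lock_path, timeout=10.0, poll=0.1):
    """Create lock_path exclusively; O_CREAT | O_EXCL fails if another process already made it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False  # give up; the caller decides whether to retry or report
            time.sleep(poll)
            continue
        os.write(fd, str(os.getpid()).encode())  # record the owner, useful for stale locks
        os.close(fd)
        return True


def release_lock(lock_path):
    os.remove(lock_path)


def demo():
    lock = os.path.join(tempfile.mkdtemp(), "claims.lck")
    first = acquire_lock(lock)
    second = acquire_lock(lock, timeout=0.2)  # still held, so this times out
    release_lock(lock)
    third = acquire_lock(lock)
    release_lock(lock)
    return first, second, third


print(demo())  # (True, False, True)
```

Commit and rollback take more work on top of this. A common pattern is to write changes to a side file first and only then swap them in while holding the lock.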

Anyway, today it would be nothing special. But then again, PCs today are close to a thousand times faster than then, and have more than a thousand times as much storage space. And dBase or Clipper would be words I wouldn't even put on my resume, as it is antiquated stuff nobody's using today. Well, almost nobody. But it is nice to be a little nostalgic.
[ Programming | 2004-02-09 08:08 | 7 comments | PermaLink ]

Friday, February 6, 2004

RSS incompatibilities

Mark Pilgrim has an enlightening article about the myth of RSS compatibility. I suddenly feel better about the little difficulties my self-made aggregator is having. It is generally a mess to try to match a whole bunch of incompatible standards at the same time, not to mention that people who write the feeds don't necessarily adhere to the standards. I was otherwise a bit dissatisfied that I couldn't find any suitable package that bothered to discern all the details of the different standards precisely, and I didn't get around to figuring it all out myself. Most packages seem to concentrate on what the formats have in common and ignore the differences and the details. Mark Pilgrim seems to have made an RSS parser that both juggles all the differences and is permissive about accepting and fixing the ways feeds might be screwed up. Looks great. I need to try it out.
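Mark Pilgrim's parser handles far more than this. To show where the formats actually differ, here is a deliberately strict Python sketch that uses only the standard library. It maps RSS 0.9x/2.0, RSS 1.0 and Atom entries onto one common shape, extracting only title and link. Unlike a liberal parser, it simply gives up on malformed XML. The Atom namespace used is the Atom 1.0 one; older Atom 0.3 feeds use a different namespace and would fall through to the RSS 1.0 branch.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace
RSS1 = "{http://purl.org/rss/1.0/}"     # RSS 1.0 items live in this namespace inside rdf:RDF


def _text(elem, tag):
    child = elem.find(tag)
    return (child.text or "").strip() if child is not None else ""


def parse_items(xml_bytes):
    """Return [{'title', 'link'}] for RSS 0.9x/2.0, RSS 1.0 or Atom 1.0 input."""
    root = ET.fromstring(xml_bytes)  # raises ParseError on malformed XML: not a liberal parser
    items = []
    if root.tag == "rss":  # RSS 0.9x and 2.0 use no namespace
        for it in root.iter("item"):
            items.append({"title": _text(it, "title"), "link": _text(it, "link")})
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            href = ""
            # Atom puts the link in an attribute; a missing rel means "alternate".
            for link in entry.findall(ATOM + "link"):
                if link.get("rel", "alternate") == "alternate":
                    href = link.get("href", "")
                    break
            items.append({"title": _text(entry, ATOM + "title"), "link": href})
    else:  # assume RSS 1.0 (RDF)
        for it in root.iter(RSS1 + "item"):
            items.append({"title": _text(it, RSS1 + "title"), "link": _text(it, RSS1 + "link")})
    return items
```

Even these two fields need three code paths, which goes some way toward explaining why most packages stop at the common denominator.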
[ Programming | 2004-02-06 18:28 | 0 comments | PermaLink ]

Wednesday, January 7, 2004

The algebra of feeds

An older post from Seb Paquet "The algebra of feeds, or the amateurization of RSS bricolage" is very pertinent to the subject of Pipelines for XML.

Recent talk about RSS feed splicing and the ineluctable need for filtering open feeds got me thinking about the variety of operations one might want to perform on feeds.
Taking a cue from the operations of set theory we could for instance define the following:

Splicing (union): I want feed C to be the result of merging feeds A and B.
Intersecting: Given primary feeds A and B, I want feed C to consist of all items that appear in both primary feeds.
Subtracting (difference): I want to remove from feed A all of the items that also appear in feed B. Put the result in feed C.
Splitting (subset selection): I want to split feed D into feeds D1 and D2, according to some binary selection criterion on items.
The ultimate RSS bricolage tool would give users an interface to derive feeds from other feeds using the above operations, and spit out a working URL for the resulting feed.

I'm not sure how all of it would work, or even if all of it can work in practice. I'm completely abstracting out technical considerations here. While I'm not sure how large the space of useful applications of this could be, here are a couple example uses:

Splicing: All of the posts on the Many-to-many blog have to do with social software, so it would make sense to send its posts over to the social software channel. Now, since the blogging tool we use for that blog doesn't support TrackBack, it can't automatically ping the Topic Exchange. A workaround would be to merge both channels into a new one. In general, this would enable any combination of category feeds from various sources to be constructed very simply. A feed splicer can also serve as a poor man's aggregator.
Intersecting: Say I want to subscribe to all of Mark's posts that make the Blogdex Top 40; I'd just have to intersect the feeds. Or I could filter a Waypath keyword search feed in the same manner.
Subtracting: I'm interested in some topic that has an open channel, but find the items by one particular author uninteresting. (This is equivalent to the killfile idea from good ol' USENET.) Subtraction could also be used if you don't want to see your own contributions to a feed.
Splitting: One might want to manually split a feed into "good" and "bad" subfeeds according to a subjective assessment of quality or relevance, or automatically split according to language, author, etc. Note that this one doesn't qualify as an example of pure feed algebra, as it involves inputs beyond feeds.
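The four operations map directly onto set operations once you decide when two items are "the same". This Python sketch assumes each item is a dict and that its `guid`, or failing that its `link`, identifies it. That identity rule does most of the work, since real feeds are inconsistent about it. Order is preserved from the first feed.

```python
def item_key(item):
    """Assumed identity rule: guid if present, otherwise link."""
    return item.get("guid") or item.get("link")


def splice(a, b):
    """Union: all items of A, then the items of B not already present."""
    seen, out = set(), []
    for item in a + b:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out


def intersect(a, b):
    """Items of A that also appear in B."""
    keys_b = {item_key(i) for i in b}
    return [i for i in a if item_key(i) in keys_b]


def subtract(a, b):
    """Items of A that do not appear in B (a killfile when B is one author's posts)."""
    keys_b = {item_key(i) for i in b}
    return [i for i in a if item_key(i) not in keys_b]


def split(d, predicate):
    """Split D into (D1, D2) by a yes/no test on each item."""
    d1 = [i for i in d if predicate(i)]
    d2 = [i for i in d if not predicate(i)]
    return d1, d2


A = [{"link": "u1", "author": "mark"}, {"link": "u2", "author": "seb"}]
B = [{"link": "u2"}, {"link": "u3"}]
print([i["link"] for i in splice(A, B)])  # ['u1', 'u2', 'u3']
```

Serving the result as a new feed URL is then a matter of running these on fetched items and writing RSS back out.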

Yes, cool, I'd certainly like to do all of those things. I'd like to take newsfeeds or similar data and combine it or filter it or sort it, and I'd like to also be able to do it based on metadata from different sources. So the trouble is now that those feeds are in a number of different formats, and even if some of the metadata, like what appears in some top-40 list, or other ways of rating or categorizing posts, might be available in some XML format, it is not likely to be consistent in any instantly useful way. In a few hours I'd be able to do something with it. But I'd like to be able to do it in a few minutes or few seconds. Hm, this is certainly worth playing a bit with.
[ Programming | 2004-01-07 17:21 | 4 comments | PermaLink ]

Sunday, January 4, 2004

XML pipelines

Jon Udell talks, including in an article in InfoWorld, about how the Unix way of doing things might at some point merge with the more user-friendly way normal users expect to do things: through XML.

"I've always blended the geeky, command-line-driven Unix style with the mom-friendly point-and-click Windows approach. To borrow a Microsoft slogan, the two approaches are "better together." Each has strengths that complement weaknesses in the other. However, we've yet to achieve real synergy."

Now, Jon Udell happens to be particularly good at making things fit together. He wrote an excellent book, Practical Internet Groupware which outlines the approach of using stuff that already exists and works well, and which can be linked together modularly. So, in terms of making groupware, instead of suggesting starting from scratch and building a huge monolithic piece of software, he suggests connecting together rather ancient, but well-functioning, protocols like SMTP (mail), NNTP (newsgroups) and IRC (chat), and doing rather low-tech things to make an integrated system with them.

So, for those who haven't ever used it, let me explain briefly what the original Unix philosophy was. Lots of small programs would each do one small, specialized task very well. To do more complicated things, one would connect them with each other. Like, you'd have a program called 'sort' which does nothing but sort text. And it doesn't have any fancy interface or anything. You just connect some text to its input pipe and the result comes out of its output pipe, which you can connect to something else. So any Unix guru worth his salt can make a one-liner that takes the contents of some file, finds all lines that include some particular word, splits off a couple of different columns from those lines, sorts them alphabetically, zips up the results in a compressed file, and mails it to somebody. That's one line, and not a very long one.
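For comparison, here is the same composition idea written as small, single-purpose Python stages. The comment shows a typical shell version of such a one-liner. The tab-separated file layout and the sample data are assumptions, and the mailing step is left out.

```python
import gzip

# Shell equivalent, roughly: grep fruit data.txt | cut -f1,3 | sort | gzip > out.gz


def grep(lines, word):
    """Pass through only the lines containing word."""
    return (line for line in lines if word in line)


def cut(lines, fields, sep="\t"):
    """Keep the given 1-based fields of each line, like cut -f."""
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        yield sep.join(parts[f - 1] for f in fields if f <= len(parts))


def gzip_lines(lines):
    """Compress the lines, newline-terminated, into gzip bytes."""
    return gzip.compress("".join(line + "\n" for line in lines).encode("utf-8"))


def pipeline(lines, word):
    # Each stage only knows its own job; the composition does the rest.
    return gzip_lines(sorted(cut(grep(lines, word), (1, 3))))


data = ["zoe\tfruit\tpear", "al\tveg\tleek", "bo\tfruit\tfig"]
print(gzip.decompress(pipeline(data, "fruit")).decode("utf-8"))
```

The point is the shape, not the language: every stage takes a stream of lines and hands one on, so stages can be recombined freely.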

That's quite a splendid way of doing things. But almost a lost art, and not for any terribly good reason. I have it running under the hood in my OSX of course, so it is there. But why don't we expand the same idea to more areas? Why don't I have a bunch of modular lego bricks to do all the things I could think of doing with the net and with my information? Are there new ways we should accomplish something equivalent today?

"It's clear that that the future of the Unix-style pipeline lies with Web services. When the XML messages flowing through that pipeline are also XML documents that users interact with directly, we'll really start to cook with gas. But a GUI doesn't just present documents, it also enables us to interact with them. From Mozilla's XUL (XML User Interface Language) to Macromedia's Flex to Microsoft's XAML, we're trending toward XML dialects that define those interactions.

Where this might lead is not so clear, but the recently published WSRP (Web Services for Remote Portlets) specification may provide a clue. WSRP, as do the Java portal systems it abstracts, delivers markup fragments that are nominally HTML, but could potentially be XUL, Flex, or XAML. It's scary to think about combinations of these, so I'm praying for convergence. But I like the trend. XML messages in the pipeline, XML documents carrying data to users, XML definitions of application behavior. If we're going to blend the two cultures, this is the right set of ingredients."

Now, I don't understand WSRP, XAML or XUL. And I have sort of a problem with most things done in XML: it usually ends up being very complicated, and it takes days of study to do anything. XML is simply a uniform way of structuring data. That's a good idea, of course, but it doesn't magically make all XML talk to each other. Maybe that's because there are some other tools with strange acronyms that I don't know yet. Maybe I'm not smart enough to understand the whole point. It seems like it all should be as simple as Unix pipes: connect the output of one service to the input of another, string a few together, and you can do anything. But even though I'm a techie, have read books about XML, and use some XML-based protocols in some of my own programs, there's nothing I can think of that I can do quickly that comes remotely close to the simplicity of the Unix command line. It seems like most of the interfaces that use XML do their own different thing, and you have to study for a while to figure out what is available before you can access it. Anyway, maybe I'm just revealing my own ignorance. But I hope that he's right, and that some kind of convergence will happen. I want to use all the programs I have access to as modular building blocks, and of course data should be able to pass from one to the other without having to write big, complicated conversion programs.
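Something close to pipe-like simplicity is possible when every stage agrees on a single document type. Here is a Python sketch under that assumption. Each stage takes an RSS 2.0 document, parsed with `xml.etree.ElementTree`, and returns one, so stages chain like pipes. The stage names are made up for illustration, and the stages modify the document in place.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def pipe(*stages):
    """Chain XML-to-XML stages left to right, like a | b | c."""
    return lambda doc: reduce(lambda d, stage: stage(d), stages, doc)


def keep_items(word):
    """Stage factory: drop items whose title lacks word (case-insensitive)."""
    def stage(doc):
        channel = doc.find("channel")
        for item in channel.findall("item"):  # findall returns a list, so removal is safe
            if word.lower() not in (item.findtext("title") or "").lower():
                channel.remove(item)
        return doc
    return stage


def sort_items(doc):
    """Stage: reorder items alphabetically by title."""
    channel = doc.find("channel")
    items = channel.findall("item")
    for item in items:
        channel.remove(item)
    for item in sorted(items, key=lambda i: i.findtext("title") or ""):
        channel.append(item)
    return doc


SRC = """<rss version="2.0"><channel><title>demo</title>
<item><title>XML pipes</title></item>
<item><title>Cats</title></item>
<item><title>Another XML note</title></item>
</channel></rss>"""

result = pipe(keep_items("xml"), sort_items)(ET.fromstring(SRC))
print(ET.tostring(result, encoding="unicode"))
```

The hard part the post describes is exactly what this sketch assumes away: real services rarely share one document format.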
[ Programming | 2004-01-04 11:06 | 2 comments | PermaLink ] More >

Sunday, December 28, 2003

News Aggregator

I spent the evening making my own news aggregator, and remarkably I succeeded to my satisfaction.

For any non-techies, a news aggregator is a program that sucks up data in RSS format, served by weblogs of various kinds, and presents it all in a uniform way.

I've tried a bunch of different aggregators. What I liked best was Radio Userland, because it shows the feeds together, looking like a weblog, and it seemed to be able to mainly show me a flowing stream of new stuff. Which I liked, but I'd kind of like more options. But when my paid license expired, I hesitated to renew it, because I wasn't really using it for its weblog or other functions. So I tried a variety of other programs.

FeedReader on Windows was nice, except that I don't like having to view postings one at a time. I like the big overview. On the Mac I then used Shrook for quite a while. It still had that 3-pane thing, and it crashed every couple of hours, so it ended up not running most of the time. I tried installing NewsMonster, after its website made me feel kind of stupid: it is apparently so superior that it can do everything, including a bunch of things I don't even understand. Except that it couldn't find Java on my computer, and it messed up some of the menus in Mozilla that it was supposed to integrate with. I installed Pears, which runs on Python. It worked, but was a bit too simple. I installed AmphetaDesk, which required installing a whole bunch of Perl libraries first. I do like the look of it; it's quite a bit like Radio. But again there's a bunch of things I'd want it to do that it doesn't.

So, I woke up late and decided that if I could make my own aggregator and get it functional today, I'd go for it. I really have other things to do, but it is Sunday during the Christmas holidays, so nobody would miss me too much.

Somewhat reluctantly, I decided to look for a library that does the basic fetching of an RSS feed. My first thought was that I could just as well write that myself too, but that is the kind of arrogance that often leaves me with projects full of features but never quite finished, because I try to do it all myself. So I picked up the Magpie RSS library in PHP, which seems simple enough, and I only needed some of its features.
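Magpie's own PHP interface isn't reproduced here. As a rough stand-in for what such a library does at its core, here is a minimal Python sketch that parses an RSS 2.0 document with the standard library and returns the channel title plus a list of items. It assumes plain RSS 2.0 (`<rss><channel><item>`) and ignores RSS 1.0/RDF, Atom, namespaces, and network fetching, all of which a real library has to handle; the `Item` class and `parse_rss2` function are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    link: str
    description: str
    guid: str


def parse_rss2(xml_text: str) -> tuple[str, list[Item]]:
    """Parse a plain RSS 2.0 document into (channel title, items).

    Simplified sketch: no RSS 1.0/RDF or Atom support, no namespaces.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")
    items = []
    for node in channel.findall("item"):
        link = node.findtext("link", default="").strip()
        items.append(Item(
            title=node.findtext("title", default="").strip(),
            link=link,
            description=node.findtext("description", default="").strip(),
            # Fall back to the link when a feed omits <guid>.
            guid=node.findtext("guid", default=link).strip(),
        ))
    return channel.findtext("title", default="").strip(), items


sample = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First post</title><link>http://example.com/1</link>
<description>Hello</description></item>
</channel></rss>"""

title, items = parse_rss2(sample)
print(title, [i.guid for i in items])
```

The `guid` fallback matters for an aggregator, since it is the key used to decide whether a posting has been seen before.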

Now, I decided to set it all up on my server rather than on my local machine, so I can make the functionality available to other users of my weblog program, and so the feeds can be cached and shared among those users. I wanted to store feeds and postings in MySQL, so they can be kept indefinitely, and so I can keep track of which ones have been read and things like that. So that is what I set up. A cron job picks up all channels every hour and figures out which postings are new or updated. Then some PHP pages show which feeds one is subscribed to and which are available from the pool already on the server, and allow adding new feeds. One can view the feeds either one at a time or mixed together. I borrowed the look somewhat from AmphetaDesk. But then I added the ability to keep track of which items in each feed a given user has read, and which ones they've seen at all. That way it can skip what has been marked as read, and mark new postings with a little NEW icon. I made it so the postings can be grouped by feed or by date, and sorted in various ways. I made a way of saving interesting postings to a separate place before they scroll away. And I added the 50 or so feeds that I normally watch. This already works better for me than any of the other aggregators I've used.
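The storage side described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual PHP/MySQL code: an in-memory SQLite database stands in for MySQL, and the table names, the content-hash approach to spotting updated postings, and the `store_items`/`mark`/`unread` helpers are all assumptions made for the example.

```python
import hashlib
import sqlite3

# Hypothetical schema. The original used MySQL from PHP; an in-memory SQLite
# database stands in here so the sketch runs on its own.
SCHEMA = """
CREATE TABLE items (
    id     INTEGER PRIMARY KEY,
    feed   TEXT NOT NULL,
    guid   TEXT NOT NULL,
    title  TEXT NOT NULL,
    digest TEXT NOT NULL,   -- hash of title + body, used to spot edited postings
    UNIQUE (feed, guid)
);
CREATE TABLE user_items (
    username TEXT NOT NULL,
    item_id  INTEGER NOT NULL REFERENCES items(id),
    is_seen  INTEGER NOT NULL DEFAULT 0,
    is_read  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (username, item_id)
);
"""


def store_items(conn, feed, items):
    """What an hourly cron job might do per feed: insert new postings and
    update changed ones. `items` is a list of (guid, title, body) tuples.
    Returns (new_count, updated_count)."""
    new = updated = 0
    for guid, title, body in items:
        digest = hashlib.sha1(f"{title}\n{body}".encode("utf-8")).hexdigest()
        row = conn.execute(
            "SELECT id, digest FROM items WHERE feed = ? AND guid = ?", (feed, guid)
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO items (feed, guid, title, digest) VALUES (?, ?, ?, ?)",
                (feed, guid, title, digest),
            )
            new += 1
        elif row[1] != digest:
            conn.execute(
                "UPDATE items SET title = ?, digest = ? WHERE id = ?",
                (title, digest, row[0]),
            )
            updated += 1
    conn.commit()
    return new, updated


def mark(conn, user, item_id, read=False):
    """Record that `user` has seen an item, and optionally that they read it."""
    conn.execute(
        "INSERT OR IGNORE INTO user_items (username, item_id) VALUES (?, ?)",
        (user, item_id),
    )
    conn.execute(
        "UPDATE user_items SET is_seen = 1, is_read = max(is_read, ?) "
        "WHERE username = ? AND item_id = ?",
        (int(read), user, item_id),
    )
    conn.commit()


def unread(conn, user):
    """Unread postings for `user`, grouped by feed, most recently stored first.
    Each row is (feed, title, is_new), where is_new means never seen."""
    rows = conn.execute(
        """SELECT i.feed, i.title, COALESCE(u.is_seen, 0) = 0
           FROM items AS i
           LEFT JOIN user_items AS u ON u.item_id = i.id AND u.username = ?
           WHERE COALESCE(u.is_read, 0) = 0
           ORDER BY i.feed, i.id DESC""",
        (user,),
    ).fetchall()
    return [(feed, title, bool(is_new)) for feed, title, is_new in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(store_items(conn, "feed-a", [("1", "Hello", "body"), ("2", "World", "body")]))  # (2, 0)
print(store_items(conn, "feed-a", [("1", "Hello", "body"), ("2", "World!", "edited")]))  # (0, 1)
mark(conn, "ming", 1, read=True)
mark(conn, "ming", 2)
print(unread(conn, "ming"))  # [('feed-a', 'World!', False)]
```

Keeping "seen" and "read" as separate flags is what makes the NEW icon possible: an item can be shown (seen) without being marked as read.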

I'll tinker some more with it before I let anybody else use it. And there are a few more things I'd like to add. It should, of course, be able to pass a post on to my weblog program, if I want to quote it. I need some way of searching through older postings, and more viewing options, like headings only, short excerpts, with or without pictures, etc. Maybe a way of categorizing the saved postings. But this should do for today.
[ Programming | 2003-12-28 19:55 | 5 comments | PermaLink ] More >

Saturday, November 29, 2003

The Annotated Web

Roger is doing some good thinking about how to make an annotated web:

"The voice of humanity network will be implemented first as the "annotated web", that is, as a community created stigmergic overlay of the web. The AntWeb is entered through portal pages, one portal page for each participating web community. Having entered through the portal, every link traversed by the membership is logged as the members browse the web. Facilities will be available to rate web pages and to add notes and keywords as members browse along. The links followed and the ratings given will be used to mark links as popular or unpopular for the others who might visit the same site – provided they have entered through the same portal, of course. The notes will likewise be accessible to those who come by later, and the keywords can be used in a lookup facility on the portal page to jump straight to items that are "funny" or "curious" or "delightful" etc. It should be fun. (See the MetaWeb Article for a previous take on the Annotated Web concept.) [...]

From the portal, every link we take will be tracked and participants will be encouraged to rate each page visited and optionally to add notes and keywords. This tracking process will not cover only the links on the portal, but will continue no matter how deep into the web we get. When we visit pages, if we have "NOTES ON" then we will see the annotations as interpolated wiki-like links, or if "NOTES OFF", then the annotations will be available through tiny buttons. Either way, the annotations will be presented as separate web pages, available for sub-annotations and rating. Annotations, therefore are wiki like. However, it is not possible to actually edit the original page, only to add annotations."
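Roger's description is concrete enough to sketch the core bookkeeping. Below is a hypothetical in-memory Python model (the class and method names are invented for illustration, not taken from any AntWeb code): every visit, rating, note and keyword is recorded per portal, so only people who came in through the same portal see it, and notes get their own addresses so they can be annotated in turn, while the original page is never touched. Persistence, user accounts, and the browser-side NOTES ON/OFF display are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Portal:
    """Everything one community records; nothing here is visible to other portals."""
    visits: dict = field(default_factory=lambda: defaultdict(int))
    ratings: dict = field(default_factory=lambda: defaultdict(list))
    notes: dict = field(default_factory=lambda: defaultdict(list))
    keywords: dict = field(default_factory=lambda: defaultdict(set))


class AnnotatedWeb:
    """Hypothetical overlay: the underlying pages are never modified."""

    def __init__(self):
        self.portals = defaultdict(Portal)
        self._next_note = 0

    def visit(self, portal, url):
        self.portals[portal].visits[url] += 1

    def rate(self, portal, url, score):
        self.portals[portal].ratings[url].append(score)

    def annotate(self, portal, url, text, keywords=()):
        """Attach a note to a page (or to another note); return the note's own address."""
        self._next_note += 1
        note_url = f"note:{self._next_note}"
        p = self.portals[portal]
        p.notes[url].append((note_url, text))
        for kw in keywords:
            p.keywords[kw].add(url)
        return note_url

    def score(self, portal, url):
        """Average rating given through this portal, or None if unrated."""
        r = self.portals[portal].ratings.get(url)
        return mean(r) if r else None

    def lookup(self, portal, keyword):
        """Pages tagged with a keyword, for the portal's lookup facility."""
        return sorted(self.portals[portal].keywords.get(keyword, set()))


web = AnnotatedWeb()
web.visit("ming", "http://example.com/")
web.rate("ming", "http://example.com/", 4)
web.rate("ming", "http://example.com/", 5)
n = web.annotate("ming", "http://example.com/", "Great page", keywords=["curious"])
web.annotate("ming", n, "Agreed")  # a sub-annotation on the note itself
print(web.score("ming", "http://example.com/"))   # 4.5
print(web.score("roger", "http://example.com/"))  # None: other portals see nothing
print(web.lookup("ming", "curious"))              # ['http://example.com/']
```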

"AntWeb" = "Annotated Web" - that's very cool. And it all seems like something almost familiar, that I'd of course want. And it is almost strange that it doesn't already exist. A couple of years ago there were some sites/programs that allowed you to browse around the web and leave independent notes about the sites, and to see who else was currently hanging out on the same site, using the same program. I can't remember any names, but whatever happened to those programs?
[ Programming | 2003-11-29 07:35 | 7 comments | PermaLink ] More >

Monday, November 24, 2003

Open Source

I'm having a bit of a conflict with myself. See, I believe that the philosophy and practice behind open source software is one of the most powerful and hopeful potentials in the world. Individuals, small teams, and large networks doing good things and sharing the results, for a variety of reasons: it needed to be done; it is fun; other people think you're cool; knowing it is the best way to do things; or just to scratch a personal itch. The point is that money and greed have relatively little to do with it, although monetary rewards might very well follow from this approach. It makes sense that the best way of being valuable, and of being considered valuable in the world, is to get something useful into as many hands as possible, and to put as few barriers as possible in the way of further creativity and improvement.

And, now, one of the roles I've most often played is that of a programmer. I've written a lot of code, a lot of software, some of which has been very useful to others. I've written chat rooms, bulletin boards, calendars, task managers, weblogs, member databases, mailing list managers, website authoring programs, shopping carts, content managers, image manipulation, DNS administration, server monitoring, and probably much more I'm forgetting.

But I've never made a program open source. I.e. I've never created a page that you could download a program from, with installation instructions and documentation. And I've never provided any meaningful way for others to contribute to the programs I created. Why not? Well, in part it might be that I still hold the remnant of a belief that I would somehow be more likely to get paid if I kept the software close to my chest, even though a lot of it was given away freely for use on my server. But, even more, it is probably that I'm a perfectionist and my projects are usually a little too ambitious. Meaning, they were never quite finished to my own satisfaction, so I didn't feel they were ready for prime time. And I had usually taken some shortcuts that meant the programs worked alright as long as they stayed on my server, where I could fix any problems that popped up. It takes some additional effort to make software solid and generic enough that somebody can just download it and use it in a somewhat different environment, maybe in ways I hadn't foreseen.

Most successful open source projects start off by doing one relatively limited thing fairly well. They might grow from there, sometimes tremendously, but they usually start by providing a small number of well-defined functions. I know that very well. But still, I usually end up trying to include everything but the kitchen sink in my plan, so that even if I do something relatively limited, it has hooks into a bigger master plan, which I usually never quite finish. And therefore the individual pieces might not be easy to give away.

I'm considering changing my mind, and picking one of my projects as something I can make limited and solid enough that I can actually export it to other people. The best candidate right now seems to be a program I wrote to easily create databases that automatically come with online form submission, an admin area, searches, group e-mailing, etc. I first wrote it when I noticed that many of the little website database jobs I got were boringly similar. There are some forms on a website where people can submit stuff, then we keep it in a database, which we need to manage in an admin area, and we might want to send e-mails to people who signed up, etc. That can easily be a few days of work each time, when really it could be done in half an hour if you didn't have to redo the same repetitive work.
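The core trick behind such a generator is declaring the fields once and deriving everything else from that declaration. Here is a hypothetical Python sketch of the idea, not the author's actual program: the `FIELDS` spec format, the helper names, and the form's `action` URL are all invented, SQLite stands in for whatever database is used, and the admin area and group e-mailing are reduced to a single query that collects addresses.

```python
import html
import sqlite3

# Hypothetical field spec: (name, label, required). One declaration drives
# the table, the public form, validation, and the mailing list.
FIELDS = [
    ("name", "Your name", True),
    ("email", "E-mail address", True),
    ("comments", "Comments", False),
]


def create_table_sql(table, fields):
    # Table and column names come from the trusted spec, never from user input.
    cols = ", ".join(
        f"{name} TEXT{' NOT NULL' if required else ''}" for name, _label, required in fields
    )
    return f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {cols})"


def form_html(table, fields):
    """A bare HTML form for public submissions; required fields get an asterisk."""
    rows = [
        f'<label>{html.escape(label)}{" *" if required else ""} '
        f'<input name="{name}"></label><br>'
        for name, label, required in fields
    ]
    return (
        f'<form method="post" action="/{table}/submit">\n'
        + "\n".join(rows)
        + '\n<input type="submit"></form>'
    )


def submit(conn, table, fields, data):
    """Validate a submission against the spec and store it (values via placeholders)."""
    missing = [name for name, _l, required in fields if required and not data.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    names = [name for name, _l, _r in fields]
    placeholders = ", ".join("?" for _ in names)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})",
        [data.get(n) for n in names],
    )
    conn.commit()


def mailing_list(conn, table):
    """Addresses for a group e-mail (assumes the spec has an 'email' field)."""
    return [row[0] for row in conn.execute(f"SELECT email FROM {table} ORDER BY id")]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("signups", FIELDS))
submit(conn, "signups", FIELDS, {"name": "Ann", "email": "ann@example.com"})
print(mailing_list(conn, "signups"))  # ['ann@example.com']
print(form_html("signups", FIELDS))
```

Adding a field to a new job then means adding one line to the spec rather than touching the table, the form, and the validation separately.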
[ Programming | 2003-11-24 07:57 | 14 comments | PermaLink ] More >

Monday, November 10, 2003

Cognitive Agent Architecture

COUGAAR is an "Open Source Cognitive Agent Architecture for Large-Scale Distributed Multi-Agent Systems". Wow, that sounds neat. From that site:

"Cougaar is a Java-based architecture for the construction of large-scale distributed agent-based applications. It is a product of two consecutive, multi-year DARPA research programs into large-scale agent systems spanning eight years of effort. The first program conclusively demonstrated the feasibility of using advanced agent-based technology to conduct rapid, large scale, distributed logistics planning and replanning. The second program is developing information technologies to enhance the survivability of these distributed agent-based systems operating in extremely chaotic environments. The resultant architecture, Cougaar, provides developers with a framework to implement large-scale distributed agent applications with minimal consideration for the underlying architecture and infrastructure. The Cougaar architecture uses the latest in agent-oriented component-based design and has a long list of powerful features."

Sounds good. I'm very interested in self-organizing agents, and this is open source. Can I use it for anything? There are some documents. The architectural overview seems to be in clear enough language. I can't quite grasp whether this only makes sense for a large logistical planning operation. Like if I'm going to invade Syria and I want the right number of M1 tanks to show up in the right place at the right time, despite that they, and various other pieces of hardware they depend on, come from a variety of different places, and I suddenly change my mind while the tanks are on their way, and btw some of them got blown up on the way, and so did some of the computers keeping track of this, but I want it all to work anyway. Is a similar framework meaningful for a more peaceful grassroots purpose? Such as the distributed storage of good information. Or collaborative planning of events or projects among diverse collections of organizations.
[ Programming | 2003-11-10 17:37 | 0 comments | PermaLink ]

Monday, October 20, 2003

Weblog Interfaces

Weblog APIs (Application Programming Interfaces) allow you to operate a weblog program through a different program. For example, you can post articles from another program, with an interface you like better, without having to change your primary weblog software. Or you might post entries directly from a weblog aggregator. Or programmers can think up new ways of tying together things that previously weren't connected.

Probably the first example was the LiveJournal Client/Server API. And then there was the Blogger API, and more recently the MetaWeblog API. And now I was just reading about the new Atom API.

I haven't gotten around to implementing any of them for my NewsLog program, which this weblog is running in. When I last looked into it, back when the Blogger API was the main thing around, I just couldn't figure out how to squeeze my functionality into its too-limited paradigm. Next time I feel like giving it a shot, I think MetaWeblog and Atom look like what I ought to concentrate on.
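The MetaWeblog API runs over XML-RPC, and its central call is `metaWeblog.newPost(blogid, username, password, struct, publish)`, where the struct carries RSS-style fields such as `title` and `description` (the entry body). A minimal Python sketch using the standard library's `xmlrpc.client` builds the request a client would send; the blog id, credentials, category and server URL are placeholders, and nothing is actually sent.

```python
import xmlrpc.client

# Placeholder blog id and credentials; a real client would get these from the user.
BLOG_ID, USER, PASSWORD = "1", "ming", "secret"

post = {
    "title": "Weblog Interfaces",
    "description": "<p>Posted from another program.</p>",  # the entry body
    "categories": ["Programming"],
}

# metaWeblog.newPost(blogid, username, password, struct, publish) -> postid
request_body = xmlrpc.client.dumps(
    (BLOG_ID, USER, PASSWORD, post, True),
    methodname="metaWeblog.newPost",
)
print(request_body)

# To actually send it (server URL is hypothetical):
# server = xmlrpc.client.ServerProxy("http://example.com/xmlrpc")
# post_id = server.metaWeblog.newPost(BLOG_ID, USER, PASSWORD, post, True)
```

The struct is what sets MetaWeblog apart from the original Blogger API, whose `newPost` takes the content as a single string, which is part of why it felt too limited.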
[ Programming | 2003-10-20 05:29 | 0 comments | PermaLink ]

World Kit

Mikel Maron is doing very cool things with world maps. First he made World as a Blog, a Flash animation that shows weblog postings in real time, popping up on the world map at the location of the poster. And now he's taken it further and put together World Kit, which makes it possible to do the same thing with just about any data that has geographical coordinates, by interfacing with the Flash app through XML. I can't wait to play with that. If only I had more hours in the day.
[ Programming | 2003-10-20 05:11 | 1 comment | PermaLink ] More >

Thursday, July 10, 2003

Xpertweb - distributed data

The approach in Xpertweb is to make somebody's data so easily accessible in a standardized format that others will be likely to pick up and keep a copy of it. There are actually several incentives to do that, and several positive outcomes.

Each person will have mentors who receive a small cut of the business, in exchange in part for helping that person get going, and also for acting as backup and validation nodes. So, if you yourself wipe out your disk and lose your data, you can be pretty sure that your mentor will have a copy of it, and you'll be back in action shortly.

Multiple copies in accessible standardized form will also allow others to verify the completeness and integrity of your data. For one thing, it makes it much harder to cheat. If somebody gave you negative feedback, and your mentor, amongst others, picked up a copy of that, you can't just cheat and go and delete it, even though the primary storage place for your activity is your own site. The people you carry out transactions with will also keep a copy, which again will stop you from messing with what really happened.

So, it is not only like double-entry bookkeeping, where each action is debited in one place and credited in another. It is also as if the entries are continuously being photocopied and mailed off-site.
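The "photocopied and mailed off-site" idea can be sketched in a few lines. This is a hypothetical Python illustration, not Xpertweb code: each transaction record is copied to the owner's store, the mentor's, and the counterparty's, and an audit compares the owner's copy against the others by hash, so a deleted or edited entry stands out. The record fields, the dict-based stores, and the `replicate`/`audit` helpers are invented for the example.

```python
import hashlib
import json


def digest(record: dict) -> str:
    """Stable fingerprint of a record (keys sorted so every copy hashes the same)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()


def replicate(record: dict, *stores: dict) -> None:
    """Copy a record to every store (owner, mentor, counterparty...), keyed by its id."""
    for store in stores:
        store[record["id"]] = dict(record)


def audit(owner: dict, *replicas: dict) -> dict:
    """Compare the owner's copy with independent replicas.

    Returns {record_id: "missing" | "altered"} for every record that some
    replica holds but the owner's copy lacks or disagrees with.
    """
    problems = {}
    for replica in replicas:
        for rec_id, rec in replica.items():
            if rec_id not in owner:
                problems[rec_id] = "missing"
            elif digest(owner[rec_id]) != digest(rec):
                problems[rec_id] = "altered"
    return problems


mine, mentor, customer = {}, {}, {}
feedback = {"id": "t42", "from": "customer", "rating": 1, "comment": "Late and sloppy"}
replicate(feedback, mine, mentor, customer)

del mine["t42"]                        # the owner tries to quietly drop bad feedback
print(audit(mine, mentor, customer))  # {'t42': 'missing'}
```

Comparing hashes rather than whole records is a design choice for the networked case: parties can exchange short fingerprints first and only fetch full records when they disagree.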

This is all possible because the amount of data per person is relatively small. We're talking about transactions as a record of two or more people who come together to make an agreement for something to be done, where that something then gets done, and where the parties then record their satisfaction with what happened. And if we were talking about the sale of a service, money would then change hands. But what we're recording is simply the steps of that cycle. Look for a person/product/service, negotiate an agreement on what should happen, make it happen, record comments and ratings about the whole thing. That's not much data. You could quite easily store the activity for hundreds of people.

And the point is exactly that an easily accessible, distributed, tamperproof information network, recording in simple terms what business transactions people agree to engage in, and how happy they are with the results - can possibly add up to a reliable picture of reputation.

If it were controlled from any one place, it probably wouldn't be reliable information.
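One hypothetical way to turn those scattered copies into a reputation figure, in the spirit of not trusting any single place: a rating only counts when at least two independent parties hold an identical copy of the record. The record fields (`id`, `about`, `rating`), the holder names, and the two-holder threshold are assumptions for illustration; the post does not say how reputation would actually be computed.

```python
from collections import defaultdict
from statistics import mean


def reputation(copies_by_holder: dict, min_holders: int = 2) -> dict:
    """Average rating per person, counting only records that at least
    `min_holders` independent parties hold identical copies of.

    copies_by_holder: {holder: [record, ...]}; each record is a dict with
    'id', 'about' (the person rated) and 'rating'. All names are hypothetical.
    """
    holders = defaultdict(set)  # (record id, frozen record) -> who holds that exact copy
    for holder, records in copies_by_holder.items():
        for rec in records:
            key = (rec["id"], tuple(sorted(rec.items())))
            holders[key].add(holder)

    ratings = defaultdict(list)
    for (_rec_id, frozen), who in holders.items():
        if len(who) >= min_holders:
            rec = dict(frozen)
            ratings[rec["about"]].append(rec["rating"])
    return {person: mean(r) for person, r in ratings.items()}


copies = {
    "ann":    [{"id": "t1", "about": "ann", "rating": 5}],
    "mentor": [{"id": "t1", "about": "ann", "rating": 5},
               {"id": "t2", "about": "ann", "rating": 2}],
    "bob":    [{"id": "t2", "about": "ann", "rating": 2}],
    "eve":    [{"id": "t3", "about": "ann", "rating": 1}],  # held by nobody else
}
print(reputation(copies))  # {'ann': 3.5}
```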
[ Programming | 2003-07-10 03:09 | 3 comments | PermaLink ] More >

Thursday, June 19, 2003

Ingenious email-harvester honeypot

From BoingBoing:

Merlin Mann outlines an ingenious procedure for identifying spammers' email-harvesters' IP addresses and user-agents:

"In each page I serve, I include a bogus email address, encoded with the date of access as well as the host IP address and embedded in a comment. [Apache's server-side includes are great!] This has allowed me to trace spam back to specific hosts and/or robots.

One of the first I caught with this technique was the robot with the user agent "Mozilla/4.0 efp@gmx.net", which always seems to come from argon.oxeo.com - it's identified it above as simply rude."

Simple and clever. Well, relatively simple for a programmer. Now, if only we could coordinate the gathering of a lot of that kind of data, i.e. mapping spam back to whoever harvested the address in the first place.
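The quoted post doesn't give its exact address format, so what follows is only a hypothetical Python sketch of the same trick: encode the visit time and the visitor's IPv4 address into a unique bogus address, and decode any address that later receives spam. The layout (`YYYYMMDDHHMM.hexip.tag@domain`), the `SECRET` and `DOMAIN` values, and the short HMAC tag (added so forged or mistyped addresses are rejected) are all assumptions, not part of the original technique.

```python
import hashlib
import hmac
import ipaddress
from datetime import datetime, timezone

SECRET = b"change-me"   # hypothetical server-side secret
DOMAIN = "example.com"  # hypothetical trap domain


def _tag(local: str) -> str:
    return hmac.new(SECRET, local.encode(), hashlib.sha256).hexdigest()[:6]


def trap_address(visitor_ip: str, when: datetime) -> str:
    """Build a unique bogus address encoding the visit time (assumed UTC) and IPv4."""
    stamp = when.strftime("%Y%m%d%H%M")
    ip_hex = format(int(ipaddress.IPv4Address(visitor_ip)), "08x")
    local = f"{stamp}.{ip_hex}"
    return f"{local}.{_tag(local)}@{DOMAIN}"


def decode_trap(address: str):
    """Recover (time, IP) from a trap address, or None if it isn't a valid one of ours."""
    local, _, domain = address.partition("@")
    parts = local.split(".")
    if domain != DOMAIN or len(parts) != 3:
        return None
    stamp, ip_hex, tag = parts
    if not hmac.compare_digest(tag, _tag(f"{stamp}.{ip_hex}")):
        return None
    when = datetime.strptime(stamp, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
    return when, str(ipaddress.IPv4Address(int(ip_hex, 16)))


# The address would be embedded in each served page inside an HTML comment.
addr = trap_address("192.0.2.7", datetime(2003, 6, 19, 23, 59, tzinfo=timezone.utc))
print(addr)
print(decode_trap(addr))  # recovers the visit time and '192.0.2.7'
```

Pooling decoded (time, IP) pairs from many sites is the kind of coordinated data gathering wished for above.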
[ Programming | 2003-06-19 23:59 | 1 comment | PermaLink ] More >

Friday, June 6, 2003

DIY DigID

Britt Blaser describes DIY Digital ID. Essentially he describes what we've already done a simple demo of. Even though I'm the programmer on it, I'm still not certain it will be sufficiently useful. Maybe it will. The point is that creating centralized IDs for everybody, so we know who we're dealing with when we do transactions with each other, particularly financial transactions, is a hard problem to solve. There's the issue of whom we would trust to issue such IDs, whether they would really prove anything, and how we could all agree on how they are used. No common standard for such IDs has emerged.

Britt's Do-it-Yourself idea is essentially to reduce the problem to two people with websites going through certain simple steps to make sure they really are talking with each other. I express an interest in some service on your site, pointing to an ID file on my site that explains who I am. Your software sends me back to my own site, asking me to demonstrate my control of the site that goes with the ID file by logging into its private area and completing a log entry. Your software then verifies that this log entry was made in the same location as my ID file, and that the time and IP address match what it observed. With this handshake done, we can then continue exploring the possibility of doing business. There'd be other factors involved, and other components needed, but that simple action could very well be the foundation for something useful.

This kind of sequence of actions, your site to my site to your site, is fundamental to Xpertweb. Peer to peer. A standardized protocol for how we negotiate each step. Everything stored in simple XML files that are public and can be discovered by any other party. That allows me to research further who you are by spidering around and checking with other people who seem to know you, people you've had business interactions with in the past. All without consulting any central authority.
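The handshake is described only in words, so here is a minimal Python sketch of the verifying site's final check, assuming the relevant pages have already been fetched and parsed into the hypothetical `Challenge` and `LogEntry` records below. The random nonce the log entry must echo back and the ten-minute window are assumptions added for the sketch; the description only says that location, time and IP must match.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse


@dataclass
class Challenge:
    """What the verifying site remembers about the visitor (hypothetical fields)."""
    id_file_url: str
    nonce: str
    observed_ip: str
    issued_at: datetime


@dataclass
class LogEntry:
    """A log entry as published on the claimant's own site (hypothetical format)."""
    url: str
    nonce: str
    ip: str
    written_at: datetime


def issue_challenge(id_file_url: str, visitor_ip: str, now: datetime) -> Challenge:
    return Challenge(id_file_url, secrets.token_hex(8), visitor_ip, now)


def verify(ch: Challenge, entry: LogEntry, max_delay=timedelta(minutes=10)) -> bool:
    """Accept only if the log entry lives on the same site as the ID file,
    echoes our nonce, came from the IP we saw, and was written soon after."""
    same_site = urlparse(entry.url).netloc == urlparse(ch.id_file_url).netloc
    in_time = timedelta(0) <= entry.written_at - ch.issued_at <= max_delay
    return (same_site and in_time
            and secrets.compare_digest(entry.nonce, ch.nonce)
            and entry.ip == ch.observed_ip)


t0 = datetime(2003, 6, 6, 20, 0)
ch = issue_challenge("http://alice.example/id.xml", "198.51.100.4", t0)
good = LogEntry("http://alice.example/log/17", ch.nonce, "198.51.100.4", t0 + timedelta(minutes=2))
fake = LogEntry("http://mallory.example/log/1", ch.nonce, "198.51.100.4", t0 + timedelta(minutes=2))
print(verify(ch, good), verify(ch, fake))  # True False
```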
[ Programming | 2003-06-06 20:28 | 4 comments | PermaLink ] More >

Monday, May 26, 2003

Spam filtering

Spam is an overwhelming problem for my e-mail inbox. I get maybe 20 spam messages for each message I really should get. I'm considering switching to the Mail application on Mac, which has excellent features for training it to recognize spam and not spam, but it seems awfully slow, and I'm not sure it is otherwise suited to my large volume of mail. So I'm considering whether I should have something running on my server that filters the mail before it gets to me, and which I can train. I find the traditional tools, which rely on centralized spam blacklists, totally useless, in part because I have some experience with how those lists are created. The over-zealous maintainers will often block large IP ranges for questionable reasons, and much legitimate e-mail is lost. So, I'm looking for something I can train personally.

BBC News has an overview article about "Bayesian filters", which appear to be the most effective approach right now, able to recognize up to 99.9% of incoming spam once you have trained them on your own mail.

I'm particularly interested in crm114, which appears to be an open source package I could use on one of my servers. Not that I have nothing else to do, but the spam problem is driving me crazy.
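To show what "training" a Bayesian filter means, here is a minimal multinomial naive Bayes classifier in Python with add-one smoothing. It is a generic textbook illustration of the idea, not how crm114 (which uses its own, more elaborate classifiers) or Apple's Mail actually work; the tokenizer, the class name and the tiny training set are invented for the example.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$']+", text.lower())


class BayesFilter:
    """Minimal multinomial naive Bayes spam filter with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text: str, label: str) -> None:
        """Teach the filter one message; label is "spam" or "ham"."""
        self.words[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label, counts in self.words.items():
            n_words = sum(counts.values())
            score = math.log((self.docs[label] + 1) / (total_docs + 2))  # smoothed prior
            for tok in tokens(text):
                score += math.log((counts[tok] + 1) / (n_words + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into P(spam | text) with a numerically stable sigmoid.
        diff = log_score["spam"] - log_score["ham"]
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        return math.exp(diff) / (1.0 + math.exp(diff))


f = BayesFilter()
f.train("cheap pills buy now limited offer", "spam")
f.train("win money now click here", "spam")
f.train("meeting notes for the xpertweb demo", "ham")
f.train("lunch tomorrow to talk about the aggregator", "ham")
print(round(f.spam_probability("buy cheap pills now"), 3))
print(round(f.spam_probability("notes from the demo meeting"), 3))
```

Because the word counts come only from your own mail, the filter learns your notion of spam, which is the property the post is after compared with centralized blacklists.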
[ Programming | 2003-05-26 15:11 | 6 comments | PermaLink ] More >

Saturday, May 10, 2003

Weblog APIs

I still haven't gotten around to finishing a program interface to this newslog program, but I'd better pay attention to what is going on in that field. For the uninitiated, a "weblog API" is a standardized way for any program to interface with a weblog. With that in place, you might post to your weblog from an assortment of different programs, and it becomes easier to transfer postings or quoted excerpts from one place to another. When I first looked at the Blogger API and the MetaWeblog API, I was somewhat puzzled that they only seemed to do part of what I'd want, and were of no help with the more advanced features. Anyway, here is a good overview of what is going on, Weblog APIs: Stating the obvious:

Something's buzzing online about weblog APIs. Someone posted a comparison between the Blogger and the MetaWeblog API on his weblog. Then Dave Winer is getting pretty riled up about Google's plans to develop a new version of Blogger API, which should better be based on the MetaWeblog API instead.

First things first, Diego Doval is correct when he says that a weblog API should provide access to all the functionality offered by the product (be that Blogger, Radio, or MovableType). Now, clearly the Blogger API and the MetaWeblog API are quite different, even though the latter is actually based on the former. Comparing the two is pointless, because that is not really the issue. Having to deal with a variety of weblog APIs is a curse on the intrinsically open nature of weblogs themselves. One of the most interesting aspects of weblogs is the ability to not only to share information relatively quickly, but to make it widely accessible (Joi's Ivan story is a great illustration). While developing Kung-Log, I stumbled across so many differences between each weblog system's implementation of whatever Weblog API they endorsed. It's even worse. Weblog systems may differ in how they implement, for example, the Blogger API. pMachine's version of the Blogger API is, to phrase it mildly, ridiculous and clumsy. Only one weblog system has provided a nearly all-functional, self-explanatory, and straightforward implementation of a weblog API (there's a hint of subjectivity here, but still, the point remains). MovableType not only implements both the Blogger and the MetaWeblog, it provides a variety of extra API routines, that make client access to MovableType weblogs incredibly easy. An illustrative example is the getTrackbackPings method, which can be used to retrieve the list of TrackBack pings posted to a particular entry...

Sounds like the MovableType version is what I'd want to pay most attention to.
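As an illustration of the kind of extra routine the quote mentions, Movable Type's `mt.getTrackbackPings(postid)` is an XML-RPC call that returns an array of structs with `pingTitle`, `pingURL` and `pingIP`. A minimal Python sketch with the standard library's `xmlrpc.client` builds the request and parses a made-up response of that shape; the post id, the ping data and the server URL are placeholders, and nothing is sent over the network.

```python
import xmlrpc.client

# Request body for mt.getTrackbackPings(postid); "17" is a made-up post id.
request_body = xmlrpc.client.dumps(("17",), methodname="mt.getTrackbackPings")

# A made-up response of the documented shape: an array of structs with
# pingTitle, pingURL and pingIP. A real one would come back from the server.
fake_response = xmlrpc.client.dumps(
    ([
        {"pingTitle": "Re: Weblog APIs", "pingURL": "http://example.org/a", "pingIP": "192.0.2.1"},
        {"pingTitle": "API thoughts", "pingURL": "http://example.net/b", "pingIP": "192.0.2.2"},
    ],),
    methodresponse=True,
)

(pings,), _method = xmlrpc.client.loads(fake_response)
for ping in pings:
    print(f'{ping["pingTitle"]}: {ping["pingURL"]}')

# Against a live server (URL hypothetical) the same call would be:
# server = xmlrpc.client.ServerProxy("http://example.com/mt-xmlrpc.cgi")
# pings = server.mt.getTrackbackPings("17")
```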
[ Programming | 2003-05-10 15:06 | 2 comments | PermaLink ] More >
