Ming the Mechanic - Category: Programming

The NewsLog of Flemming Funch

Saturday, November 1, 2014

Being a computer programmer has been one of my main sources of income for more than 30 years. More than half of that time has been freelance, doing projects for people. I've noticed an important difference between very successful projects and not so successful projects.

Is there anybody to have the conversation with?

Software development is a type of knowledge work. Key parts of the work are understanding the problems at hand and inventing solutions for them. Because what needs to be done generally isn't known in advance, it is discovered along the way. Modern approaches to development, such as the Agile principles, take that into account by bringing all the stakeholders together and engaging them in frequent conversation, and by progressing very incrementally.

But many people still mistakenly think that software is something mechanical. You just need to specify clearly what you want, and then the programmers need to just go and code it. That was how it was generally perceived a few decades ago. Analysts would write the specs for what needed to be done, and then we just needed to apply enough programmer man-hours to implement it. It never worked well, because once the programmers came back with the work, 6 months or a year later, and it was shown to the people who asked for it, they usually would realize that it wasn't really what they wanted, and they'd ask for changes. And the analysts would revise the specs and the programmers would go off and do the work again, and come back some months later. That used to be the idea, but it is ridiculously wasteful and ineffective, so the approach has mostly been abandoned. However, that doesn't necessarily mean that it is easy to persuade clients that they need to be engaged in the process. Sometimes it is a very hard sell.

The most successful software projects I've done have involved a continuous conversation with some other people, often daily. That doesn't mean long meetings. Nowadays it means brief asynchronous instant messages, and occasional short face to face meetings. Doesn't have to be with everybody, but it should certainly include somebody who has a sense of what is needed, i.e. somebody who represents the interests of the end users. And, strangely, that's occasionally very hard to find, and it might make the project suffer greatly. Meaning, it will take a lot longer, be more costly, and most likely not deliver what really is needed.

The problem is in part that complexity is hard to understand. Complexity is something dynamic and alive that can't all be understood in advance, but that has emergent properties, which might or might not be the desired ones. This is as compared to stuff that is merely complicated. If it is complicated it might well be possible to make a big assembly diagram and have somebody simply follow the instructions. That works with many things, like Ikea furniture, but not with others, like software. Or communication. You can't just outsource an organization's social media relations and expect that it will work well. There needs to be somebody home who participates in a process of finding what works.

So, a note to myself: Face the issue up front. Don't accept a development project that the client isn't taking part in themselves. If they just want to tell you what they want and hear from you how long it will take, walk the other way.
[ Programming | 2014-11-01 17:33 | 8 comments | PermaLink ] More >

Saturday, February 24, 2007

Writing books in HTML/CSS

Slashdot

"Opera CTO Håkon Wium Lie hit back today at Microsoft's push to fast track Office Open XML into an ISO standard, in a
blistering article on CNET. He also took a swipe at Open Document Format: 'I'm no fan of either specification. Both are basically memory dumps with angle brackets around them. If forced to choose one, I'd pick the 700-page specification (ODF) over the 6,000-page specification (OOXML). But I think there is a better way.' The better way being the existing universally understood standards of HTML and CSS. Putting this to the test, Håkon has published a book using HTML and CSS."

Just posting this to remember the thing about making books in HTML and CSS, which actually seems to be quite possible, and probably a better idea than those horrendous document formats.
[ Programming | 2007-02-24 14:20 | 8 comments | PermaLink ] More >

Monday, February 5, 2007

Software is hard

Software is hard, an article/interview on Salon, based on Salon co-founder Scott Rosenberg's book Dreaming in Code. About, well, why programming is hard, and why it particularly is so hard to estimate how long a given development task will take. I certainly recognize that problem. The book's sub-title is "Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software". That's talking about Chandler, which is the particular focus for the book. Chandler is an ambitious project of creating a new way of organizing personal information. It is conceived, financed and managed by Mitch Kapor, the guy who invented Lotus 1-2-3. He hired the smartest people he could find, and was even prepared that it would take a while. But it seems to take forever to even get a preview release of some kind out the door.

The seminal work about why software is surprisingly hard is of course The Mythical Man-Month by Fred Brooks, which was about the development of the IBM System/360 operating system, which also took forever. Part of the wisdom from that experience was that as you add more people to a software project, the complexity of the communication between them grows much faster than the headcount, roughly with the number of pairs of people who need to talk to each other. It becomes much harder for anybody to know what is going on, and very hard to be on the same page, so the majority of the effort is wasted in trying. Microsoft is of course a good current example. Very smart people, but the projects get so complex that it takes thousands of people and billions of dollars, and the product ends up being several years late, and it is an incoherent mess. Brooks' solution was that programming should be done by small teams, 3-5 people, with very well-defined roles, where one person would be responsible for the overall conceptual integrity of the project.
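
To put a number on the communication problem: if everybody potentially has to coordinate with everybody else, a team of n people has n(n-1)/2 pairs, which is the formula Brooks gives. Below is a minimal Python sketch of that arithmetic. The function name is made up for illustration, and it assumes the worst case where every pair really does need to talk.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people who may need to coordinate."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


for n in (3, 5, 10, 50, 1000):
    print(f"{n:>5} people -> {communication_paths(n):>7} possible pairwise channels")
```

A team of 5 has 10 possible channels, while 50 people have 1,225. Ten times the people, over a hundred times the channels, which is one way of seeing why small teams stay on the same page more easily.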

Scott Rosenberg doesn't seem to have anything terribly revolutionary to add, but he does formulate what he somewhat jokingly calls "Rosenberg's Law". Essentially, it is the (somewhat obvious) wisdom that software is only hard and difficult to estimate when one is doing something new, something that hasn't been done before. Which is right. The people who can do very disciplined on-time software projects usually can do so because they do something that has already been done. You know, if a customer needs a website with 10 pages and 5 graphics and a menu, and that's what you do every day, it wouldn't be too surprising if you can provide an exact quote on that.

The traditional "ideal" way of carrying out the software development cycle would be a sequence of studies, of feasibility and requirements, and an analysis of exactly what needs to be done, resulting in some specs handed to the programmer, who "just" needs to do it. That has never really worked, but, in principle, if it already is perfectly clear exactly what needs to be done, of course it isn't hard. It just never is clear, because people usually don't know exactly what they want before they see some stuff they don't want, and they have a choice. So that way of developing software is going out of style. It has to be more interactive than that. Shorter cycles, involving both the programmers and the users in reviewing the progress, frequently. Which tends to be how one does it nowadays.

Part of the trouble with software is that programmers are only having fun if they're doing something new. So, even if there might be an existing solution, which would be boring, but reliable, most programmers would prefer to make their own solution. And there's no ultimate formal way of doing something new which is partially unknown.

What is missing is really tools for modeling things to do. Oh, there are diagrams one can do, but that isn't it. One would need to model the real thing, no matter what it is. Which, unfortunately, takes about the same amount of work as doing the project. So, the general problem might only be solved at around the same time when most programming will no longer be necessary. I.e. you interactively work out the model of what to do in real-time, and when you're done, the software is done too. No separation between the specification and the doing. Would be great. There are systems that do that to some degree, but so far nobody's succeeded in making it general enough.

The ultimate software project would be to invent a system that makes programming obsolete, by making it so simple that anybody can do it, very quickly. Unfortunately, that's a hard one.
[ Programming | 2007-02-05 15:21 | 6 comments | PermaLink ] More >

Sunday, November 19, 2006

Thingamy

Thingamy. This is what it says on the site:

One single system to run your business.

No need for other enterprise software nor middleware.

No need for hierarchies nor information tree structures.

No need for management to run the workflow.

Enter the future at your own pace, start small or big.

Refine your business model and processes continuously.

And yes, you're not the first to utter unbelievable, bollocks, bullshit, etc. under your breath.
We like that, leaves us only one task: Prove that the system actually works.

Would that not be kind of cool if we did?

And what is it? Well, it is a piece of software, which apparently one can model any kind of business process in, very quickly, and then you have the application to run your business right away. So, seems like you can create your Enterprise Resource Planning system in a few hours, and it would do ordering and accounting, etc. It is based on some kind of object-oriented, rules-based database thing, that is also a webserver, with an Ajax interface.

I watched the introductory video. Which indeed shows that you can do something like that very quickly, and one gets the idea that it is basically made of simple building blocks. But it is also very complicated to do, as it was in no way clear how exactly to do it, and the components weren't really explained.

But this is the kind of application I've tried to write several times. A universal application with simple building blocks that lets you create any kind of application, and it is operational right away. But it is very hard to make something that really is universal. And if you more or less succeed, it might be very hard to explain it to anybody. Which might become the problem with this Thingamy thing.

It doesn't seem to have been released yet, so I guess it may be hard to get all the details right. And maybe it promises too much. It sort of implies you can create your business based on this. Focus on the value you offer, and Thingamy will easily take care of all the details. Of course it isn't quite that easy.

But I'm looking forward to seeing where it is going.
[ Programming | 2006-11-19 21:30 | 3 comments | PermaLink ] More >

Wednesday, December 14, 2005

Ruby on Rails

Ruby on Rails 1.0 was released yesterday.

Grrr, I have too little time.

I had heard about Ruby and Ruby on Rails for quite some time, but didn't get around to looking more closely until recently.

Ruby is an elegant object-oriented language created in Japan some years ago by Yukihiro Matsumoto. I had looked at it several times, but however good it sounded, there really has to be an exceptional reason for changing the language one programs in. The biggest value is usually in knowing one's tools really well, as opposed to just changing everything whenever another language or platform comes along with slightly better features.

As far as the web is concerned, I first made programs in Perl, because that was basically the obvious choice in 1995. I did shopping cart programs, chat programs, and various other things. But Perl is just too damned cryptic, and I never felt overly comfortable with it.

Then PHP started happening, and it just seemed so much more practical to be able to embed code directly into webpages, and it was more clean and straightforward. So, I switched without much hesitation. Since then I've done hundreds of thousands of lines of PHP code, and PHP has grown into the most widespread solution for coding for webpages.

I've looked at other things in-between. Like Python. More of a "real" language, in that it makes it easier to make clean and well-structured programs that are easy to maintain. But that in itself wasn't enough to switch. But then I looked at Zope, which is a fabulous content management system and development framework, which makes a lot of hard things easier, and which is supported by loads of available modules. I was excited by that, and wanted to switch all my work to Zope. But after a couple of projects, I just felt kind of stupid. If I just used the pre-packaged modules, it was a piece of cake, but in developing new stuff, I just ended up not really grasping the "Zope Way". The people developing the core stuff are obviously super-smart, but so much so that I couldn't easily follow what they were talking about. So I ended up not going any further with that.

Now, Ruby on Rails is a framework built on top of Ruby. It could have been done in other languages, but Ruby lends itself very well to the purpose. It is developed, initially single-handedly, by David Heinemeier Hansson, a young Danish developer. Who is obviously also super-smart, but who additionally has a knack for making things extremely simple, and for just doing everything right. It supports the best practices for development, it supports most things that currently are cool and happening, like Ajax, it is well structured, easy to test, easy to deploy, etc. And with Ruby on Rails you don't pride yourself on how many lines of code you've written and how long it has taken, but quite the opposite. You'll brag about having written some major application in a few weeks, with just a few thousand lines of code.

Rails is built on a fixed, but flexible structure, or pattern, rather, called MVC. Model, View, Controller. The models keep track of the data, connect with databases, validate the data, and store the business rules for the data. The controllers receive requests from the users for doing different kinds of actions. The views are templates for what is presented to the user. That's nothing really new, but it is a good way of organizing an application. One could do that in PHP, but typically one doesn't. Now, Rails enforces a certain structure. There's a set of directories where these various things are stored, and there are certain naming conventions for what they're called. That some of these things are set and known in advance is in part what makes it very simple. A Rails principle is "Convention over Configuration". If we know that the model is always found in the models directory, and that it is named to correspond to the database table, and a few other conventions, we suddenly have little need for configuration files that define where things are and what goes with what.
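
The division of labor and the naming convention can be sketched in a few lines. This is Python rather than Ruby, and it is only an illustration of the idea, not how Rails is implemented. All the names (LineItem, table_name_for and so on) are hypothetical, and pluralizing by appending an "s" is a deliberate oversimplification.

```python
import re
from dataclasses import dataclass


def underscore(name: str) -> str:
    """'LineItem' -> 'line_item' (simplified, not Rails' real inflector)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def table_name_for(model_class: type) -> str:
    """Convention: the table is the underscored, pluralized class name.
    Appending 's' is a deliberate oversimplification of pluralization."""
    return underscore(model_class.__name__) + "s"


# Model: holds the data and the rules that belong to it.
@dataclass
class LineItem:
    product: str
    quantity: int

    def is_valid(self) -> bool:
        return self.quantity > 0


# View: a template for what is presented to the user.
def show_line_item(item: LineItem) -> str:
    return f"<p>{item.quantity} x {item.product}</p>"


# Controller: receives a request, talks to the model, picks a view.
def line_items_controller(params: dict) -> str:
    item = LineItem(product=params["product"], quantity=int(params["quantity"]))
    if not item.is_valid():
        return "<p>Invalid quantity</p>"
    return show_line_item(item)


print(table_name_for(LineItem))  # line_items
print(line_items_controller({"product": "widget", "quantity": "3"}))
```

Because the table name is derived from the class name, nothing has to be written down in a configuration file to connect the two. That is the whole point of the convention.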

Another basic principle is "Don't Repeat Yourself" (DRY). Meaning, one should really only do something once. If you have a rule that says that you can not order more than 10 of a given item, there should be one and only one place to say that. Most programmers would want to follow a rule like that, but in most systems it is hard to stay true to it in practice. Not so with Rails, as there typically already is one right place to store that item, so there's no doubt about it.
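
As a sketch of what that looks like, here is the "no more than 10 of a given item" rule stated exactly once and used both for validation and for the hint shown in the form. It is Python, not Rails code, and the class, constant and function names are made up for the example.

```python
MAX_QUANTITY_PER_ITEM = 10  # the one and only place this rule is stated


class OrderLine:
    def __init__(self, product: str, quantity: int) -> None:
        self.product = product
        self.quantity = quantity

    def errors(self) -> list:
        """Return validation messages; an empty list means the line is valid."""
        problems = []
        if self.quantity < 1:
            problems.append("quantity must be at least 1")
        if self.quantity > MAX_QUANTITY_PER_ITEM:
            problems.append(
                f"you cannot order more than {MAX_QUANTITY_PER_ITEM} of {self.product}"
            )
        return problems


def order_form_hint() -> str:
    """Text shown next to the quantity field; it reuses the same constant."""
    return f"Quantity (1-{MAX_QUANTITY_PER_ITEM}):"
```

If the limit changes to 20, one line changes, and the validation and the form text stay in agreement.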

The online video demos for Rails are mind-blowing. You know, like write a simple weblog program in 15 minutes. If you just want to try Ruby itself, here's a great little interactive tutorial.

Well, I haven't gotten much further than installing Ruby and Rails on my machine and going through a few tutorials. But I'm very impressed, and I think this will probably be the way I'll go.

I'm an expert at PHP programming, and I've done a number of fairly impressive things. But it tends to end up being a bit of a mess. You can do a quick thing in PHP really, really quickly. But a complex program in PHP is very complex. And after you've done it, you discover that there isn't any very good way of testing it, and things break whenever you change something. And everybody does things a little differently, so if you get the job of changing something in somebody else's program, it usually looks like a big pile of spaghetti, however cleverly it might have been written.

I just spoke with one of the people from a company I worked with for several years, developing big things in PHP. I had wondered why I hadn't heard from them for a few months. Turned out that in the meantime they had converted their whole operation to Rails, and they are extremely happy with it, and everything was much easier. That's some folks with very high-volume websites and a few dozen servers. And no wonder they don't need me any more.

Luckily Ruby and Rails are so relatively simple that one can become an expert faster than in various other arenas. Oh, it is not a complete no-brainer, either. Rails can seem a bit intimidating at first. No graphical interface or anything. You're editing text files and running command-line utilities. The productivity mainly starts when one is fluent with all the basic pieces, and one intuitively knows where things go.

Anyway, the best places to learn are the two main bibles, which are lying right here next to me. Programming Ruby and Agile Web Development with Rails. You can read them online too, for that matter.

Ruby and Rails are often connected with "agile" or "pragmatic" programming. These are keywords for modern methods of fast and flexible development which are very different from the traditional slow and linear methods. You know, traditionally one would learn to develop software according to a certain Systems Development Life Cycle (SDLC) approach, which involves copious amounts of formal proposals, specifications, etc. You know, first a committee of people would do feasibility studies, then it would go to an analyst who would make models and specs, etc. And the programmers would be told what to do, essentially. And when they discover that it isn't a great idea to do it that way, or when, later, one discovers that it wasn't really what was needed, it is a bit cumbersome to change. The Agile, Pragmatic or Extreme approach would rather be to go very light on the specs and analysis, and get down to work ASAP, but to do it very quickly, with very short incremental phases, like daily updates, and to do it, as much as possible, WITH the stakeholders who need the result. Like, preferably sit down with the end users, and change stuff and show them right away. One could theoretically do that with any language, like PHP, although it isn't easy, and one would probably be crazy to hope to do that with Java or C++. But if you're working with a framework that all the way through is geared towards working like that, it comes much more naturally.

Anyway, I could sure use a 10X productivity boost. And right now Rails looks like the most likely candidate in the programming arena. Plus, I want to be cool too.
[ Programming | 2005-12-14 15:15 | 5 comments | PermaLink ] More >

Saturday, March 19, 2005

Comment and Referrer Spam

I've turned trackbacks back on here, as I'm more confident that spammers can be thwarted fairly easily. Basically I've installed mod_security, an Apache module which makes it quite easy to filter all web accesses, form postings and so forth, and catch when various keywords are used, in addition to catching various behaviors that are likely to be hacking attempts, like trying to access unauthorized directories, etc.

There's something arriving several times per minute. Most often just a robot accessing some blog page and including a fake referral URL, pretending there's a link on some gambling site to that page. Which there isn't. But also attempted comment postings and trackbacks. The latter having been the most problematic, as I have them show up prominently as comments in the articles. So I had that turned off for a while.

The drawback to keyword filtering, however, is that I have to manually configure a list of keywords that trigger the alarm. I'm after the ones that appear in URLs and that I'm pretty sure have no business here. Viagra, poker, incest, texas hold-em - that kind of thing. And I'll have to add new ones when they're used. I try to limit the filter to comment and trackback postings and to referrer URLs. But it is quite possible that this will block some legitimate uses of those words. So far it seems to work pretty well.
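
The logic of that kind of filter is simple enough to sketch. What follows is plain Python, not mod_security configuration, and the field names and the request dictionary are assumptions made up for the example. It only shows the idea of checking a hand-maintained keyword list against selected parts of a request.

```python
import re

# Hypothetical blocklist; the real one is maintained by hand and grows over time.
BLOCKED_KEYWORDS = ["viagra", "poker", "incest", "texas hold-em"]

# One case-insensitive pattern; re.escape keeps characters like '-' literal.
BLOCKED_PATTERN = re.compile(
    "|".join(re.escape(word) for word in BLOCKED_KEYWORDS), re.IGNORECASE
)

# Only these fields are checked, so ordinary page views are left alone.
CHECKED_FIELDS = ("referrer", "comment_body", "trackback_url")


def is_spam(request: dict) -> bool:
    """True if any checked field contains one of the blocked keywords."""
    for field in CHECKED_FIELDS:
        value = request.get(field, "")
        if BLOCKED_PATTERN.search(value):
            return True
    return False
```

Since this is plain substring matching, a legitimate comment that mentions poker would be blocked too, which is exactly the drawback described above.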
[ Programming | 2005-03-19 16:04 | 2 comments | PermaLink ] More >

Wednesday, February 23, 2005

Wikipedia

Last week Google offered to host part of Wikipedia's content. Yesterday Wikipedia was knocked offline for some hours by a power failure and subsequent database corruption.

Now, I pay attention to those things not just because Wikipedia is a great resource that needs to be supported. But also because I'm working on a clone of it, and I've been busy downloading from it recently.

The intriguing thing is first of all that that even is a possible and an acceptable thing to do. Lots of people have put a lot of work into Wikipedia, and it is generously offered as a free service by a non-profit foundation. And not only that, you can go and download the whole database, and the software needed for running a local copy of it on another server. Because it is all based on free licenses and open source. And, for that matter, it is in many ways a good thing if people serve up copies of it. It takes a load off of their servers, and potentially others might find new things to do with it.

Anyway, that's what I'm trying to do. At this point I've mostly succeeded in making it show what it should show, which is a good start.

Even though the parts in principle are freely available, it is still a pretty massive undertaking. Just the database of current English language articles is around 2GB. And then there are the pictures. They offer in one download the pictures that are considered in the "Commons" section, i.e. they're public domain. That's around 2GB there too. But most of the pictures are considered Fair Use, meaning they're just being used without particularly having gotten a license for it. So, they can't just share them the same way. But I can still go and download them, of course, just one at a time. I set up a program to pick up about 1 per second. That is considered decent behavior in these matters. Might sound like a lot, but it shouldn't be much of a burden for the server. For example, the Ask Jeeves/Teoma web spider hits my own server about once per second, all day, every day, and that's perfectly alright with me. Anyway, the Wikipedia picture pickup took about a week like that, adding up to something like 20GB.
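
The "about 1 per second" part is the only slightly technical bit of that, and it can be sketched like this. This is a minimal Python illustration, not the program I actually ran. The function name and parameters are made up, and the clock and sleep functions are passed in only so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_politely(
    items: Iterable[T],
    fetch: Callable[[T], R],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call fetch(item) for each item, starting requests at least
    min_interval seconds apart."""
    last_start = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield fetch(item)
```

In real use, fetch would be a function that downloads one picture URL and saves it to disk. At one request per second that is at most 86,400 files a day, which is why a big collection takes days.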

Okay, that's the data. But what the database contains is wiki markup. And what the wikipedia/mediawiki system uses is pretty damned extensive markup, with loads of features, templates, etc. Which needs to be interpreted to show it as a webpage. My first attempt was to try the mediawiki software which wikipedia runs on. Which I can freely download and easily enough install. But picking out pieces of it is quite a different matter. It is enormously complex, and everything is tied to everything else. I tried just picking out the parsing module. Which happened to be missing some other modules, which were missing some other modules, and pretty soon it became unwieldy, and I just didn't understand it. Then I looked for some of the other pieces of software one can download which are meant to produce static copies of wikipedia. They're very nice work, but either didn't quite do it quite like I wanted it, or didn't work for me, or were missing something important, like the pictures. So I ended up mostly doing it from scratch, based on the wikipedia specs for the markup. Although I also learned a number of things from wiki2static, a perl program which does an excellent job in parsing the wikipedia markup, in a way I actually can understand. It still became a very sizable undertaking. I had a bit of a head start in that I've previously made my own wiki program, which actually uses a subset of wikipedia's markup.
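
To give a flavor of what interpreting the markup means, here is a toy converter for a tiny subset: headings, bold, italic and internal links. It is a Python sketch under heavy assumptions, not my actual code and nowhere near what mediawiki does. The function name and the /wiki/ URL prefix are made up, it handles one line at a time, and templates, tables, images and nested constructs are ignored entirely.

```python
import html
import re


def wiki_to_html(line: str) -> str:
    """Convert one line of a tiny subset of wiki markup to HTML."""
    # Escape &, < and > but leave apostrophes alone; they carry the markup.
    text = html.escape(line, quote=False)

    # == Heading == (2 to 6 equals signs on both sides)
    heading = re.fullmatch(r"(={2,6})\s*(.+?)\s*\1", text)
    if heading:
        level = len(heading.group(1))
        return f"<h{level}>{heading.group(2)}</h{level}>"

    # Bold before italic, since ''' would otherwise be read as '' plus a stray quote.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)

    # [[Target|label]] and [[Target]] internal links.
    def link(match: "re.Match[str]") -> str:
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        href = "/wiki/" + target.replace(" ", "_")
        return f'<a href="{href}">{label}</a>'

    text = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", link, text)
    return f"<p>{text}</p>"


print(wiki_to_html("'''Ruby''' is a ''language'' from [[Japan]]."))
```

Even this much shows where the trouble starts: the order of the substitutions matters, and every additional feature interacts with the ones already there.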

As it says on the wikipedia download site:

These dumps are not suitable for viewing in a web browser or text editor unless you do a little preprocessing on them first.

A "little preprocessing", ha, that's good. Well, a few thousand lines of code and a few days of non-stop server time seems to do it.

Anyway, it is a little mindblowing how cool it is that masses of valuable information are freely shared, and that with "a little preprocessing" one can use them in different contexts, build on top of them, and do new and different things, without having to reinvent the wheel first.

But the people who make these things available in the first place need support. Volunteers, contributors, bandwidth, money.
[ Programming | 2005-02-23 21:34 | 5 comments | PermaLink ] More >

Tuesday, February 22, 2005

Mail

I've used Eudora for handling my mail forever. Since 1994, as far as I remember. I have around 5G of archived messages I've sent and received. And it has worked pretty well most of the time. That is, until spam started becoming such a huge problem. A spam filter on my server and a spam plugin in Eudora helped greatly. But, increasingly, the whole mail thing became really unwieldy. You know, 20,000 messages in my inbox, because I can't really find anything in all those folders, and at least I might run into it again if it stays in my inbox. Usually I don't. And Eudora has various quirks. It crashes once in a while, as it always has. And it doesn't show html messages very well.

So I switched to Thunderbird a couple of months ago. Took quite some time to convert all my mailbox folders from Eudora's format to the standard format. Thunderbird right away did a bunch of things better. For the first time I could see how all those pretty html messages really look. And it is much better at showing me what is new, so I could more quickly find out what is what. But with the volume of mail I have, it is unbearably slow on my computer, which isn't terribly fast. And it is still the same model, with a hierarchy of folders that I can't find anything in.

I had had a Google gmail for a while that I hadn't used, so I decided to forward all my incoming mail there to see how it works. And, somewhat surprisingly, it does most things faster and more smoothly than any of my local programs. No waiting for downloading, as it is already there. And searching for anything is as quick as typing in a word, and it will search through the full text for it, in no time at all. I was a bit skeptical about that apparently very simplistic model where there are no folders. You either have things in your inbox or you "archive" it. And you can apply a few labels to things. So, finding anything is a matter of either a full-text search, or listing anything that has any of the labels. It works remarkably well. The result is that for the first time in years my inbox can stay under 100 messages all the time, quite effortlessly. OK, there are a few things I'd like to have that it doesn't do perfectly for me. To select messages, you have to click a checkbox once for each of them. Yeah, that's simple, but it takes longer than if you just run the mouse over a series of messages. And to actually delete something I have to go and select "Move to Trash" from a menu. That's easy, but it takes at least twice as long as hitting a delete button, which doesn't exist here. It is because the philosophy is to not ever throw anything away. But some of the spam does end up in my inbox, and I don't have in mind archiving it. Archiving is deliberately made easier than deleting. Oh, and there's no preview pane. It shows the beginning words of each message, which is often enough, but it takes another click and screen refresh to see the whole thing. And even though it shows incoming html messages fine, I can't edit outgoing messages like that. But otherwise it has many amazing features that are camouflaged behind their apparent simplicity.

But now, I'm actually making a web-based mail client program myself. I probably wouldn't have thought of that on my own, because there are plenty to choose from already. But somebody's paying me for it. Not much, but it is reasonably fun to work on. And since I'm doing it anyway, I might as well give it the features I'd like, most of them at least. So my program has a preview pane, and it edits outgoing messages in wysiwyg html format, and it has on-the-spot spellchecking quite similar to Google's. And I can select messages without clicking. And it handles international character sets.

I suppose I'll end up using some version of my own program, unless Google makes their thing even better. Anyway, I'm 36% full in my 1G gmail account, after less than a month and starting from scratch.
[ Programming | 2005-02-22 17:32 | 6 comments | PermaLink ] More >

Thursday, February 10, 2005

More Google wizardry

Google has a new Map service. Which maybe at first looks like any other online map thing. You can see the streets in some area, map a route somewhere, search for addresses and that kind of thing. But Google has a knack for making services that are really simple, and look really simple, but that use a lot of hidden wizardry. They effortlessly do things that most professional web developers would swear would be impossible to do in a webpage. But, ok, we're catching on now. So whenever they come out with something new, somebody will dissect it and tell us how they did it.

There's Google Suggest that magically can provide you lists of possible search terms as you're typing, complete with number of matches for each. Chris Justus did a thorough job dissecting that. They use the XMLHttpRequest object (often shortened to XMLHttp) for exchanging data with the server in real time.
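
The browser side of that is a request fired off on each keystroke. The server side has to answer very quickly with matching terms and their counts. I have no idea how Google actually does that part, so the following is just a guess at the simplest thing that could work: a sorted list and a binary search for the prefix, sketched in Python. The terms, the counts and the function name are all invented.

```python
import bisect

# Hypothetical data: (search term, number of matches), sorted by term.
TERMS = sorted([
    ("ruby", 1200),
    ("ruby on rails", 950),
    ("ruby tutorial", 400),
    ("rubber duck", 75),
    ("python", 2100),
])
KEYS = [term for term, _ in TERMS]


def suggest(prefix: str, limit: int = 10) -> list:
    """Return up to `limit` (term, count) pairs whose term starts with prefix."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(KEYS, prefix)
    results = []
    for term, count in TERMS[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match either
        results.append((term, count))
        if len(results) == limit:
            break
    return results


print(suggest("ruby"))
```

Typing "ruby" would bring back "ruby", "ruby on rails" and "ruby tutorial" with their counts, and the page only needs to drop that short list into place under the search box.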

And there's Gmail. Again, seems very simple. But it does spell checking and addressbook lookups in real time. Stuff I had gotten used to accepting that one just couldn't do in webpages. But you can, with XMLHttp and with iframes. And with some extremely responsive servers. Various people have analyzed Gmail, like Johnvey Hwang. Part of their trick is that the user interface gets stored at the client's end, so that only data is passed back and forth to the server. As opposed to "normal" webpages where everything is sent from the server whenever you load a new page.

Now Joel Webber has dissected Google Maps. So, some of the same tricks again with real-time server communication, in part using a hidden iframe. And then there's the infinitely scrolling maps. The trick is in part to make them out of little tiles, and removing some at one end while adding new ones at the other end, in real time. And routes are added on top with transparent PNG files.
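
The tile trick comes down to a bit of integer arithmetic: work out which tiles overlap the visible area, and after a scroll, which ones are new and which ones dropped out of view. Here is a Python sketch of that bookkeeping. It is my own illustration, not Google's code. The function names are made up and the 256 pixel tile size is just an assumed default.

```python
def visible_tiles(view_x: int, view_y: int, view_width: int, view_height: int,
                  tile_size: int = 256) -> list:
    """Return (column, row) indices of every tile overlapping the viewport.

    view_x/view_y are the pixel coordinates of the viewport's top-left corner
    on the (conceptually endless) map; width/height are its size in pixels.
    """
    first_col = view_x // tile_size
    first_row = view_y // tile_size
    last_col = (view_x + view_width - 1) // tile_size
    last_row = (view_y + view_height - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def tiles_to_swap(old_view: tuple, new_view: tuple, tile_size: int = 256) -> tuple:
    """After a scroll, return (tiles to add, tiles to remove)."""
    before = set(visible_tiles(*old_view, tile_size))
    after = set(visible_tiles(*new_view, tile_size))
    return sorted(after - before), sorted(before - after)


# Scroll a 512x256 viewport one tile width to the right.
print(tiles_to_swap((0, 0, 512, 256), (256, 0, 512, 256)))
```

Scrolling one tile to the right adds one column at the right edge and drops one at the left, so the page only ever holds a handful of images no matter how far you drag.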

Now, if somebody could just pay me for duplicating some of those tricks so I can get time to study up on them. Or my skills are suddenly getting a little old.
[ Programming | 2005-02-10 16:00 | 5 comments | PermaLink ] More >

Friday, February 4, 2005

The Six Laws of the New Software

As a programmer I'd of course like to write something great and new and useful. But the problem is, as Dror Eyal says in The Six Laws of the New Software:

You're too late! Most home consumers have all the software they will ever need, and most companies out there already have all the basic technologies they need to successfully compete.

Hm, yeah. But not exactly encouraging. So, what are those six laws?

SINGLE IDEA: The best way to succeed in the marketplace is to create software that fulfills a specific need. This may seem like an obvious point at first, but if you can not explain to the end user what the software does in a single sentence it is probably too complex. Your first task is to ask yourself, “What does my product do?”

COLLABORATE: Forget enterprise systems that do everything possible within your field. They’re too large, clumsy and require too much development time. Instead, create small discrete software that can collaborate seamlessly with the technology that the end users are currently using.

DISAPPEAR: No matter what kind of software you are creating, you have to simplify the interface. The greatest software in the world is useless if it is too complex to use. Decrease the interruption of the user experience by reducing the user interface to the point where only the essence is showing.

SIMPLIFY: Do I have to go through a course to work with your technology? If so, you are already out of the market. I don’t have time and I already have something similar which I’m used to.

RELEASE: Start creating and releasing your software now. Think prototypes, iterative releases and user base. Don’t spend your time on writing business plans, designing a website and choosing logos. The competition is moving a lot faster than you may think.

COMPLY: Find the relevant international standard in your marketplace and comply. This will enforce good architecture and keep your product on track when your customers will want it to integrate with their legacy software. You know they will want you to integrate.

He elaborates in detail on each one. Well, excellent advice. Do something clear and simple and useful. Make it obvious to use. Make it work with existing standards and other programs. And don't think about it for very long. Put it out there right now. Yep.
[ Programming | 2005-02-04 15:14 | 7 comments | PermaLink ] More >

Wednesday, February 2, 2005

Blog, Ping and Spam

I'm doing various little programming contract jobs at the moment. And it is remarkable to notice how much effort apparently is being spent on trying to abuse various shared internet resources. I.e. getting around the way something was intended to be used, for the sake of self-promotion. Like, somebody just asked me to do a program doing what Blogburner is doing. I said no, and gave the guy a piece of my mind, but I'm sure he'll find somebody else to do it. "Blog and Ping" they call it. It is essentially that you automatically set up a number of fake blogs at a site like Blogger and you automatically post a large number of regular web pages to them, pinging the blog update sites as you do it, pretending that you just posted something new on your blog. Of course exploiting the somewhat favored status that blogs have in search engines, and attracting traffic. Under false pretenses.

And that's just one of many similar project proposals I see passing by. There are obviously many people getting various kinds of spamming programs made. You know, stuff like spidering the web for forums and then auto-posting ads to them. Or automatic programs that sign up for masses of free accounts in various places. Or Search Engine Optimization programs that create masses of fake webpages to try to show better in the search engines. I don't take any of those kinds of jobs, but it is a bit disturbing to see how many of them there are.

It is maybe even surprising how well the net holds up and how the many freely shared resources that are available can be viable. Another example. You know, there's the whois system that one uses to check the registration information for a domain, who owns it, when it expires, etc. Now, there's a business in trying to grab attractive domain names that for one reason or another expire. So there are people who set up servers that do hundreds of thousands of whois lookups every hour, in order to catch domains right when they expire, in order to re-register them for somebody else. Or any of a number of variations on that scheme. To do that you'll want to do maybe 100 whois lookups every second. And most whois servers will try to stop you from doing that, by having some kind of quota of how many you can do, which is much less. So, you spread the queries over many IP numbers and many proxy servers, in order to fool them. And the result is inevitably that a large amount of free resources is being spent, in order for somebody to have a little business niche.
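For illustration, the kind of quota a whois server might apply could look like the sketch below. It is a guess at the general idea, not any real server's implementation; the class name, the limit and the window length are all made up for the example.

```python
import time
from collections import defaultdict


class PerIpQuota:
    """Fixed-window quota: at most `limit` lookups per IP per time window.

    Hypothetical sketch. Old windows are never cleaned up here, which a
    long-running server would need to do.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (ip, window number) -> lookups so far

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        key = (ip, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False  # this IP has used up its quota for the window
        self.counts[key] += 1
        return True


quota = PerIpQuota(limit=2, window_seconds=60)
print(quota.allow("10.0.0.1", now=0))  # True
print(quota.allow("10.0.0.1", now=0))  # True
print(quota.allow("10.0.0.1", now=0))  # False
print(quota.allow("10.0.0.2", now=0))  # True
```

Because the count is kept per IP number, a second address starts with a fresh allowance, which is exactly the weakness that spreading queries over many addresses relies on.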

At the same time I can see that part of what makes the net work in good ways is indeed that one can build on somebody else's work with few barriers. That one can quote other people's articles, borrow their pictures, play their music, link to their sites, use their web services, etc. And add a little value as one does so. And I suppose the benefit of generative sharing will outweigh the problems with self-serving abuse of what is shared. But it seems it also involves a continuous struggle to try to hinder abusive use of freely accessible resources.

Like, in my blog here. An increasing number of visits are phoney, having bogus referrer information, just to make a site show up in my referrer logs. No very good solution to that, other than if I spend server resources on spidering all the sites to see if they really have a link to here.
[ Programming | 2005-02-02 18:37 | 5 comments | PermaLink ] More >

Link Spamming

The Register: Interview with a link spammer. Well, yesterday I got over 600 phoney trackback entries in my blog. Might very well have been from that guy. Done through proxies from many different IPs, all promoting various gambling sites. Hard to compete with scumback programmers like that.

Hm, for regular comments it works alright to require the entry of some characters from a graphic. There is still spam coming in that way, but it is, I'm sure, done one at a time by manually entering it, so that's not too much of a nuisance. But I can't do that for trackbacks.

I suppose a partial answer would be to spider the site that does the trackback, to see if it really has a link to one's blog. A clever spammer could very well have the link, but he probably doesn't. I'll have to explore that. Another would be to block the sites that are being promoted, but they use so many different changing domains that that's hard to keep up with.
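The link check could be sketched roughly like this. It is only an outline under assumptions: the function and class names are invented, the page is passed in as a string (fetching it, for example with urllib.request, is left out), and a real check would probably want timeouts and some caching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def links_to_blog(page_html, page_url, blog_host):
    """True if the page contains at least one link pointing at blog_host.

    blog_host is expected in lower case, e.g. "ming.tv".
    """
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    return any(urlparse(link).hostname == blog_host for link in collector.links)


honest = '<p>Good post over at <a href="http://ming.tv/">Ming</a>.</p>'
spam = '<p><a href="http://casino.example/">Win big</a></p>'
print(links_to_blog(honest, "http://blog.example/post1", "ming.tv"))  # True
print(links_to_blog(spam, "http://casino.example/", "ming.tv"))       # False
```

A trackback whose source page fails the check could then be held for review rather than published. As noted above, a spammer who bothers to include the link would still get through.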
[ Programming | 2005-02-02 17:02 | 6 comments | PermaLink ] More >

Tuesday, January 25, 2005

Webcam dsfdsf? fdsfdfdsdsfd?

So, I continue to have a bit of fun with that webcam thing I did. In part because there still are several thousand people coming by looking at it every day. So I add a few improvements once in a while.

Mikel Maron made the nice suggestion that one could establish the more precise location of the different cams collaboratively, and then one could maybe do fun things like having them pop up on a world map or something. So, I added forms for people to correct or expand the information on each location. Like, if they know the city, or the name of the building, company, bridge, or whatever, they can type it in. And while I was at it, I added a comment feature.

OK, so, presto, instant collaboration. Within a couple of hours lots of helpful (or maybe bored) visitors had figured out where a bunch of these places were, and they had typed them in.

But, at the same time, what is going on is that these webcams seem terribly interesting to Chinese or Japanese speaking people. 70,000 people came from just one Japanese softcore porn news site that for some reason linked to it.

But then there's a slight, eh, communication problem here. Or language problem. Or character set problem. See, I've set it up so that the forms where you leave comments or update the info can take Unicode characters. So if somebody wants to type a comment in Japanese, they should be able to do that. And some people do. But the explanatory text on my page is in English. And it seems that a large number of people don't really have any clue what any of it says, but they have a certain compulsion to type things into any field that they see. So, if there's a button that leads to a form where you can correct the city of the camera, they'll click on it, and they'll enter (I suppose) their own information. Or they say Hi or something. See, I find it very mysterious what they actually are writing. It is for sure nothing like English. But it isn't what will appear as Chinese or Japanese characters either. Rather, it looks to me like what one would type if one was just entering some random test garbage, by quickly running one's fingers over a few adjacent keys. But the strange thing is that dozens and dozens of different people (with different IPs) are entering either very similar, or exactly the same, text. This kind of thing:

Facility: fdsfdfdsdsfd

City: dsfdsf

Yeah, I can type that with 3 fingers without moving my hand from the keyboard. But why would multiple people type exactly the same thing?? Does it say something common in Chinese?

Now, we have a bit of a cryptographic puzzle here. Notice that "Facility" (the name of the field) has twice as many letters as "City". And "fdsfdfdsdsfd" has twice as many letters as "dsfdsf". Consider the possibility that somebody might think they're supposed to enter the exact word they see into the field. Like some kind of access verification. And they use some kind of foreign character input method that encodes Latin characters as one and a half bytes. If so, I can't quite seem to decode the system.

Or, are we dealing with some kind of Input Method Editor (IME) that lets people form Chinese symbols by repetitive use of keys on a QWERTY keyboard? Anybody know?

This is a bit like receiving signals from some alien civilization. Where's the signal in the noise? How might these folks have encoded their symbols, and what strange things might they be referring to? Are they friendly? dsfdsf?

Otherwise, if anybody here actually speaks Chinese or Japanese, could you give me a translation, preferably into the proper character set, of a sentence like: "This is the information for the camera location. Please do not enter your own personal information here!"
[ Programming | 2005-01-25 20:25 | 9 comments | PermaLink ] More >

Unicode

I'm beginning to love Unicode. At first I just started to yawn whenever I heard it mentioned. Yes, very un-geekly. But now I think it is a very good thing.

If you don't already know, Unicode is a unified system for encoding pretty much all the characters in all current languages. Some 100,000 or so. To replace hundreds of different incompatible local methods of entering and encoding characters.

Before, there was ASCII. One character per byte, which gives 256 possibilities, all of the combinations of the 8 binary bits in a byte. There was wide agreement on the first 128, whereas the last 128 changed from country to country, language to language. That worked ok when we were only talking about European languages, with Latin letters like I'm writing, and if one just remembered which country's character set was used. But it is hopelessly inadequate for many other character sets, particularly Asian ones, like Chinese that has thousands of characters. Then one would use some system of storing each character in several bytes, and one would load special software on one's computer to be able to enter and view the characters, and they wouldn't be visible if one didn't have it.

Anyway, Unicode simplifies all of that. One coding system for all of it. It might still be tricky to figure out how to enter the various characters, but at least each one has its own code, usually written as a hexadecimal number of four to six digits.

For practical purposes, on the web, the winning approach is a compromise called UTF-8. Instead of the straight Unicode, it will store characters as a variable number of bytes, from one to four. Normal English text, which fits in the first half of ASCII (the first 128 codes), will be stored exactly the same way. But anything else can be done by the use of additional bytes.
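A quick way to see the variable length in action, as a small sketch. The sample characters are arbitrary picks for the example, and the helper name is made up.

```python
def utf8_length(ch):
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per byte length: plain ASCII, an accented Latin letter,
# a common Chinese character, and a rarer one beyond the first 65,536 codes.
samples = ["A", "\u00e9", "\u660e", "\U00020000"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# Plain English text is byte-for-byte identical in ASCII and UTF-8.
print("Ming".encode("utf-8") == "Ming".encode("ascii"))  # True
```

The loop prints one line per sample, showing 1, 2, 3 and 4 bytes respectively.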

Now, I don't totally grok the whole encoding scheme, but that doesn't really matter, because I probably don't have to do it in my head. The main thing is to use UTF-8 wherever I possibly can. I'm making a couple of programs right now where it is a must, and where it makes everything nicely simple. One is a newsfeed aggregator, which needs to be able to show the content of any feed in any language. The other is a mail client, which needs to do the same. And it seems like I succeeded with relatively little pain. OK, I don't always know when I need to encode and when I need to decode and when I need to leave things alone, but a little trial and error sorts it out. And then it is basically making sure that web pages are served with the UTF-8 encoding, and that my database stores things in UTF-8. MySQL 4.1 handles the last part nicely. And then any modern browser should see the characters as they're meant to be seen, no matter if they're Chinese, Hebrew, or whatever.

That also leaves a sore spot for the programs I really need to convert, but I haven't yet. This weblog program here does not yet handle unicode, so I can't just type in a bunch of stuff to impress you. Well, one can always do it with some special HTML codes, like here: ⽇⽉ . Oops, that didn't work either. I was trying to write Ming in Chinese. Anyway, that isn't what the UTF-8 thing is about. One should be able to just type things in in one's own language without having to worry much about codes.
[ Programming | 2005-01-25 18:25 | 1 comment | PermaLink ] More >

Saturday, December 18, 2004

World's Smallest P2P Application

To demonstrate how simply it could be done, Ed Felten wrote a tiny peer-to-peer file-sharing program. Fifteen lines of Python:

# tinyp2p.py 1.0 (documentation at freedom-to-tinker.com/tinyp2p.html)
import sys, os, SimpleXMLRPCServer, xmlrpclib, re, hmac # (C) 2004, E.W. Felten
ar,pw,res = (sys.argv,lambda u:hmac.new(sys.argv[1],u).hexdigest(),re.search)
pxy,xs = (xmlrpclib.ServerProxy,SimpleXMLRPCServer.SimpleXMLRPCServer)
def ls(p=""):return filter(lambda n:(p=="")or res(p,n),os.listdir(os.getcwd()))
if ar[2]!="client": # license: creativecommons.org/licenses/by-nc-sa/2.0
   myU,prs,srv = ("http://"+ar[3]+":"+ar[4], ar[5:],lambda x:x.serve_forever())
   def pr(x=[]): return ([(y in prs) or prs.append(y) for y in x] or 1) and prs
   def c(n): return ((lambda f: (f.read(), f.close()))(file(n)))[0]
   f=lambda p,n,a:(p==pw(myU))and(((n==0)and pr(a))or((n==1)and [ls(a)])or c(a))
   def aug(u): return ((u==myU) and pr()) or pr(pxy(u).f(pw(u),0,pr([myU])))
   pr() and [aug(s) for s in aug(pr()[0])]
   (lambda sv:sv.register_function(f,"f") or srv(sv))(xs((ar[3],int(ar[4]))))
for url in pxy(ar[3]).f(pw(ar[3]),0,[]):
   for fn in filter(lambda n:not n in ls(), (pxy(url).f(pw(url),1,ar[4]))[0]):
      (lambda fi:fi.write(pxy(url).f(pw(url),2,fn)) or fi.close())(file(fn,"wc"))

And it can both be a client and a server. Of course this isn't going to run any network with 100s of thousands of users. But it shows how impossible it is for the enemies of file sharing to stop it.
[ Programming | 2004-12-18 16:48 | 1 comment | PermaLink ] More >

Google Suggest Dissected

Chris Justus dissected the very cool Google Suggest. The code is absolutely brilliant and uses a bunch of tricks that few people knew about. I didn't even realize there were functions that could pick up live XML data without reloading the page. As he says, that's going to raise the bar for what we expect from webpages. I just have to figure that out and do some cool things with it. Apparently most of the major browsers support the necessary functionality. Which means one can do web apps that work much more like "real" apps, in terms of being responsive and picking up data from a database without having to reload the whole thing. It is still a mystery how Google's servers can respond so damned fast, though.
[ Programming | 2004-12-18 16:24 | 2 comments | PermaLink ] More >

Tuesday, November 30, 2004

Open directory of RSS feeds

Robin Good suggests a directory of freely re-distributable RSS feeds. Which is a fine idea. I'm not aware of any being in existence. Well, there are some nice directories of feeds, like NewsIsFree and Syndic8 where one can subscribe oneself to thousands of feeds and make one's own personal news portal. But can one mix and match from them to offer one's own feeds? Is the content really licensed for re-distribution? Mostly that's left vague. One might assume that if anybody is offering an RSS or Atom feed it is because they don't mind that one does whatever one feels like with them, but that isn't generally the case. The content is still in principle copyrighted, and various kinds of licenses might be implied.

Robin had a bit of an argument recently with another news site, as he took the liberty of creating an RSS feed of their articles as a service. Articles which are really just assembled from other public news sources. And they felt he was somehow depriving them of income by stealing their content without asking. But why shouldn't he?

The answer should really be a directory of feeds with clear Creative Commons types of licenses. I.e. people would state explicitly whether it is public domain, whether they need credit, whether it can be used for non-commercial purposes, etc. Which cuts off a lot of red tape as you right away will know what you can do with it. And it opens the door for better tools for constructing custom feeds out of other feeds. The Algebra of Feeds, as Seb Paquet called it.
[ Programming | 2004-11-30 15:16 | 2 comments | PermaLink ] More >

Article Views

I'm trying to produce statistics on how many views each article gets in this weblog program. Which is more tricky than it might seem.

One can either see articles in the front page, the most recent at the top. Or one can click on a detail link and see it in full. Now, if the poster has chosen, like I do, to put the whole article in the portion that shows on the front page, the only reason to click on the detail link is if one wants to leave comments. Or it is if one comes from an outside link, which normally will link to the full article.

So, I can easily count how many people view the full article by itself. But many more people will just read it in the front page. So how do I count that? I do have a log of accesses to the front page. But how much of that would I count as views of the individual articles? Like, if there are 10 articles that show in the front, do I assume people only read the first one, the first two, or all of them?

I suppose I can make some arbitrary assumptions. Like, my current plan is to apply all views of the front to all articles posted within the last day before that, or the most recent 3 articles, whichever number is highest. That's maybe reasonable fudging, but not exactly 'correct'.
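That fudge rule is easy enough to write down. Here is a rough sketch of it, with made-up names and a made-up data layout (a list of article id and posting time pairs); the real weblog program presumably keeps this in a database.

```python
from datetime import datetime, timedelta


def articles_credited(articles, view_time, minimum=3, window=timedelta(days=1)):
    """Return the ids of the articles that get credit for one front page view.

    articles: list of (article_id, posted_at) tuples.
    Rule: everything posted within `window` before the view, or the
    `minimum` most recent articles, whichever gives the larger number.
    """
    # Only articles that already existed at view time, newest first.
    posted = sorted((a for a in articles if a[1] <= view_time),
                    key=lambda a: a[1], reverse=True)
    recent = [a for a in posted if view_time - a[1] <= window]
    count = max(len(recent), minimum)
    return [article_id for article_id, _ in posted[:count]]


now = datetime(2004, 11, 30, 12, 0)
articles = [
    ("a", now - timedelta(days=5)),
    ("b", now - timedelta(days=3)),
    ("c", now - timedelta(days=2)),
    ("d", now - timedelta(hours=2)),
]
print(articles_credited(articles, now))  # ['d', 'c', 'b']
```

The sketch does not cap the count at the number of articles actually shown on the front page, which a real version would probably want to do.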

If, for example, I don't get around to posting anything for a week, then lots of people will come by and they'll all see the same article at the top. And when the stats are added up that article will look like it was terribly popular, even though it wasn't.

And then there are RSS feeds. Should I not count them? Should I assume that each RSS user looks once at each of the articles they pick up?

Hm, maybe that's why I notice no other blogs having view counts for their articles. Nobody knows how to calculate them. But then again, I don't really see anybody else having graphs for number of overall readers, and that's a good deal easier to calculate.
[ Programming | 2004-11-30 01:25 | 7 comments | PermaLink ] More >

Monday, September 20, 2004

Database Optimization

Don't know why it was exactly today, other than that it is Monday, but my MySQL server suddenly decided that there was way too much to do. Oh, of course it didn't, but sometimes things reach a certain threshold. I had been wondering why it took my blog so long to load recently, and the server started being really busy all the time. And MySQL has this optional log of queries that take too long, which provided the answers I needed. On a server that is doing many things at the same time, anything that takes longer than a second is taking way, way too long, if we're talking about database queries. And now I realized that the queries used to produce the lists of recent referrers and search engine questions which show in the sidebar of some of the blogs took, like, five or ten seconds, which is horrible. That would be bad even if just one were running at a time, but there are only a few seconds between each time somebody views a blog page, so they can quickly pile up. So I had to quickly rewrite it so that it figures this out every hour, rather than in real time, and I optimized the indexes a bit.
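The "figure it out every hour" idea can be sketched as a small cache that only re-runs the expensive query when its result is more than an hour old. This is one possible way to do it, not necessarily how the rewrite was done (a scheduled job filling a summary table would work as well). The class name and the stand-in query function are invented for the example.

```python
import time


class HourlyCache:
    """Keep the result of an expensive computation and refresh it when stale."""

    def __init__(self, compute, max_age_seconds=3600, clock=time.time):
        self.compute = compute          # the slow function, e.g. a heavy query
        self.max_age = max_age_seconds
        self.clock = clock              # injectable so it can be tested
        self.value = None
        self.computed_at = None

    def get(self):
        now = self.clock()
        if self.computed_at is None or now - self.computed_at >= self.max_age:
            self.value = self.compute()  # only runs about once per hour
            self.computed_at = now
        return self.value


def slow_referrer_query():
    # Hypothetical stand-in for the five-to-ten second sidebar query.
    return ["example.org", "example.net"]


recent_referrers = HourlyCache(slow_referrer_query)
print(recent_referrers.get())  # runs the query
print(recent_referrers.get())  # served from the cached result
```

Every page view then costs a cheap lookup, and the slow query runs at most once an hour no matter how many visitors come by.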

From past experience, things are much more likely to bog down in MySQL once there are 3-4 million records in a table. Not just gradually worse, but things suddenly take orders of magnitude longer. And, well, the table that keeps track of blog pageviews has about 4 million entries, for the last four months or so. Anyway, all seems better now, and the server is humming along normally.
[ Programming | 2004-09-20 23:28 | 3 comments | PermaLink ] More >

Wednesday, September 8, 2004

Upgrades

I did a few upgrades on my server. That kind of thing is best done in spurts, as one usually can't upgrade just one thing, as one program depends on the other, and it easily ends up taking days to sort out the domino effects. But has to be done once in a while, if nothing else to catch the latest security fixes, and for the sake of progress.

So, I started by moving up to MySQL 4.1.3. Which is strictly speaking a beta version, but MySQL is normally so stable and bullet-proof, and I had seen so many references to cool features of 4.1 that it seemed to be time. But more of a jump than I expected. Database access suddenly didn't work in PHP, as the access libraries were different. And since I had to recompile it anyway, I might as well update some of the packages it uses. But I quickly ran into a few problems, and figured I might as well move from PHP 4.3.3 to 5.0.1 and see if that worked better. Which meant I had to upgrade some other packages it needed, and soon those were at versions that weren't going to work with the earlier PHP, so I had to make it work. And I ended up upgrading libxml2, curl, libxpm, t1lib, libungif, libpng, ming (flash library), mod_ssl, mod_auth_mysql, and finally it all compiled properly. Oh, and apache too.

Upgrading a live server is a bit like replacing the engine of a car while it is running. The users of course expect that the car keeps running, but you have to take the engine out, and adjust a few things to make the new engine fit, and then the old engine no longer fits. And even if the new one runs, there might be hundreds of little things that suddenly might not work. It might be a little while before you discover that gif files no longer convert right, or that some little-used function just isn't happening now. In this case it wasn't too much. A version of WordPress was crashing with PHP5 and needed to be upgraded. And MySQL 4.1 introduced some pervasive new features for dealing with character sets, which produced some strange errors until I got the configuration set right. And it removed support for the old ISAM database format. There were still some tables in that format, so they suddenly couldn't even be updated to the newer MyISAM format. So I needed to move them to another server, and update them, and move them back.

Upgraded the sendmail mail engine too, to get some security fixes. And suddenly the server started sending out a lot of old messages from a couple of months ago. I suppose they had been stuck in the outgoing queue undelivered. But that sure confused a few people.

Anyway, so far so good. All for the sake of progress.
[ Programming | 2004-09-08 23:59 | 2 comments | PermaLink ] More >

Main Page: ming.tv