File spoon-archives/marxism-international.archive/marxism-international_1998/marxism-international.9802, message 16


From: "Ben Seattle" <icd-AT-communism.org>
Subject: M-I: The Cathedral and the Bazaar (parts 7 - 13)
Date: Mon, 2 Feb 1998 23:50:32 -0800


[continued from parts 1 - 6]

====================================================
7. Fetchmail Grows Up
====================================================
There I was with a neat and innovative design, code that I knew worked
well because I used it every day, and a burgeoning beta list.  It
gradually dawned on me that I was no longer engaged in a trivial
personal hack that might happen to be useful to a few other people.  I
had my hands on a program every hacker with a Unix box and a SLIP/PPP
mail connection really needs.

With the SMTP forwarding feature, it pulled far enough in front of the
competition to potentially become a "category killer", one of those
classic programs that fills its niche so competently that the
alternatives are not just discarded but almost forgotten.

I think you can't really aim or plan for a result like this.  You have
to get pulled into it by design ideas so powerful that afterward the
results just seem inevitable, natural, even foreordained.  The only
way to try for ideas like that is by having lots of ideas -- or by
having the engineering judgment to take other people's good ideas
beyond where the originators thought they could go.

Andrew Tanenbaum had the original idea to build a simple native Unix
for the 386, for use as a teaching tool.  Linus Torvalds pushed the
Minix concept further than Andrew probably thought it could go -- and
it grew into something wonderful.  In the same way (though on a
smaller scale), I took some ideas by Carl Harris and Harry Hochheiser
and pushed them hard.  Neither of us was `original' in the romantic
way people think is genius.  But then, most science and engineering
and software development isn't done by original genius, hacker
mythology to the contrary.

The results were pretty heady stuff all the same -- in fact, just the
kind of success every hacker lives for!  And they meant I would have
to set my standards even higher.  To make fetchmail as good as I now
saw it could be, I'd have to write not just for my own needs, but also
include and support features necessary to others but outside my orbit.
And do that while keeping the program simple and robust.

The first and overwhelmingly most important feature I wrote after
realizing this was multidrop support -- the ability to fetch mail from
mailboxes that had accumulated all mail for a group of users, and then
route each piece of mail to its individual recipients.

I decided to add the multidrop support partly because some users were
clamoring for it, but mostly because I thought it would shake bugs out
of the single-drop code by forcing me to deal with addressing in full
generality.  And so it proved.  Getting RFC822 parsing right took me a
remarkably long time, not because any individual piece of it is hard
but because it involved a pile of interdependent and fussy details.

But multidrop addressing turned out to be an excellent design decision
as well.  Here's how I knew:

     14. Any tool should be useful in the expected way, 
         but a *great* tool lends itself to uses you 
         never expected.

The unexpected use for multidrop fetchmail is to run mailing lists
with the list kept, and alias expansion done, on the client side of
the SLIP/PPP connection.  This means someone running a personal
machine through an ISP account can manage a mailing list without
continuing access to the ISP's alias files.
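As an illustration of client-side alias expansion for a multidrop
mailbox, here is a minimal Python sketch.  It is not fetchmail's code:
the alias table, the function name route_message and the sample
addresses are all hypothetical, and it reads recipients only from the
To and Cc headers, which real multidrop delivery cannot always rely on.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical client-side alias table: address -> local recipients.
ALIASES = {
    "hackers@example.org": ["alice", "bob"],
    "alice@example.org": ["alice"],
}

def route_message(raw_message, aliases=ALIASES):
    """Return the sorted local recipients for one message pulled
    from a shared (multidrop) mailbox."""
    msg = message_from_string(raw_message)
    # Collect every To and Cc header value (there may be several).
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = set()
    for _name, addr in getaddresses(headers):
        # Unknown addresses expand to nobody in this sketch.
        recipients.update(aliases.get(addr.lower(), []))
    return sorted(recipients)

RAW = (
    "From: carol@example.net\n"
    "To: Hackers <hackers@example.org>\n"
    "Cc: alice@example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(route_message(RAW))
```

Because the alias table lives on the client, changing list membership
in this sketch means editing ALIASES locally; nothing on the ISP side
has to change.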

Another important change demanded by my beta testers was support for
8-bit MIME operation.  This was pretty easy to do, because I had been
careful to keep the code 8-bit clean.  Not because I anticipated the
demand for this feature, but rather in obedience to another rule:

     15. When writing gateway software of any kind, take 
         pains to disturb the data stream as little as 
         possible -- and *never* throw away information 
         unless the recipient forces you to!  

Had I not obeyed this rule, 8-bit MIME support would have been
difficult and buggy.  As it was, all I had to do was read RFC 1652 and
add a trivial bit of header-generation logic.

Some European users bugged me into adding an option to limit the
number of messages retrieved per session (so they can control costs
from their expensive phone networks).  I resisted this for a long
time, and I'm still not entirely happy about it.  But if you're
writing for the world, you have to listen to your customers -- this
doesn't change just because they're not paying you in money.

====================================================
8. A Few More Lessons From Fetchmail
====================================================
Before we go back to general software-engineering issues, there are a
couple more specific lessons from the fetchmail experience to ponder.

The rc file syntax includes optional `noise' keywords that are
entirely ignored by the parser.  The English-like syntax they allow is
considerably more readable than the traditional terse keyword-value
pairs you get when you strip them all out.

These started out as a late-night experiment when I noticed how much
the rc file declarations were beginning to resemble an imperative
minilanguage.  (This is also why I changed the original popclient
`server' keyword to `poll').

It seemed to me that trying to make that imperative minilanguage more
like English might make it easier to use.  Now, although I'm a
convinced partisan of the "make it a language" school of design as
exemplified by Emacs and HTML and many database engines, I am not
normally a big fan of "English-like" syntaxes.

Traditionally programmers have tended to favor control syntaxes that
are very precise and compact and have no redundancy at all.  This is a
cultural legacy from when computing resources were expensive, so
parsing stages had to be as cheap and simple as possible.  English,
with about 50% redundancy, looked like a very inappropriate model
then.

This is not my reason for fighting shy of English-like syntaxes; I
mention it here only to demolish it.  With cheap cycles and core,
terseness should not be an end in itself.  Nowadays it's more
important for a language to be convenient for humans than to be cheap
for the computer.

There are, however, good reasons to be wary.  One is the complexity
cost of the parsing stage -- you don't want to raise that to the point
where it's a significant source of bugs and user confusion in itself.
Another is that trying to make a language syntax English-like often
demands that the "English" it speaks be bent seriously out of shape,
so much so that the superficial resemblance to natural language is as
confusing as a traditional syntax would have been.  (You see this in a
lot of 4GLs and commercial database-query languages.)

The fetchmail control syntax seems to avoid these problems because the
language domain is extremely restricted.  It's nowhere near a
general-purpose language; the things it says simply are not very
complicated, so there's little potential for confusion in moving
mentally between a tiny subset of English and the actual control
language.  I think there may be a wider lesson here:

     16. When your language is nowhere near Turing-complete, 
         syntactic sugar can be your friend.
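
The noise-keyword idea can be sketched in a few lines of Python.  This
is an illustration only, not fetchmail's parser: the noise-word set,
the function name parse_rc_line and the sample host are assumptions,
and the grammar is reduced to "poll HOST" followed by keyword-value
pairs.

```python
import shlex

# Illustrative noise words; treat this set as an assumption rather
# than fetchmail's exact grammar.
NOISE = {"and", "with", "has", "wants", "options"}

def parse_rc_line(line):
    """Parse 'poll HOST key value ...' into a dict, skipping noise words.

    Limitation of this sketch: shlex strips quotes before filtering,
    so a quoted value that happens to equal a noise word is dropped.
    """
    tokens = [t for t in shlex.split(line) if t.lower() not in NOISE]
    if len(tokens) < 2 or tokens[0] != "poll":
        raise ValueError("expected: poll HOST [key value ...]")
    pairs = tokens[2:]
    if len(pairs) % 2:
        raise ValueError("dangling keyword without a value")
    entry = {"poll": tokens[1]}
    entry.update(zip(pairs[0::2], pairs[1::2]))
    return entry

verbose = ('poll mail.example.com with protocol pop3 '
           'and username jdoe with password "s3cret"')
terse = 'poll mail.example.com protocol pop3 username jdoe password "s3cret"'

# Both spellings yield the same configuration entry.
print(parse_rc_line(verbose) == parse_rc_line(terse))
```

The sugar costs one filtering step here because the language is only a
flat list of pairs; a richer grammar would not get off so cheaply.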

Another lesson is about security by obscurity.  Some fetchmail users
asked me to change the software to store passwords encrypted in the rc
file, so snoopers wouldn't be able to casually see them.

I didn't do it, because this doesn't actually add protection.  Anyone
who's acquired permissions to read your rc file will be able to run
fetchmail as you anyway -- and if it's your password they're after,
they'd be able to rip the necessary decoder out of the fetchmail code
itself to get it.

All .fetchmailrc password encryption would have done is give a false
sense of security to people who don't think very hard.  The general
rule here is:

     17. A security system is only as secure as its secret.  
         Beware of pseudo-secrets.
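
A toy Python sketch shows why a password "encrypted" with a key
shipped inside the client is only a pseudo-secret.  Everything here is
hypothetical (the key, the XOR scheme, the function names); it stands
in for any reversible encoding whose decoder is in the program itself.

```python
import base64
from itertools import cycle

# Hypothetical key built into the client.  Anyone who has the
# program has this key too.
EMBEDDED_KEY = b"not-really-secret"

def _xor(data, key=EMBEDDED_KEY):
    # XOR each byte with the repeating key; applying it twice
    # returns the original bytes.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def obscure(password):
    """What an 'encrypted' rc-file entry would hold."""
    return base64.b64encode(_xor(password.encode())).decode()

def reveal(stored):
    """What the client runs, and what a snooper could run as well."""
    return _xor(base64.b64decode(stored)).decode()

stored = obscure("hunter2")
print(stored != "hunter2", reveal(stored) == "hunter2")
```

The stored string no longer looks like the password, but reveal()
needs nothing beyond the rc file and the program, which is exactly
what a snooper with read access already has.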

====================================================
9. Necessary Preconditions for the Bazaar Style
====================================================
Early reviewers and test audiences for this paper consistently raised
questions about the preconditions for successful bazaar-style
development, including both the qualifications of the project leader
and the state of code at the time one goes public and starts to try to
build a co-developer community.

It's fairly clear that one cannot code from the ground up in bazaar
style.  One can test, debug and improve in bazaar style, but it would
be very hard to originate a project in bazaar mode.  Linus didn't try
it.  I didn't either.  Your nascent developer community needs to have
something runnable and testable to play with.

When you start community-building, what you need to be able to present
is a plausible promise.  Your program doesn't have to work
particularly well.  It can be crude, buggy, incomplete, and poorly
documented.  What it must not fail to do is convince potential
co-developers that it can be evolved into something really neat in the
foreseeable future.

Linux and fetchmail both went public with strong, attractive basic
designs.  Many people thinking about the bazaar model as I have
presented it have correctly considered this critical, then jumped from
it to the conclusion that a high degree of design intuition and
cleverness in the project leader is indispensable.

But Linus got his design from Unix.  I got mine initially from the
ancestral popclient (though it would later change a great deal, much
more, proportionately speaking, than has Linux).  So does the
leader/coordinator for a bazaar-style effort really have to have
exceptional design talent, or can he get by on leveraging the design
talent of others?

I think it is not critical that the coordinator be able to originate
designs of exceptional brilliance, but it is absolutely critical that
he/she be able to recognize good design ideas from others.

Both the Linux and fetchmail projects show evidence of this.  Linus,
while not (as previously discussed) a spectacularly original designer,
has displayed a powerful knack for recognizing good design and
integrating it into the Linux kernel.  And I have already described
how the single most powerful design idea in fetchmail (SMTP
forwarding) came from somebody else.

Early audiences of this paper complimented me by suggesting that I am
prone to undervalue design originality in bazaar projects because I
have a lot of it myself, and therefore take it for granted.  There may
be some truth to this; design (as opposed to coding or debugging) is
certainly my strongest skill.

But the problem with being clever and original in software design is
that it gets to be a habit -- you start reflexively making things cute
and complicated when you should be keeping them robust and simple.  I
have had projects crash on me because I made this mistake, but I
managed not to with fetchmail.

So I believe the fetchmail project succeeded partly because I
restrained my tendency to be clever; this argues (at least) against
design originality being essential for successful bazaar projects.
And consider Linux.  Suppose Linus Torvalds had been trying to pull
off fundamental innovations in operating system design during the
development; does it seem at all likely that the resulting kernel
would be as stable and successful as what we have?  

A certain base level of design and coding skill is required, of
course, but I expect almost anybody seriously thinking of launching a
bazaar effort will already be above that minimum.  The free-software
community's internal market in reputation exerts subtle pressure on
people not to launch development efforts they're not competent to
follow through on.  So far this seems to have worked pretty well.

There is another kind of skill not normally associated with software
development which I think is as important as design cleverness to
bazaar projects -- and it may be more important.  A bazaar project
coordinator or leader must have good people and communications skills.

This should be obvious.  In order to build a development community,
you need to attract people, interest them in what you're doing, and
keep them happy about the amount of work they're doing.  Technical
sizzle will go a long way towards accomplishing this, but it's far
from the whole story.  The personality you project matters, too.

It is not a coincidence that Linus is a nice guy who makes people like
him and want to help him.  It's not a coincidence that I'm an
energetic extrovert who enjoys working a crowd and has some of the
delivery and instincts of a stand-up comic.  To make the bazaar model
work, it helps enormously if you have at least a little skill at
charming people.

====================================================
10. The Social Context of Free Software
====================================================
It is truly written: the best hacks start out as personal solutions to
the author's everyday problems, and spread because the problem turns
out to be typical for a large class of users.  This takes us back to
the matter of rule 1, restated in a perhaps more useful way:

     18. To solve an interesting problem, start by finding 
         a problem that is interesting to you.

So it was with Carl Harris and the ancestral popclient, and so with me
and fetchmail.  But this has been understood for a long time.  The
interesting point, the point that the histories of Linux and fetchmail
seem to demand we focus on, is the next stage -- the evolution of
software in the presence of a large and active community of users and
co-developers.

In "The Mythical Man-Month", Fred Brooks observed that programmer
time is not fungible; adding developers to a late software project
makes it later.  He argued that the complexity and communication costs
of a project rise with the square of the number of developers, while
work done only rises linearly.  This claim has since become known as
"Brooks's Law" and is widely regarded as a truism.  But if Brooks's
Law were the whole picture, Linux would be impossible.
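
One common way to read "rise with the square" is to count pairwise
communication paths among n developers, n*(n-1)/2, and compare that
with n units of work.  The short Python sketch below does only that
arithmetic; the function name channels and the sample team sizes are
illustrative assumptions, not figures from Brooks.

```python
def channels(n):
    """Pairwise communication paths among n developers: n*(n-1)/2."""
    return n * (n - 1) // 2

# Work capacity is taken to grow linearly (n), so the last column is
# the number of communication paths per unit of work.
for n in (2, 5, 10, 50):
    print(n, channels(n), channels(n) / n)
```

The per-developer coordination load grows roughly in step with team
size, which is the effect the quadratic claim describes.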

Gerald Weinberg's classic "The Psychology Of Computer Programming",
published a few years earlier, supplied what, in hindsight, we can see
as a vital correction to Brooks.  In his discussion of "egoless
programming", Weinberg observed that in shops where developers are
not territorial about their code, and encourage other people to look
for bugs and potential improvements in it, improvement happens
dramatically faster than elsewhere.

Weinberg's choice of terminology has perhaps prevented his analysis
from gaining the acceptance it deserved -- one has to smile at the
thought of describing Internet hackers as "egoless".  But I think
his argument looks more compelling today than ever.

The history of Unix should have prepared us for what we're learning
from Linux (and what I've verified experimentally on a smaller scale
by deliberately copying Linus's methods).  That is, that while coding
remains an essentially solitary activity, the really great hacks come
from harnessing the attention and brainpower of entire communities.
The developer who uses only his or her own brain in a closed project
is going to fall behind the developer who knows how to create an open,
evolutionary context in which bug-spotting and improvements get done
by hundreds of people.

But the traditional Unix world was prevented from pushing this
approach to the ultimate by several factors.  One was the legal
constraints of various licenses, trade secrets, and commercial
interests.  Another (in hindsight) was that the Internet wasn't yet
good enough.

Before cheap Internet, there were some geographically compact
communities where the culture encouraged Weinberg's "egoless"
programming, and a developer could easily attract a lot of skilled
kibitzers and co-developers.  Bell Labs, the MIT AI Lab, UC Berkeley
-- these became the home of innovations that are legendary and still
potent.

Linux was the first project to make a conscious and successful effort
to use the entire world as its talent pool.  I don't think it's a
coincidence that the gestation period of Linux coincided with the
birth of the World Wide Web, and that Linux left its infancy during
the same period in 1993-1994 that saw the takeoff of the ISP industry
and the explosion of mainstream interest in the Internet.  Linus was
the first person who learned how to play by the new rules that
pervasive Internet made possible.

While cheap Internet was a necessary condition for the Linux model to
evolve, I think it was not by itself a sufficient condition. Another
vital factor was the development of a leadership style and set of
cooperative customs that could allow developers to attract co-
developers and get maximum leverage out of the medium.

But what is this leadership style and what are these customs?  They
cannot be based on power relationships -- and even if they could be,
leadership by coercion would not produce the results we see.  Weinberg
quotes the autobiography of the 19th-century Russian anarchist
Kropotkin ("Memoirs of a Revolutionist") to good effect on this
subject:

     "Having been brought up in a serf-owner's family, I 
     entered active life, like all young men of my time, 
     with a great deal of confidence in the necessity of 
     commanding, ordering, scolding, punishing and the like. 
     But when, at an early stage, I had to manage serious 
     enterprises and to deal with [free] men, and when each 
     mistake would lead at once to heavy consequences, I 
     began to appreciate the difference between acting on 
     the principle of command and discipline and acting on 
     the principle of common understanding. The former works 
     admirably in a military parade, but it is worth nothing 
     where real life is concerned, and the aim can be 
     achieved only through the severe effort of many 
     converging wills."

The "severe effort of many converging wills" is precisely what a
project like Linux requires -- and the "principle of command" is
effectively impossible to apply among volunteers in the anarchist's
paradise we call the Internet.  To operate and compete effectively,
hackers who want to lead collaborative projects have to learn how to
recruit and energize effective communities of interest in the mode
vaguely suggested by Kropotkin's "principle of understanding".  They
must learn to use Linus's Law.

Earlier I referred to the "Delphi effect" as a possible explanation
for Linus's Law.  But more powerful analogies to adaptive systems in
biology and economics also irresistibly suggest themselves.  The Linux
world behaves in many respects like a free market or an ecology, a
collection of selfish agents attempting to maximize utility which in
the process produces a self-correcting spontaneous order more
elaborate and efficient than any amount of central planning could
achieve.  Here, then, is the place to seek the "principle of
understanding".

The "utility function" Linux hackers are maximizing is not
classically economic, but is the intangible of their own ego
satisfaction and reputation among other hackers.  (One may call their
motivation "altruistic", but this ignores the fact that altruism is
itself a form of ego satisfaction for the altruist).  Voluntary
cultures that work this way are not actually uncommon; one other in
which I have long participated is science fiction fandom, which unlike
hackerdom explicitly recognizes "egoboo" (the enhancement of one's
reputation among other fans) as the basic drive behind volunteer
activity.

Linus, by successfully positioning himself as the gatekeeper of a
project in which the development is mostly done by others, and
nurturing interest in the project until it became self-sustaining, has
shown an acute grasp of Kropotkin's "principle of common
understanding".  This quasi-economic view of the Linux world enables
us to see how that understanding is applied.

We may view Linus's method as a way to create an efficient market in
"egoboo" -- to connect the selfishness of individual hackers as
firmly as possible to difficult ends that can only be achieved by
sustained cooperation.  With the fetchmail project I have shown
(albeit on a smaller scale) that his methods can be duplicated with
good results.  Perhaps I have even done it a bit more consciously and
systematically than he.

Many people (especially those who politically distrust free markets)
would expect a culture of self-directed egoists to be fragmented,
territorial, wasteful, secretive, and hostile.  But this expectation
is clearly falsified by (to give just one example) the stunning
variety, quality and depth of Linux documentation.  It is a hallowed
given that programmers hate documenting; how is it, then, that Linux
hackers generate so much of it?  Evidently Linux's free market in
egoboo works better to produce virtuous, other-directed behavior than
the massively-funded documentation shops of commercial software
producers.

Both the fetchmail and Linux kernel projects show that by properly
rewarding the egos of many other hackers, a strong
developer/coordinator can use the Internet to capture the benefits of
having lots of co-developers without having a project collapse into a
chaotic mess.  So to Brooks's Law I counter-propose the following:

     19. Provided the development coordinator has a medium 
         at least as good as the Internet, and knows how to 
         lead without coercion, many heads are inevitably 
         better than one.

I think the future of free software will increasingly belong to people
who know how to play Linus's game, people who leave behind the
cathedral and embrace the bazaar.  This is not to say that individual
vision and brilliance will no longer matter; rather, I think that the
cutting edge of free software will belong to people who start from
individual vision and brilliance, then amplify it through the
effective construction of voluntary communities of interest.

And perhaps not only the future of free software.  No commercial
developer can match the pool of talent the Linux community can bring
to bear on a problem.  Very few could afford even to hire the more
than two hundred people who have contributed to fetchmail!  

Perhaps in the end the free-software culture will triumph not because
cooperation is morally right or software "hoarding" is morally wrong
(assuming you believe the latter, which neither Linus nor I do), but
simply because the commercial world cannot win an evolutionary arms
race with free-software communities that can put orders of magnitude
more skilled time into a problem.

====================================================
11. Acknowledgements
====================================================
This paper was improved by conversations with a large number of people
who helped debug it.  Particular thanks to Jeff Dutky
<dutky-AT-wam.umd.edu>, who suggested the "debugging is parallelizable"
formulation, and helped develop the analysis that proceeds from it.
Also to Nancy Lebovitz <nancyl-AT-universe.digex.net> for her suggestion
that I emulate Weinberg by quoting Kropotkin.  Perceptive criticisms
also came from Joan Eslinger <wombat-AT-kilimanjaro.engr.sgi.com> and
Marty Franz <marty-AT-net-link.net> of the General Technics list.  Paul
Eggert <eggert-AT-twinsun.com> noticed the conflict between GPL and the
bazaar model.  I'm grateful to the members of PLUG, the Philadelphia
Linux User's Group, for providing the first test audience for the
first public version of this paper.  Finally, Linus Torvalds's
comments were helpful and his early endorsement very encouraging.

====================================================
12. For Further Reading
====================================================
I quoted several bits from Frederick P. Brooks's classic The Mythical
Man-Month because, in many respects, his insights have yet to be
improved upon.  I heartily recommend the 20th Anniversary edition
from Addison-Wesley (ISBN 0-201-83595-9), which adds his 1986 "No
Silver Bullet" paper.

The new edition is wrapped up by an invaluable 20-years-later
retrospective in which Brooks forthrightly admits to the few
judgements in the original text which have not stood the test of time.
I first read the retrospective after this paper was substantially
complete, and was surprised to discover that Brooks attributes
bazaar-like practices to Microsoft!

Gerald M. Weinberg's The Psychology Of Computer Programming (New York,
Van Nostrand Reinhold 1971) introduced the rather
unfortunately-labeled concept of "egoless programming".  While he was
nowhere near the first person to realize the futility of the
"principle of command", he was probably the first to recognize and
argue the point in particular connection with software development.

Richard P. Gabriel, contemplating the Unix culture of the pre-Linux
era, reluctantly argued for the superiority of a primitive bazaar-like
model in his 1989 paper Lisp: Good News, Bad News, and How To Win Big.
Though dated in some respects, this essay is still rightly celebrated
among Lisp fans (including me).  A correspondent reminded me that the
section titled "Worse Is Better" reads almost as an anticipation of
Linux.  The paper is accessible on the World Wide Web at 
"http://alpha-bits.ai.mit.edu/articles/good-news/good-news.html".

DeMarco and Lister's Peopleware: Productive Projects and Teams (New
York; Dorset House, 1987; ISBN 0-932633-05-6) is an underappreciated
gem which I was delighted to see Fred Brooks cite in his
retrospective.  While little of what the authors have to say is
directly applicable to the Linux or free-software communities, the
authors' insight into the conditions necessary for creative work is
acute and worthwhile for anyone attempting to import some of the
bazaar model's virtues into a more commercial context.

Finally, I must admit that I very nearly called this paper "The
Cathedral and the Agora", the latter term being the Greek for an open
market or public meeting place.  The seminal "agoric systems" papers
by Mark Miller and Eric Drexler, by describing the emergent properties
of market-like computational ecologies, helped prepare me to think
clearly about analogous phenomena in the free-software culture when
Linux rubbed my nose in them five years later.  These papers are
available on the Web at "http://www.agorics.com/agorpapers.html".

====================================================
13. Version and Change History:
====================================================
$Id: cathedral-paper.sgml,v 1.29 1998/01/30 22:29:31 esr Exp $
I gave 1.16 at the Linux Kongress, May 21 1997.
I added the bibliography July 7 1997 in 1.20.
I added the Perl Conference anecdote November 18 1997 in 1.27.
Other revision levels incorporate minor editorial and markup fixes.



     --- from list marxism-international-AT-lists.village.virginia.edu ---