Humanities
I'll only make some brief notes on today's sessions. Wanting a theme for today as well, I hoped to engage that side of me that spent a decade getting a doctorate in philosophy. You know, that liberal arts urge, at a technical conference. This year has been comparatively thin on sessions that concern social dynamics, legal issues, and histories of ideas. There have been some, but I miss seeing some of the scholars of intellectual property and freedom who have been at OSCon in previous years.
I saw an entertaining talk today by Robert Lefkowitz called Open Source as Liberal Art. I had run into Robert outside his session, and also saw his keynote last night. The keynote had been largely a standup-comedy routine that compared software development methodologies to Quintilian's Institutio Oratoria (i.e. a study of rhetoric; a field whose relative bad fortunes genuinely fill me with sadness). Not because Lefkowitz mentioned it, but just as a service to readers, I highly recommend Jay Heinrichs' book Thank You For Arguing, a modern primer in rhetoric. As with all jokes, Lefkowitz's keynote was a bit serious. Today's talk, however, leaned slightly farther towards the serious and literal side of things. Lefkowitz presented a general classical opposition between techne and praxis (funny Greek words for... well, technology and practice). The former has been the traditional domain of the working classes, with the latter reserved for intellectuals and "leaders." So goes the tradition of "liberal education," in any case. In humorous fashion, Lefkowitz presented arguments for which side of things Free Software should fall on, tracing ideas of Knuth, Sussman, Marx, Babbage and a merry bunch of other thinkers who see code as a form of literature. Well, some do and some don't, but I think we should, and I think Lefkowitz thinks we should. If anyone wants to seriously injure their brain in this direction, a ponderous and profound philosophical work on this topic is Intellectual and Manual Labour by Alfred Sohn-Rethel. I don't actually think readers will read this classic of critical theory, nor even find a copy in print or at most libraries... but, hey, I'm not getting that decade back, so I had better show something for it.
While it probably does not quite count as humanities, I always love topics related to machine extraction of knowledge from free-form corpora. A talk entitled Machine Learning for Knowledge Extraction from Wikipedia & Other Semantically Weak Sources by Jamie Taylor, Colin Evans, and Toby Segaran (all from Metaweb) was quite interesting. Continuing today's tongue-in-cheek theme, Jamie Taylor dressed as a pirate, and the speakers presented the project they are working on called Freebase. Mostly what the speakers presented was various technical details of their scraping, slicing, dicing, and restructuring of Wikipedia as a large source of semi-structured knowledge. The underlying idea was to extract interesting connections and categorizations from the work of Wikipedia's millions of contributors. Their work sprinkles around a bit of Bayesian magic, some text processing, and some use of semi-large backend databases. It's cool, and at least worth checking out their FOSS code. Even cooler, they provide (in about 20 GB) a relational version of their filtered and processed Wikipedia content that lends itself to querying with regular SQL, and all the clever joins and filters that allows.
Categories
: [ 2008 | education | humanities | liberal | oscon | semantics | wikipedia ]
Jul 25 2008, 03:13:40 AM EDT
|
Project leaders
Today OSCon turned to the "regular" sessions, which is to say shorter
and far more of them. I popped in on a variety of them, some only for
parts of sessions. Unfortunately, I found the same thing I had noticed
about some tutorials: some talks seem to be pitched at a surprisingly
introductory level. Maybe I am wrong here (but I think I am not), but it
seems to me that OSCon attendees should be assumed to have a fair degree
of programming knowledge, and talks can and should adopt a fast pace
with lots of information presented. After all, attendees are free to
follow up by studying websites, books, and other materials to fill in any
details they don't quite follow on first presentation. I realize that
it is partially my personal style to prefer such rapid-fire discussion,
but at the same time I think that many speakers gave talks that they had
previously developed for less sophisticated audiences. I do not want to
single out the particular talks I attended here, but it seems like
organizers could give presenters better guidance on what to expect in
audiences.
One talk I attended that was definitely not slowed down was
Doug Judd's talk on Hypertable:
An Open Source, High Performance, Scalable Database. This, like many
open source projects, is still in alpha; it looks promising though. The
idea is to create something much like Google's (in-house and proprietary)
"Bigtable". That is, Hypertable is a very large, high performance,
fault-tolerant, distributed database. Well, it can be very large;
the code can also run on smaller clusters (or even single machines), but
there would be little point in that. What Hypertable is not is a relational
database; instead it is simply a mapping of keys to values (a
distributed dictionary). For what it is worth (not much), Judd was a
delightfully dry speaker, with the presentation being mostly simply an
enumeration of high-level facts about the design of Hypertable. Content
beats showmanship in my mind.
Hypertable has several layers to its operation. Master Servers
oversee Range Servers, with each Range Server providing the data for a
particular request. The Master Server knows which Range Server to ask
for the data, using some clever distribution algorithms. Behind the
Range Servers is a Distributed File System, which lives on still other
machines (potentially). Redundancy exists at several levels; for
example, if a Range Server fails, responsibilities are dynamically
reassigned to the Range Servers that survive. A key in a Hypertable has
the form (row,column-family,column-qualifier,timestamp).
Data is not deleted from tables; rather, delete records are written to
mark expired values. There is more to the design, of course, but it
is an interesting design that appears to provide robustness and
speed.
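To make the key layout and the "delete record" idea concrete, here is a minimal single-machine sketch in C++. It is an illustration only: ToyTable, CellKey, Cell and their methods are hypothetical names of mine, not Hypertable's API, and the sort order (timestamp last, ascending) is my assumption rather than something Judd stated.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical four-part key: (row, column-family, column-qualifier, timestamp).
struct CellKey {
    std::string row;
    std::string column_family;
    std::string column_qualifier;
    long long timestamp;

    // Sort by row, then family, then qualifier, then timestamp (assumed order).
    bool operator<(const CellKey& other) const {
        return std::tie(row, column_family, column_qualifier, timestamp) <
               std::tie(other.row, other.column_family,
                        other.column_qualifier, other.timestamp);
    }
};

// A stored record is either a value or a delete marker.
struct Cell {
    bool deleted;
    std::string value;
};

class ToyTable {
public:
    void put(const std::string& row, const std::string& family,
             const std::string& qualifier, long long ts,
             const std::string& value) {
        store(CellKey{row, family, qualifier, ts}, Cell{false, value});
    }

    // Nothing is removed: a delete is one more record with a newer timestamp.
    void erase(const std::string& row, const std::string& family,
               const std::string& qualifier, long long ts) {
        store(CellKey{row, family, qualifier, ts}, Cell{true, std::string()});
    }

    // Look up the newest record at or before `as_of`; false if none or deleted.
    bool get(const std::string& row, const std::string& family,
             const std::string& qualifier, long long as_of,
             std::string& out) const {
        auto it = cells_.upper_bound(CellKey{row, family, qualifier, as_of});
        if (it == cells_.begin()) return false;
        --it;  // newest record whose key is <= the probe key
        const CellKey& key = it->first;
        if (key.row != row || key.column_family != family ||
            key.column_qualifier != qualifier) return false;
        if (it->second.deleted) return false;
        out = it->second.value;
        return true;
    }

    std::size_t size() const { return cells_.size(); }

private:
    void store(const CellKey& key, const Cell& cell) {
        assert(key.timestamp >= 0);  // sketch assumes non-negative timestamps
        cells_[key] = cell;
    }

    std::map<CellKey, Cell> cells_;
};
```

Because old versions and delete markers all stay in the map, a read "as of" an earlier timestamp still sees the earlier value; presumably a real system compacts such records at some point, which this sketch does not attempt.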
The main thing I did today was have some conversations with project
leaders; interviews, I suppose, but both were informal in tone, and with
interesting people. I spoke with Joe "Zonker"
Brockmeier, who has been working for Novell as "openSUSE community
manager" for about six months. Zonker is an old Linux hand, having
edited Linux Magazine for a long time, as well as having written for
developerWorks and other publications. Much of my conversation with
Zonker was sharing stories about using various distros over time, and
our relationship with readerships as writers. Once we got past that, I
got around to asking him a bit about openSUSE.
One thing I felt like I had to ask Zonker about was something I think
he was very reasonably sick of answering (I didn't expect to be first):
What about Novell's patent deal with Microsoft, which left a bad taste
in the mouth of many FLOSS advocates? Of course, this all came before
Zonker's job with Novell anyway. Still, his opinion, which seems very
reasonable, was that the deal really did not hurt the
independence of SUSE or compromise its software stack or development
model; if anything, it helped gain enterprise customers and improve
interoperability with Microsoft systems (which are sadly not going away
in corporations, even if Linux makes strides). Another topic that I
found very interesting, but is still, for now, something up in the air,
is the future governance structure of openSUSE. Zonker contrasted his
goals with those of Debian membership, which is enormously restricted,
while still recognizing that it is important to get the participation of
contributors who are genuinely involved in the direction of openSUSE
itself. At some point, probably sooner rather than later, there will be
openSUSE elections, a board, and various other procedural mechanisms to
make decisions systematically; just what that will look like has not
been decided though.
It is hard to clearly distinguish distros nowadays, I think. Zonker
stated a goal of making openSUSE the best and easiest to
install/configure Linux distro. However, I think that is necessarily a
bit generic. YaST is obviously a pretty distinctive feature of SUSE, and
many people very much like the unification of package management and
general configuration that it provides. A project that Zonker singled
out as notable is the Build Service. This
really does show a spirit of community, being a "service [to] provide
software developers with a tool to create and release open source
software for openSUSE and other Linux distributions easily on different
hardware architectures and for a broad user audience." I have not
played with the service, but it definitely looks like something to check
out.
A really interesting project is Literacy Bridge; I had the
opportunity to speak at some length with its founder Cliff
Schmidt. Their goal is to create very low cost audio
recorders/players for distribution in poor countries. The virtues of
this for educational advancement are compelling, some of them not
necessarily immediately obvious. I had the opportunity to see a
prototype unit, which is a brightly colored box, something in the style
of the similarly motivated XO
laptop from the OLPC project. However, whereas the XO costs
something a bit shy of US$200 (still very inexpensive by the standards of the
developed world), Literacy Bridge hopes to produce its "Talking Book"
for something close to $5. This gap makes a huge
difference in the developing world, where educational budgets are
measured in tens (or at best hundreds) of dollars per pupil, per
year.
The Talking Book will have a number of interesting capabilities. It
is not that much different in principle from the digital recorders you
can buy at office stores for US$30-50; however, the differences that
exist are important. Cost is obviously central: my $35 recorder is out
of the price range for, e.g. Ghana, where the Talking Book will pilot in
September. However, of additional importance is the thought given to
navigation of the user interface. The unit has large buttons,
iconically labeled to be friendly for use by non-literate people and
children. As well, content can be navigated in a variety of ways,
configured by the content itself. For example, for some materials it
will make sense to skip between chapters; for others, such as learning
basic reading skills, skipping page-by-page is the right approach. Think
of a first "ABC book" where children (or not-yet-literate adults) sound
out a few words on a page; the Talking Book can act as a teacher
accompanying a learner sounding out letters. Much of the goal of
Literacy Bridge consists of finding sources for local materials, in
local languages, that are appropriate for the needs and goals of the
communities that use the units.
Distribution of content will take a few paths. To allow viral
distribution, each Talking Book can exchange audio files with any other
unit (again, with navigation of transfers performed entirely with
language-localized audio prompts and button presses). As another
mechanism, Schmidt envisions local vendors who would set up "data
kiosks" that would allow users of Talking Book to acquire new audio at
no cost, or at worst at extremely minimal cost. The possibilities are
interesting. For example, we discussed the possibility that such kiosks
could recommend new content based on the content users currently had on
their systems; we also talked about the privacy issues that could arise,
however, so there is definitely a fine line to walk here. Apart from
distribution of educational material per se, Talking Book has
potential for distributing informational content. It doesn't do
much good to send written schedules, instructions, and documents to
non-literate people; but those same people can listen to audio
describing the same things.
The interface for Talking Book actually opens up a variety of
possibilities beyond traditional audio-books. I proposed, perhaps
whimsically, that the same interface would allow users to play text
adventures, in the style of the old terminal games that we older
developers remember. I do not think games like that are
important, but thinking about the different ways that one might
interact with segments of audio is interesting. Schmidt himself said
that one of the great advantages of making the code for the Talking Book
open is that it can allow innovations that he has not anticipated in the
uses the device can be put to. I also found it interesting that the
interface has a lot in common with those provided to blind readers, for
example by the non-profit organization Recording for the Blind & Dyslexic
(RFB&D).
All of this leads to a plug. Literacy Bridge needs some embedded
programmers to finish up the code that will go in the pilot version of
Talking Book. Volunteering would
be a great opportunity to help out some nations and communities whose
lives would be greatly improved by improved literacy and greater access
to information. You can do this by writing Free Software!
Categories
: [ 2008 | bridge | linux | literacy | opensuse | oscon ]
Jul 24 2008, 01:22:01 AM EDT
|
Parallel day
The lineup allowed it, so I decided today I would devote myself to
understanding programming models for concurrency. It may or may not have
any relevance, but the gig I am currently working is at an amazingly
cool company (DEShaw Research)
that is building the world's fastest supercomputer for computational
biochemistry, called Anton. This isn't about a few percentage points here
and there either: in the application-specific domain of molecular
dynamics, we are going to beat Blue Gene by more than two orders
of magnitude (sorry IBM). The trick
here is that DEShaw Research is using highly specialized chips for a
problem that is very specific, is much more constrained by bandwidth than
by computation (but still takes a heck of a lot of computation), and utilizes
shockingly little on-chip memory (which is what general-purpose CPUs use
most of their silicon for). Since I'm working to develop a hugely
parallel system--albeit as a tiny contributor among far brighter people
than me--I figured it would be interesting to see what sort of problems
of parallelism other folks face.
I attended some fascinating tutorials today, none of which had the
slightest relevance to my abovementioned contract. An interesting
feature of Anton is that its developers basically designed the silicon
(and the channels, topology, etc) around the best decomposition of the
specific problem at hand. In contrast, when developers working on
general-purpose, commodity hardware think about decomposing and
parallelizing problems, the issues are ones of matching the algorithms
to the hardware, whether distributing work over multiple cores, multiple
chips in a system, or across multiple boxes. The talks I had the chance
to hear were by Arch Robison and Robert Reed of Intel, speaking about Ubiquitous
Multithreading for a Multicore World, and by Steven Parkes, about An
Introduction to Actors for Performance, Scalability, and
Resilience.
Actually, I attended part of another session: Francesco Cesarini gave
an introductory tutorial on Erlang. Unfortunately, I found Cesarini's
talk to be far too slowly paced for me. I do not know Erlang (though I
know a little about it), but I am somewhat familiar with concepts
in functional programming and general CS ideas (as readers of
Charming Python and other writing I have done for developerWorks
will know). However, I unfortunately got the sense that his tutorial was
aimed at a much more introductory level than the bulk of conference and
tutorial attendees, almost all of whom I presume have a decent amount of
programming background. It seemed like a course that would be great for
novice programmers, but was probably not aimed right for its actual
audience.
Intel's Threading Building Blocks (TBB) is an interesting GPL'd
project. I had the chance to hear the keynote announcing its initial
open source release at last year's OSCon. The idea of TBB is to abstract
the work of thread coordination into high-level primitives, to let C++
programmers think about what their algorithms and tasks are,
rather than worry about low-level details of acquiring and releasing
locks, contention for cache lines, and determining optimal balance between
task decomposition, bandwidth constraints, and cache freshness. The
moral to keep in mind for the last balancing act is that too much
parallelism hurts you as much as not enough: if a processor cache
becomes stale because a core is juggling too many threads, the
performance hit is awful; moreover, assigning tasks among cores on an ad
hoc basis creates optimal load balancing, but typically at the price of
wasted bandwidth. Robison and Reed discussed much of what went into the
tradeoffs and design of TBB, which was fascinating to follow. I do not
think I can explain these details sufficiently accurately, but the TBB
documentation is a good place to start to learn more.
Without going into the design subtleties, the general style of TBB is
worth describing. It consists of generous use of C++ templates and some
custom functions (also some custom debugging tools). The capabilities are
generally broken down between concurrency-friendly data structures (STL
is not thread safe) and high-level parallel constructs.
Containers in TBB include things like concurrent_vector and
concurrent_hash_map. Functions include
parallel_for and parallel_reduce. Use of the
former is worth showing by brief example (taken from the TBB
tutorial):
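/*****************************************************************
 * INSTEAD OF:
 *   void SerialApplyFoo( float a[], size_t n ) {
 *       for( size_t i=0; i<n; ++i )
 *           Foo(a[i]);
 *   }
 *****************************************************************/
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
using namespace tbb;

// ApplyFoo is the loop-body class defined in the TBB tutorial (its
// operator() calls Foo on each element of a blocked_range);
// IdealGrainSize is a tuning constant chosen by the programmer.
void ParallelApplyFoo( float a[], size_t n ) {
    parallel_for(blocked_range<size_t>(0,n,IdealGrainSize), ApplyFoo(a) );
}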
Read the tutorial for the real details; it is enough to understand
here that all the niggly details of numbers of cores, size of
threads, shared-memory access, distribution of threads, etc. are handled
"by magic" in the TBB library. Pretty cool.
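For readers wondering what the grain size argument does, the sketch below mimics, serially and with the standard library only, how a range of indices can be halved until every piece is no larger than the grain. This is my own illustration, not TBB code: split_range and chunked_for are hypothetical names, and the real library hands the pieces to worker threads through its scheduler rather than looping over them in order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second).
typedef std::pair<std::size_t, std::size_t> Range;

// Recursively halve [begin, end) until each piece holds at most `grain` indices.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<Range>& chunks) {
    assert(grain > 0);
    if (end - begin <= grain) {
        if (begin < end) chunks.push_back(Range(begin, end));
        return;
    }
    std::size_t middle = begin + (end - begin) / 2;
    split_range(begin, middle, grain, chunks);
    split_range(middle, end, grain, chunks);
}

// Serial stand-in for a parallel loop: run `body` once per chunk, in order.
// A threaded library would give each chunk to whichever core is free.
template <typename Body>
void chunked_for(std::size_t n, std::size_t grain, Body body) {
    std::vector<Range> chunks;
    split_range(0, n, grain, chunks);
    for (std::size_t c = 0; c < chunks.size(); ++c)
        body(chunks[c].first, chunks[c].second);
}
```

With ten elements and a grain of three, the splitter yields the pieces [0,2), [2,5), [5,7) and [7,10). A smaller grain gives more, smaller pieces (more parallelism, more overhead); a larger grain gives fewer, bigger ones, which is the balancing act described above.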
If the Intel presentation was challenging, Steve Parkes' talk was
truly brain-melting. He advocates for the Actor Model of
programming. In particular, he has both done work in Erlang, which
naturally lends itself to this approach, and has created his own
library/framework called Dramatis,
implemented in both Ruby and Python. Dramatis itself is still in alpha,
but is a way to explore concepts in programming with Actors.
Parkes' broader conceptual argument is that threads are a terrible way
of handling concurrency, and Actors are an elegant way. It's a sensible
enough argument, actually. In fact, this is rather strongly witnessed by
the notable success of Erlang
(for Ericsson's telecom infrastructure) in implementing the most
reliable switches in the world, ones that simply do not have the option
of failing or rebooting. I am fairly convinced by the strength of
Erlang/Actor "shared
nothing architecture". It seems both elegant and approachable. I
guess the only problem that really jumps out at me is that in exchange
for reliability and ease of use, you simply have to bite the bullet of
lots of data copying. For problems that are "embarrassingly
parallel", or even merely coarse grained, bandwidth doesn't
necessarily matter. If computation dominates (as it sometimes does) an
Actor model seems great. However, if you really cannot afford to send
data over far, slow channels, sometimes shared-memory access (with all
the pains of locking, race-conditions, and all the rest) is the better
bullet to bite.
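The copying tradeoff is easier to see in a toy. Below is a minimal, deliberately single-threaded sketch of the actor idea in C++: each actor owns private state and a mailbox, and messages are copied into the mailbox rather than shared. All names here (Message, Actor, Summer, run_until_idle) are hypothetical and have nothing to do with Dramatis or Erlang's actual interfaces; real actors would also run concurrently, which I leave out to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// A message is plain data. It is always passed by value, so sender and
// receiver never share memory (this copy is the cost discussed above).
struct Message {
    std::string kind;
    std::vector<int> payload;
};

class Actor {
public:
    virtual ~Actor() {}

    // The by-value parameter copies the sender's data into the mailbox.
    void send(Message message) { mailbox_.push_back(message); }

    bool has_mail() const { return !mailbox_.empty(); }

    // Handle exactly one queued message.
    void step() {
        assert(has_mail());
        Message message = mailbox_.front();
        mailbox_.pop_front();
        receive(message);
    }

protected:
    virtual void receive(const Message& message) = 0;

private:
    std::deque<Message> mailbox_;
};

// Example actor: keeps a running total that only it can modify.
class Summer : public Actor {
public:
    Summer() : total_(0) {}
    long total() const { return total_; }  // read-only peek, for inspection

protected:
    void receive(const Message& message) {
        if (message.kind != "add") return;
        for (std::size_t i = 0; i < message.payload.size(); ++i)
            total_ += message.payload[i];
    }

private:
    long total_;
};

// Round-robin scheduler on one thread: keep stepping until all mailboxes drain.
void run_until_idle(std::vector<Actor*>& actors) {
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < actors.size(); ++i) {
            if (actors[i]->has_mail()) {
                actors[i]->step();
                progressed = true;
            }
        }
    }
}
```

If a sender changes its own vector after calling send, the actor still sees the values as they were at send time. No locks are needed, but every payload is duplicated, which is exactly the price that becomes painful when messages are large and channels are slow.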
Categories
: [ 2008 | concurrency | dramatis | erlang | intel | multicore | oscon | threading ]
Jul 23 2008, 01:11:02 AM EDT
|
First day tutorials
The first couple of days of OSCon are the tutorial sessions. These have a
somewhat different feel from the standard talks: if nothing else, they are
much longer at 3 1/2 hours, where the later talks will be 1 hour each.
This much time lets you get a pretty good initial sense of a particular
technology. At the same time, a reporter cannot make it to as many
sessions when each one is half the day.
In the morning I darted between several sessions to try to get a feel
for each. Steve Holden, Chairman of
the Board of the Python Software
Foundation (to which I was recently elected as a plain, lowly member), all-around good guy, and Python advocate, gave an introduction to Python intended for people with some programming background, but not necessarily ever a line of Python. It seems like a daunting task. This session was packed; it's a good sign, I think and hope, of the growing popularity of Python.
Other than noting that Steve did a good job, there is not
too much new to mention about the session; it did what a first three
hours of Python training should do. I did notice that we are starting
to get close to that slightly uncomfortable point where deciding between
Python 2.x and 3.0 becomes a real issue. 3.0 is still in beta stages,
but soon it will be out, and it just might be time to start training new
users in 3.0 rather than 2.6 around now. Steve's session, however, was
definitely a 2.x tutorial... with just some discreet mentions of
upcoming changes. Of course, I am partially guessing on what was
covered, since I split my time within the three hour block.
This leads me into another curiosity of the day. I saw both Damian Conway and
Randal
Schwartz speak as well, and neither had anything to say about Perl.
Ok, I exaggerated ever so slightly: neither could completely resist Perl
mentions, but that seems fair enough. Conway spoke during the same
morning slot as Holden, but about Vim script. Vim script is an
interesting creature, about which I knew shockingly little, especially
given that I do mostly use Vim when I shell into SSH sessions on
various machines. Without getting into a religious war with Emacs users
on this (I'm pretty agnostic myself), I just find that I am less likely
to be confused about the Vim that is installed at some location than for
other powerful editors. Emacs users seem to have their own complex and
unique configurations that they take with them wherever they go. Now
that I have attended Conway's session, however, I think I may chase down
that same rabbit hole with Vim customizations. My mind races with
dozens of new helper functions that it would be ever-so-convenient to
have in my environment.
Having only popped in and out of the session, I am certainly no
expert at Vim script; barely a novice even. I was impressed, however,
at just how much slicing and dicing it really is easy to do with Vim
functions (not only with sed/awk/grep-like syntax rolled into daunting
single commands). Many readers will know more than I did about the
pretty nice list manipulation capabilities, the filtering and mapping
constructs (almost functional programming!), parameter parsing in
functions, and a good variety of string processing functionality. I am
not surprised, just embarrassed how slowly I have been climbing that
learning curve. Like some other things, however, Conway made a big
point of observing just how very much non-orthogonality has grown into
Vim script, with multiple ways to do the same simple-seeming actions,
and pitfalls for almost-but-not-quite the same variations. Hmmm... do
any readers know of other languages that suffer that danger?
During the morning, I also hopped into a talk called "GIMP
demystified" by Akkana Peck. I only heard a bit of it, but Peck did a
good job as well. Neither her talk nor Conway's attracted nearly the
crowds of Holden's Python intro though. I do not do much graphics
manipulation, but it was helpful to see just how sophisticated GIMP's
masking and layers really are.
During the afternoon, I heard Randal Schwartz talk about the Seaside web
framework. I confess that this one was my second choice, since a
cancelled talk was to have covered "Advanced Metaprogramming in Ruby".
That brain-melting stuff gets my heart all aflutter (I have long wanted
to write a book Metaprogramming in Python, but never really felt
there was a market to support it). It was not to be today though.
Schwartz' talk was quite interesting, as is the Seaside framework.
The most notable thing about Seaside is that it is written in Smalltalk;
that and the fact that it is actually pretty widely used (according to
Schwartz) by big corporate sites. Schwartz' talk spent the first half
as a general introduction to Smalltalk: a cool language that not enough
programmers have ever played with. I am very ambivalent about the
GUI/development environment/interpreter that Smalltalk (Squeak
specifically in the tutorial) imposes on you. Smalltalk code focuses on
ultra-short methods that really require the Smalltalk environment to
browse and develop. I still feel a purity in separating my text editor
from anything about the underlying language mechanisms.
Smalltalk/Squeak is as far from that as it is possible to get: think
Eclipse on steroids, and without any close button.
The framework itself is based on continuations and maintenance of
session state. As with other rapid-development web frameworks, a very
small amount of template code does a lot of work. In contrast to, say,
Ruby or Django, Seaside gives you far more capability of poking inside
the running sessions, and even greater dynamism in seeing your code
changes reflected on web pages. As a development feedback process, this
is great. You can build your applications in baby steps, seeing the
changes and improvements at each one, and getting immediate notification
and feedback on bugs and glitches. Another respect in which Seaside is
unusual as a web framework is that its "templating" language is just
Smalltalk itself. While somewhat novel as a concept, in practice it
seems little different: it is really just a matter of sticking some
method names where you might put various meta characters and escape
sequences in other frameworks. In the end, I am not really that
impressed that the framework knows how to escape and unescape ampersands
in values "behind the scenes." That part of Schwartz' presentation was
a little bit "been there, done that." On the whole, however, I think
Seaside is worth checking out, and has some definite and interesting
advantages over other popular web frameworks.
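To show what "the templating language is just the host language" looks like in practice, here is a small C++ analogue: markup is produced by ordinary method calls, and escaping happens behind the scenes. This is only a sketch of the idea. HtmlCanvas, heading, paragraph and escape_html are names I made up, it is not Seaside's API, and it does not attempt the continuation-based session handling that is the framework's more distinctive feature.

```cpp
#include <cassert>
#include <string>

// Replace the characters that are special in HTML text content.
std::string escape_html(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}

// Hypothetical "canvas": method names stand in for template tags.
class HtmlCanvas {
public:
    HtmlCanvas& heading(const std::string& text) { return element("h1", text); }
    HtmlCanvas& paragraph(const std::string& text) { return element("p", text); }
    const std::string& str() const { return html_; }

private:
    // Every element escapes its text, so callers never handle entities.
    HtmlCanvas& element(const std::string& tag, const std::string& text) {
        assert(!tag.empty());
        html_ += "<" + tag + ">" + escape_html(text) + "</" + tag + ">";
        return *this;
    }

    std::string html_;
};
```

Chaining canvas.heading("Q&A").paragraph("1 < 2") builds the markup with the ampersand and angle bracket already converted to entities, which is the "behind the scenes" convenience mentioned above: handy, though hardly unique to one framework.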
Categories
: [ 2008 | Conway | Holden | OSCon | Python | Schwartz | Seaside | Smalltalk | Vim ]
Jul 22 2008, 03:09:18 AM EDT
|
Arriving and plans
I confess that I approach my interests in a somewhat whimsical fashion, usually letting matters of interest come into my scope of view rather than seeking them out. There is so much of interest going on at OSCon that I figured this is a perfectly good approach.
All of that is a long-winded way of saying that once I got my press credentials approved, a variety of companies and organizations contacted me to propose interviews; some of more and some of less personal interest. Novell was particularly "pro-active" in actually following up with a telephone call, which I'll take as indicating their commitment to reaching FLOSS communities; I'll be speaking with Joe "Zonker" Brockmeier who has joined Novell as openSUSE Community Manager (http://zonker.opensuse.org/about/). I know Brockmeier by reputation, but have not had the opportunity to speak to him in the past. I look forward to this conversation on Wednesday.
Another conversation I am also looking forward to on Wednesday is with
Cliff Schmidt of Literacy Bridge (http://literacybridge.org/about/boardofdirectors.html). I hadn't known anything about them, but they seem to have a really worthwhile project for increasing educational opportunities in poor nations. Basically, it looks like their work dovetails with the good work the OLPC project has done. Literacy Bridge has the goal of providing low-cost players for audio books and other learning materials that will have a variety of cool features such as peer-to-peer content exchange, interaction with XO laptops, and provision of localized content. I'll ask Schmidt about more details, of course, but the focus of this work seems to be more about "free information" than about "free software" per se. Both are worthy purposes.
Categories
: [ 2008 | interviews | linux | literacy | novell | olpc | oscon | suse ]
Jul 21 2008, 08:15:00 PM EDT
|
Coverage of OSCon 2008
I am returning as a reporter to OSCon 2008. Some of you may dimly remember my coverage of OSCon 2006; last year I did not report on the wonderful conference, but did have the opportunity to attend and present a paper there. OSCon definitely is the place to capture the "buzz" of the open source world. I'll add more details on sessions I plan to report on and/or people I'll interview as I learn more.
Jul 02 2008, 03:30:00 PM EDT
|
Interview with Josh Berkus (OSCon 2006)
Josh Berkus is another interesting fellow I had a chance to talk with at some length. Sun really made an effort, it seems, to get folks in the public eye. A lot of the vendors sent me solicitations to check out their booths, usually with blurbs about their products in the press releases. But Sun made the extra effort to schedule interviews between press members and Sun employees. I see that not only because I had such scheduled interviews with Josh Berkus and Tim Bray, but also because some other folks out there in the press (or blogosphere, if that is a real word) have also posted comments from such interviews. For what it is worth, Simon Phipps is another prominent Sun employee who was slated in the interview track—I did not talk to him personally, but I did attend his talk in conjunction with Tom Marble (I might come back to that in another entry).
Let me get the last part of my talk with Josh out of the way first. The reason Sun was putting him forward was almost certainly to answer the question I asked him towards the end of our talk (or something closely along the same lines), namely: Why is Sun a good fit for maintenance and development of PostgreSQL? (For those not in the know, Josh has been one of the main developers of PostgreSQL for four years.) The sort of vague answer is about the stability and scalability of Solaris and Sun hardware. True enough, but I think slightly at the level of nicety. Of more substance to my mind was Josh's specific statement on the benefits of the ZFS filesystem. In particular, ZFS allows dynamic use of multiple physical volumes, with a volume manager controlling virtual storage pools. Just what you want for growing databases.
What Josh and I talked about in more detail is probably idiosyncratic to my interview with him. Although I had not spoken with him directly before, Josh has also worked with the Open Voting Consortium, of which I am CTO and roughly in affiliation with which I gave my paper. It was interesting to get Josh's perspective on these issues, and he is someone quite knowledgeable in this. Clearly, in whatever area he enters, Josh does his homework. Last year, Josh testified before the CA legislature on FOSS in relation to voting systems, during a hearing considering legislation to mandate such use. Well, really the hearing followed up on the non-binding CA HR 242, which stated a preference for such systems and instructed the California Secretary of State to conduct hearings on the matter. The SoS wound up stonewalling on hearings, but the California Senate picked up on the gap. Initially, OVC had asked Brian Behlendorf to testify; but when Brian was unavailable, he recommended Josh. Lots of background that I just happen to know, but readers need not necessarily follow.
In our interview, Josh expressed some alarm at the conflict of interest of paid lobbyists who get money from proprietary vendors but also work in elections. Some of them testified in the same committee. Josh was proud of a coup he accomplished in having on hand, during his testimony, a large list of FOSS vendors (in CA), in refutation of claims by the proprietary software lobby that no such companies existed. In a nicely strident statement, Josh observed that the main "trade secret" of current vendors of election systems is just how bad their source code is. But in a more abstract tone he emphasised the "many eyes" needed to make sure bugs/backdoors are caught; he believes, as I do, that it is not sufficient simply to reveal code in limited contexts. Concretely, as soon as "code auditors" who have signed NDAs start finding bugs in the proprietary systems that they have been assigned to examine, they (the auditors) find themselves in court, with lawsuits from vendors. Probably these are non-meritorious SLAPP actions; but how many programmers can afford lawyers to aid the public good?
One interesting claim Josh made was that in Canada and the UK, opponents of computerized voting feared that FOSS voting systems would legitimize such systems, despite their technical lack of readiness. This is certainly an interesting inversion of the "FOSS-poison" attitude in the USA. That is, here in the USA, a popular equation is of FOSS systems with vulnerabilities (an equation promoted by FUD and lobbying in my opinion, and I am sure in Josh's). Josh made an interesting point that one computer security expert who made the claim about the non-readiness of FOSS systems was overly pessimistic about computer security, and overly optimistic about non-computer security. I think there is a nice point there: while computer systems have vulnerabilities, that does not mean that non-computerized systems are necessarily safe. At least as a general rule: I think the safeguards of the Australian ballot and padlocks on ballot boxes are relatively well understood, after 150 years.
I also attended one of the three sessions Josh gave (a busy guy), the one on FOSS press relations. He did a nice job with this as well. Probably a failing of many FOSS projects is not knowing exactly how to deal with the broader media, and how to formulate and time good press releases. Certainly these concerns are big for big and widely-used projects like PostgreSQL. Many perfectly usable and useful smaller projects (like, say, my own little Gnosis Utilities) are actually probably fine with a sort of "let the release go out quietly" approach... some tools are meant for a narrow and technical audience who already know where to look. PostgreSQL is one of those tools used by millions, including by many big companies and organizations. For something like that, FOSS should show the same savvy (or better) as big proprietary software vendors with PR departments have. In a lot of ways, FOSS projects can and do achieve better media relations than the unfree guys.
Jul 30 2006, 11:45:44 PM EDT
A quick note on Tim Bray and Atom (OSCon 2006)
One of the topics I interviewed Tim Bray about was the use of a globally unique identifier in Atom feeds. Basically, each Atom entry is required to have a name distinct from the name of any other entry in the world. However, the Atom standard (RFC4287) does not require a particular rule for assigning these identifiers. Obvious options one might use are UUIDs (RFC4122) or URIs (RFC3986). I suggested to Tim as well that something sensible might be an identifier that somehow hashes the content of the entry itself, hence providing a certain kind of integrity constraint.
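To make the three options concrete, here is a minimal sketch in Python. The function names are my own invention, the "tag:" form is just one URI scheme a publisher might pick, and the "urn:sha256:" prefix is purely hypothetical (it is not a registered URN namespace); none of this comes from the Atom specification itself.

```python
import hashlib
import uuid


def uuid_entry_id():
    """A random UUID (RFC 4122) written as a URN."""
    return uuid.uuid4().urn


def tag_entry_id(authority, date, path):
    """A 'tag:' URI assembled from a site-configured authority, a date, and a path."""
    return f"tag:{authority},{date}:{path}"


def content_entry_id(title, body):
    """An identifier derived from a hash of the entry content.

    The 'urn:sha256:' prefix is hypothetical, used only for illustration.
    """
    digest = hashlib.sha256(f"{title}\n{body}".encode("utf-8")).hexdigest()
    return f"urn:sha256:{digest}"


print(uuid_entry_id())
print(tag_entry_id("example.org", "2006-07-30", "/blog/atom-ids"))
print(content_entry_id("A quick note", "Some entry text"))
```

The content-derived form has the property I was after: anyone holding the entry can recompute the hash and notice if the content no longer matches its name. The price is that editing an entry changes its identifier, which is probably not what most feed publishers want.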
My concern here is twofold. Basically there are a couple ways that non-unique identifiers might arise. One is that someone is going to write a bad Atom Publishing Protocol server that either assigns the same ID to multiple entries it holds, or where multiple installations of the same server fail to find appropriate unique components (e.g. a default prefix that is not site-configured). In response to this, Tim suggested it would happen less than I think, simply because it is pretty easy to get either URIs or UUIDs right. Fair enough.
The more interesting problem is where people maliciously duplicate IDs, either to spoof entries, or to perform insertion attacks, or otherwise to disrupt the use of Atom (or disrupt particular producers of feeds). In support of my point, Tim noted that soon there will be feeds with substantial financial value, such as credit card transactions. At the same time, he made a point of the fact that Atom does not make anything worse in comparison with existing RSS feeds: in his example, if e.g. Technorati decided to become malicious, they could perfectly easily put words in his mouth.
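Whether duplicates arise by accident or by malice, a consumer that aggregates several feeds can at least notice them. The following is only a sketch under my own assumptions: entries are represented as plain (id, source) pairs, and the function name is hypothetical. Note that detection alone says nothing about which of the colliding entries is the genuine one.

```python
from collections import defaultdict


def find_duplicate_ids(entries):
    """Return {entry_id: [sources...]} for every id seen more than once.

    `entries` is an iterable of (entry_id, source) pairs, where `source`
    is whatever the aggregator uses to describe where the entry came from.
    """
    seen = defaultdict(list)
    for entry_id, source in entries:
        seen[entry_id].append(source)
    return {eid: srcs for eid, srcs in seen.items() if len(srcs) > 1}


collected = [
    ("urn:uuid:1111", "feed-a"),
    ("urn:uuid:2222", "feed-a"),
    ("urn:uuid:1111", "feed-b"),  # same id claimed by a second feed
]
print(find_duplicate_ids(collected))
```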
Part of Tim's attitude reflects what I noted before about his commitment to practicality over purity. He commented that he saw much of this as a social problem, not a technology problem. A nice quote from his comments is: "In general it's a good thing to name things using URIs; and in general it's good not to micro-manage how people use URIs." That has a nice sound to it. In fairness, I am sure that Tim does not fail to recognize that there is a technology component to security layers, authentication mechanisms, and so on... he just sees these questions as lying outside the concerns Atom itself addresses (and as reasonably described as "social").
Still, the issue of security attacks involving identifier falsification or spoofing interests me. Hopefully I will have a chance to write about this in more detail someday soon (once I have thought through the specific threat models).
Jul 30 2006, 01:10:23 PM EDT
More on Open Source Voting presentation
In my initial entry, I mentioned in a general way my enthusiasm about my Open Source Voting presentation. But really, I did not say very much about its content. In part I was waiting to be able to provide relevant links for readers. I believe our slides will soon be available via the OSCon 2006 website, but the below resources are available now:
My hope is that readers of this blog will decide to read some of those fuller papers, which generally reflect what I presented at OSCon. The presentation was something of a combination of the ideas in several papers, but informally structured. In fact, despite the fact there are only 14 slides in the whole show (including one that contains just the name of the paper and its authors), I really only discussed about half the slides during the lively discussion.
One issue I did highlight in my talk is something that is not really emphasized in any of the papers, just implied. But this point is of growing importance in my mind, and also ties in especially well with the OSCon context. The idea is that issues about covert channels mean that FOSS is required for rigorous mathematical reasons, not simply out of general political desirability, or because of the positive "many eyes" effects that FOSS promotes. Sure, for me the first principle is that the technical mechanisms of elections should be disclosed to voters for the same fundamental democratic reasons that so-called Sunshine Laws reveal the workings of governance. However, even for readers (or audience members) who do not share my political sentiment, there is some basic mathematics to consider.
One of the principal considerations in designing voting systems is that it is important not to disclose the identity of voters. A vote does not simply need to be recorded accurately and reliably, it also needs to be recorded anonymously. Apart from the specifics of the OVC design, a voting system contains a variety of channels for transmission of information: some might be electronic, XML files and whatnot; others are simply pieces of paper that get moved around according to various rules and patterns (paper is an excellent steganographic medium). The plain fact is that very few channels are at their Shannon limit; and what that tends to mean, almost always, is that multiple concrete encodings can represent the same semantic content. For example, an XML file can have slightly different forms that are reduced to a common meaning via whitespace normalization. Or a computer-printed paper ballot can have a pixel here and there that does not affect which vote is cast (for example, subtly moved around in an identificatory watermark; or even effects that superficially look like printer artifacts).
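The XML case is easy to demonstrate. In this small sketch the ballot format, element names, and the normalized() helper are all made up for illustration (they are not the OVC data format): two byte-wise different files reduce to the same meaning once whitespace is normalized.

```python
import xml.etree.ElementTree as ET


def normalized(xml_text):
    """Reduce a document to (tag, attributes, stripped text) tuples.

    Whitespace-only differences between documents disappear here.
    """
    root = ET.fromstring(xml_text)
    return [
        (el.tag, sorted(el.attrib.items()), (el.text or "").strip())
        for el in root.iter()
    ]


# Hypothetical ballot records: same vote, different concrete encodings.
ballot_a = "<ballot><contest id='mayor'>Alice</contest></ballot>"
ballot_b = "<ballot>\n  <contest id='mayor'>Alice </contest>\n</ballot>"

print(ballot_a == ballot_b)                          # the bytes differ
print(normalized(ballot_a) == normalized(ballot_b))  # the meaning does not
```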
The problem is that even a fully open and disclosed data API leaves this sort of wiggle room to hide some bits and bytes in a covert channel. Maybe that extra space character is an accident of how the outputter is coded; or maybe it is put there to deliberately leak information about voter identity (once the "black hats" know where to look). Any closed source implementation—even one produced by (counterfactual) vendors that we fully trust and who have shown a good prior record of best security practices (both very much contrary to the status of existing voting system vendors)—can fully conform with an open standards data API, while still containing a covert channel. An open source implementation however, can be checked at the code level to make sure no such covert channel is encoded... and the proof of its operation is that all the channels contain exactly those bits that the open source should produce. That is, if someone were to substitute malicious source for the examined source, say during the installation or distribution process, that malicious code would have to produce slightly different bits if it were to produce a covert channel.
So there we have it: closed source cannot, in principle, guard against this significant attack. Open source is required as a simple question of mathematics.
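Here is a toy version of the argument, as a sketch only. The vote format and all function names are hypothetical. A "leaky" serializer hides one bit per line in an optional trailing space; its output still conforms to the same data format, yet a byte-for-byte comparison against what the examined source should produce exposes it.

```python
def reference_output(votes):
    """What the examined, open source serializer is expected to emit."""
    return "".join(f"<vote>{v}</vote>\n" for v in votes)


def leaky_output(votes, secret_bits):
    """A hypothetical malicious serializer: same votes, plus one hidden
    bit per line, encoded as an optional trailing space."""
    lines = []
    for vote, bit in zip(votes, secret_bits):
        lines.append(f"<vote>{vote}</vote>" + (" " if bit else "") + "\n")
    return "".join(lines)


def recover_bits(output):
    """What a 'black hat' who knows where to look can read back."""
    return [1 if line.endswith(" ") else 0 for line in output.splitlines()]


def same_meaning(a, b):
    """Format-level check: equal once surrounding whitespace is ignored."""
    return [s.strip() for s in a.splitlines()] == [s.strip() for s in b.splitlines()]


votes = ["Alice", "Bob", "Alice"]
bits = [1, 0, 1]
leaky = leaky_output(votes, bits)

print(same_meaning(leaky, reference_output(votes)))  # conforms to the format
print(recover_bits(leaky))                           # yet leaks the bits
print(leaky == reference_output(votes))              # exact comparison catches it
```

The exact comparison is only possible because the reference source is available to run. With a closed implementation there is nothing to compare against except the data format, and the format check passes.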
Jul 30 2006, 12:00:58 PM EDT
Second day: Software Libre
I visited a couple sessions that got at general notions of FOSS as
acting in the service of political freedoms. In my mind, this ties
fairly closely to the licensing issues I chatted about earlier.
A really fascinating and, to my mind, very optimistic talk was on FOSS
in Venezuela. The movement towards FOSS has been quite strong in South
America; in the Venezuelan context, two of the speakers were active
members in an organization called SoLVe (Software Libre Venezuela), and
organized a conference similar to OSCon under its aegis.
Jeff Zucker, who has worked with UNICEF and UNESCO on software issues,
introduced the main speakers.
Alejandro Imass gave a perfectly reasonable talk on developing FOSS ERM
systems.
I confess that the topic seemed slightly dry to me; worthwhile, but it
did not grab me from either the political or technical/theoretical
point-of-view. He emphasized some good principles of component
architectures and loose connections between related systems, but that is
relatively common to good design principles.
Lino Ramirez, on the other hand, was quite fiery, or at least of great
interest to me personally. He provided some background on Venezuela's
FOSS bill, which has undergone an interesting process of democratic
input from ordinary citizens, per some reformed mandates for
participatory democracy in Venezuela. Ramirez also compared this bill
(following on a presidential directive to similar effect, but the
directive is less fixed than a law would be) to similar prior efforts in
Brazil and Peru. In both of those cases, quite good bills were derailed
by intensive lobbying by Microsoft, who is also running a massive
campaign against the Venezuelan legislation.
Apart from the specific outcome of this bill, Venezuela has implemented
a number of technical outreach programs for poor and indigenous peoples.
These include installation of FOSS software in schools and special
training centers in remote locations. Many towns and villages have
gained computer centers where locals can learn computer skills and
access the internet; all of this would have been impossible without
FOSS. A nice case in point was the creation of a Linux distribution in
the native Wayúu language. Having tools like OpenOffice.org in
small-group languages like
Wayúu aids in preserving the cultural heritage of such languages and
peoples. A really nice upshot of this was shown in the question period,
where one of the leading OpenOffice.org evangelists first learned of the
translation at this session... and the interchange will presumably lead
to good promotion and advertisement for both OpenOffice.org (which is
accessible to more native peoples than closed source software ever can
or will be), and to SoLVe's leadership in education and cultural
preservation efforts, in the developing world.
I also attended a session by Karl Fogel on the early history of
copyright. This talk was interesting, but I guess familiar enough to me.
After the development of the printing press in Europe (or really, of
course, its transplantation from China),
governments like the British Crown granted monopoly control of printing
press technology to a limited guild of printers. Rationalizations of the
"moral rights of authors" grew out of the base reality that publishers
want the state to subsidize their profits... with authors having never
played much of a role in any of this. None of that was really surprising
to me; even if I had not specifically known it, I would have predicted
as much from my knowledge of social and economic history... a Ph.D. in
political philosophy, like I have, actually wins you some decent
insights into how politics and economics actually work. Still, I am
certain the lesson was valuable for many listeners, and the analogies
with current issues around blogs, filesharing, and FOSS are worth
drawing.
Jul 29 2006, 12:59:43 AM EDT
Second day: Python 3000
One of the events I was especially looking forward to was Guido van
Rossum's talk on what is coming up in Python 3.0. In truth, I knew
there would not be anything in the talk that has not been discussed in
more detail on the Python development lists, or that at least would be
discussed there soon enough. Nonetheless, hearing the announcement from the
BDFL himself carried a certain mystique.
Unfortunately, Guido was developing a cold, or at least a cough, right
about when he had to give his talk. So he had trouble speaking
without hoarseness. The presentation was still interesting: as the
audience almost certainly hoped, he made a mildly comical disparagement
of the Perl 6 process by way of comparison—but strictly in the
friendliest manner, obviously without any hostility or competitive
sentiment towards the Perl coders. His comment though was that the Perl
6 methodology appeared to be for a group of developers to travel to a
distant island, and remain there until they invented a new programming
language. In a somewhat more serious tone, he also contrasted Python 3.0
with C++, where the latter is completely unwilling to accept even the
smallest backwards-compatibility breakage. Guido described Python 3.0
as falling in the middle of these extremes.
Moreover, our BDFL announced a pretty concrete schedule for 3.0: An
alpha should be available near the beginning of 2007, with a release
version before the end of the year. Python 2.6 will almost surely be
released before the final 3.0, and the Python 2.x line will continue for a good
while to overlap 3.0 (because 3.0 will not run all the older Python
programs unmodified). Python 2.7 will probably contain some back-ports
of 3.0 features, where they can be implemented without breakage; and 2.7
will also probably contain a collection of migration tools. Guido
envisions migration as relying on two classes of tools:
- Code analyzers along the lines of PyChecker and PyLint that can in many cases extract the intent of code, vis-a-vis the specific types of objects being handled. Most breakage will come about because particular types (think collections) behave somewhat differently than they used to. Guido gave the example of trying to determine whether f(x.keys()) represents code breakage. There are two points of concern here:
  - Is x.keys() really a call to a method of a dictionary(-like) object, as you would tend to think?
  - As mentioned below, this call on a dictionary will start returning either an iterator or a view in Python 3.0, rather than a fixed list. Depending on what you do with it, the change may or may not matter to the code in f(). I.e. if you just do "for thing in keys:", all is happy; if you mutate the expected list, problems occur. The exact fix is not generally automatable, since developers can reasonably want different behaviors in response to the change.
- Warnings about likely changes. Presumably with 2.7 (and later 2.x versions), there will be a means of warning developers of constructs that are likely to cause porting issues. In the simplest case, this will include deprecated functions and syntax constructs. But presumably the warnings may cover "potential problems" like the above example.
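The f(x.keys()) example can be sketched as follows. This runs on Python 3 as it eventually shipped, where keys() returns a view; the two helper functions are my own stand-ins for Guido's f(), not code from his talk.

```python
def count_keys(keys):
    # Only iterates, so a list, an iterator, or a view all work.
    return sum(1 for thing in keys)


def drop_first(keys):
    # Mutates its argument, so it silently assumes a real list.
    del keys[0]
    return keys


x = {"a": 1, "b": 2}

print(count_keys(x.keys()))        # fine either way

try:
    drop_first(x.keys())           # a view does not support item deletion
except TypeError as err:
    print("breaks:", err)

print(drop_first(list(x.keys())))  # one possible fix: copy into a list first
```

Wrapping the call in list() is only one reasonable repair. Another developer might prefer to rewrite drop_first() so it no longer mutates anything, which is exactly why the fix resists automation.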
So what is going to be new? And what is going to be removed? Removal is
interesting. Some basic redundancy like dct.has_key(x) is
going away, since nowadays you write if x in dct anyway. A
few other relatively painless things along the same lines happen also. But more
interesting is the fact that lambda is not going anywhere
(it is also not being enhanced according to any of the numerous
proposals). This little fact met with a surprising number of cheers (and
probably some less audible rolled eyes among a different subset of the
audience). Old style classes also go away, to everyone's approval; that
is not 100% breakage free, but it is just simply a good thing.
Similarly with the removal of string exceptions, and the creation of a
BaseException ancestor of all exceptions. A little bit of
syntax is simplified too. I will lose my dear <> version
of inequality, but that is an awfully easy update.
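A few of these removals are easy to show. The snippet below is a sketch that assumes a Python 3 interpreter as eventually released (details were still in flux at the time of the talk), and the variable and class names are just for illustration.

```python
dct = {"spam": 1}

# The membership test that replaces dct.has_key("spam").
found = "spam" in dct

# The old method is simply gone from dictionaries.
has_method = hasattr(dct, "has_key")


# Every exception, including the "system" ones, descends from BaseException.
class AppError(Exception):
    pass


roots = [issubclass(exc, BaseException)
         for exc in (AppError, KeyboardInterrupt, SystemExit)]

# lambda survives, unchanged.
double = lambda n: n * 2

# != is the one remaining spelling of inequality (<> is a syntax error).
different = 1 != 2

print(found, has_method, roots, double(21), different)
```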
Some new features include:
- All strings become Unicode (breaky), and a new bytes type lets you encode mutable arrays of 8-bit bytes. Basically, one is "text" and the other is "binary data". Accompanying this will probably be a variety of mechanisms to make I/O methods inherently handle Unicode, and transparently deal with decoding on open(fname) and the like (and also things like seeks).
- Inequality comparisons become even more breaky than they have been (see my recent Charming Python bemoaning inequalities). I have mixed feelings myself, but in a certain way I think it is a reasonable approach. Python will give up (most of) its willingness to guess about what coders intend when comparing unlike types of things. At least that adds consistency. Rather than sometimes-but-who-knows-when having sorts break, we can just assume they do not work unless collections are homogeneous, or unless heroic measures are taken in advance (but as a known requirement).
- As expected, the move towards iterators and variations on lazy objects continues apace. List comprehensions do not go away, but they are direct synonyms (syntax sugar) for a list() call wrapped around a generator comprehension. This changes the leakage of variables to surrounding scope, which is a good thing.
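The three items above can be sketched as below. This assumes Python 3 as it was eventually released, which differs in some details from what was announced (for instance, the released bytes type is immutable, with bytearray as the mutable variant). The function names are mine.

```python
# Text versus binary data: str is Unicode, bytes holds 8-bit values.
text = "Way\u00fau"            # "Wayúu", with the accented letter escaped
data = text.encode("utf-8")
lengths = (len(text), len(data))


def mixed_sort_fails():
    """Sorting a heterogeneous collection no longer guesses an ordering."""
    try:
        sorted([1, "two", 3.0])
    except TypeError:
        return True
    return False


def comprehension_leaks():
    """Check whether the loop variable escapes a list comprehension."""
    squares = [i * i for i in range(3)]
    try:
        i  # not defined here unless the comprehension leaked it
    except NameError:
        return False
    return True


# A list comprehension and list() around a generator give the same result.
same = [n * 2 for n in range(4)] == list(n * 2 for n in range(4))

print(lengths, mixed_sort_fails(), comprehension_leaks(), same)
```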
There is more, some of it mildly incompatible. But overall it looks like
a very conservative revision of the Python language, and one looking
forward to the next 1000 years of Python programming (as Guido puts it).
Another thing I completely failed to notice until Paul McGavin pointed
it out to me: Guido said nary a word about optional type declarations.
Given what a hot button this idea is, the lacuna was surprising. I
would not necessarily be surprised to hear he had decided against it; but
hearing nothing at all, either way, is curious.
Jul 28 2006, 08:44:00 AM EDT
Second day: Microformats
I had the opportunity to talk with Tim Bray for about a half hour this
morning, as I have mentioned I would. He is an interesting guy, and I am
going to scatter topics we spoke about over several of these entries
rather than simply report his comments verbatim and linearly.
One of the things I asked Tim about was a topic that Dethe Elza has
addressed in a recent guest column for XML Matters: microformats.
Despite the wonderful article Dethe wrote, I have a certain suspicion in
my attitude towards microformats. Specifically, they strike me as a way
to smuggle in a brand new schema definition embedded within an existing
schema (e.g. XHTML), while pretending not to need a schema. What, after
all, is so much clearer about writing <div class="vevent"> than
just writing <vevent> to start with? One might argue that the
first can render in an XHTML-compliant browser—but the latter can
equally render in common browsers as long as a CSS stylesheet is
attached that says what to do with it. Or for that matter, it doesn't
take much XSLT or AJAX to render that <vevent> element as
something nice. Even without a formal schema, the actual "vevent" tag
just seems to document itself better, and to look cleaner.
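To compare the two spellings from the consuming side, here is a rough sketch using Python's standard ElementTree. The sample documents and function names are invented for illustration, and the class match is deliberately simplistic: real class attributes may hold several space-separated names, which an exact-match predicate like this one would miss.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents carrying the same event both ways.
microformat_doc = (
    '<html><body><div class="vevent">'
    '<span class="summary">OSCon</span>'
    '</div></body></html>'
)
element_doc = (
    '<html><body><vevent>'
    '<summary>OSCon</summary>'
    '</vevent></body></html>'
)


def summaries_from_microformat(doc):
    """Find event summaries marked up with class attributes."""
    root = ET.fromstring(doc)
    return [
        s.text
        for event in root.iterfind(".//*[@class='vevent']")
        for s in event.iterfind(".//*[@class='summary']")
    ]


def summaries_from_elements(doc):
    """Find event summaries marked up with dedicated elements."""
    root = ET.fromstring(doc)
    return [s.text for s in root.iterfind(".//vevent/summary")]


print(summaries_from_microformat(microformat_doc))
print(summaries_from_elements(element_doc))
```

Neither version is much work for the consumer, which is rather my point: the class-attribute form does not obviously buy simplicity on the processing side.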
My hunch had been that Tim Bray, given his prominent role in XML
standardization, would look down on ad hoc and "impure" uses of
semantic markup that eschewed formal XML tags. It turned out I was wrong on
two counts. On the one hand, Tim rather took exception to my
description of a schema as a semantics, suggesting a schema was simply a
syntax description. I know he is right at some formal level; but I still
think a practically implemented and processed XML schema really does
represent a semantic constraint. Sure, W3C XML Schema—or RELAX NG that
both Tim and I share a strong preference for—are narrowly just
grammars. But an XML dialect, to my mind, can hardly help (especially
at its best) but wear its semantic intent on its sleeve.
In any case, that is not the main point. In the end, Tim was much less
skeptical of microformats than I remain. But oddly, it seemed to be
because Tim is much less of a puritan about the formal use of XML than I
am. His observation was that XML is just tags, and they can happily
coexist with other markup systems, and there is nothing all that
important or even necessarily long-lived about XML. All that, of course,
is true—but it embodies an attitude, I think, that takes formalism very
lightly, and as simply a practical consideration. An interesting
perspective, I believe, from someone who was, after all, co-editor of the
XML and XML namespace specifications.
At the end of the day, I attended one of the "birds-of-a-feather"
sessions on microformats, led by a representative of
Technorati. These sessions focus on informal
discussion rather than on formal presentation. Those folks are
enthusiasts about microformats, perhaps some of the leading proponents.
It was an interesting discussion, and one in which I initially raised my
above concern. The sentiment of the room seemed to be that "mere web
developers" could handle enhanced-XHTML (i.e. extra "class" attributes) more easily
than they could deal with the "full complexity of XML". I confess I do not
quite see that. Why is the longer version in the above example simpler?
Even if XML namespaces are used to get something "long" like
<card:vevent>?
Several attendees, however, made a rather good point about support in
existing HTML editing tools (and also the fact that the "quirks" mode of
browsers does a lot to accommodate non-technical or semi-technical
producers of web pages). However, then the discussion turned to how
microformats might support versioning and backward compatibility. And
then to how microformats might be combined and embedded in each other.
Or in other words, to my mind, the room decided to reinvent XML
namespaces in a more ad hoc way. The whole thing brought me to an
idea for "Mertz' corollary to Greenspun's Tenth Rule". So rather than:
Any sufficiently complicated C or Fortran program contains
an ad hoc, informally-specified, bug-ridden, slow implementation of half
of Common Lisp.
My corollary might read:
Any sufficiently complicated XML-based ad hoc format
eventually finds a way to embed the moral equivalent of
informally-specified, bug-ridden, and ugly XML namespaces.
OK, maybe not the most mellifluous I have ever managed. But definitely
the stuff of some later installment of XML Matters.
Jul 27 2006, 11:40:00 AM EDT
First day of attendance: Presentation
The highlight of today, for me, was of course my own presentation of Open Source Voting. It turned out there was a wrinkle to the matter though. My co-presenter Arthur Keller wound up missing his plane. The original plan was for him to primarily run the presentation, with me adding some particular threads more extemporaneously. Moreover, I hadn't even seen the latest version of the slides he planned to use (he has given variants on this talk several times before).
Of some redeeming value was the fact that I had the opportunity today to meet in person my past Open Voting Consortium collaborator, Fred McLain, who did a wonderful job in leading development of OVC's demo software. We had spoken and emailed frequently in the past, but I had not seen Fred face-to-face until today. So at the last moment, with Fred's consent (and in fact, I think, his enthusiasm), I recruited Fred to join me on the session stage; and I am happy to say that he added some wonderful commentary to the session. There was also a silver lining to Arthur's travel glitch. Arthur is a fine computer scientist, and an even better proponent of fairness, integrity, and transparency in elections. But my feeling is that Arthur tends towards a more formal presentation style than I would personally be inclined towards.
Being accidentally promoted to sole presenter (or at least primary, in gesture to Fred), I was able to run the session in a more extemporaneous fashion, and especially with a greater emphasis on audience participation and questions. I must say with some pride that this session was by far the liveliest I have been to so far. The audience, on this very political topic about which many people feel very strongly, was extremely involved. A great number of them asked many well-informed questions, and carried much of the explanation forward, with just some gentle nudging and explication by me. In fact, there was enough involvement in the topic that the session ran quite a while over its allocated time, even after we invited anyone who needed to do so to feel free to leave the session. But the large majority of the audience happily stayed around another half-hour beyond the scheduled 40 minutes, and most of them participated by adding good insight to both the technical and political aspects of this. It was just lucky that an accident of scheduling put this session at the end of the day, just before the conference-wide reception; and in particular, without any next panel needing the room immediately.
I'll get back to this topic tomorrow. But I wanted to put in a few words during tonight's report.
Jul 27 2006, 12:20:00 AM EDT
First day of attendance: Sessions
I attended just a few sessions today, because of my late night arrival and my mini-crisis about my own presentation. But those I saw were quite lively.
I had the opportunity to meet Python luminary Raymond Hettinger. Raymond has done some of the most wonderful and mind-bending stuff in the development of Python. A lot of the stuff I contemplated in my columns about coroutines, generators, state machines, lightweight switching and data passing, and other areas, was ultimately developed into PEPs by Raymond, and then often concretely coded by him, and incorporated into the Python development tree. Or in other cases, my own articles have simply tried to keep up with Hettinger's innovations. Just before Raymond's talk, I had a chance to very briefly meet Anna Ravenscroft and our own BDFL Guido van Rossum, and I met some other Python notables like Kevin Altis during the day.
Hettinger was an amazingly lively and charismatic presenter. He did this whirlwind presentation of ideas about implementing AI in Python. The moral was, at a first pass, just how concise and readable complex ideas could be in Python. But I think the subtext to his talk was about achieving proper levels of abstraction and generality. That is something Python enables, but it is also a skill good programmers of other languages need to develop. With a focus in mind on how useful Python can be in education, and to pique the interest of young, future programmers, Raymond presented a number of game- or puzzle-solving problems, some hard, some easy, but all boiled down to a dozen or two lines of Python each.
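To give a flavor of what "a dozen or two lines" can do, here is my own sketch of a generic puzzle solver. To be clear, this is not code from Raymond's talk; the classic two-jug puzzle, the function names, and the breadth-first approach are all my choices for illustration.

```python
from collections import deque


def solve(start, is_goal, moves):
    """Breadth-first search: return the shortest list of states from
    start to a goal state, or None if no goal is reachable."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if is_goal(state):
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None


def jug_moves(state, caps=(3, 5)):
    """All states reachable in one step: fill, empty, or pour a jug."""
    a, b = state
    cap_a, cap_b = caps
    a_to_b = min(a, cap_b - b)
    b_to_a = min(b, cap_a - a)
    return [
        (cap_a, b), (a, cap_b),   # fill either jug
        (0, b), (a, 0),           # empty either jug
        (a - a_to_b, b + a_to_b), # pour first into second
        (a + b_to_a, b - b_to_a), # pour second into first
    ]


# Measure exactly 4 units using a 3-unit jug and a 5-unit jug.
path = solve((0, 0), lambda s: 4 in s, jug_moves)
print(path)
```

The solve() function knows nothing about jugs. Swapping in a different start state, goal test, and move generator turns it into a solver for some other puzzle, which is the kind of abstraction and generality I took to be the subtext of the talk.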
After the talk, I spoke with Raymond some more. I think I have a bit of a scoop to reveal here. I hope I'm not breaking confidence... but then, I was wearing my press badge when we spoke. Hettinger is doing a book in conjunction with van Rossum, for Prentice-Hall, called... drum roll please... The Python Programming Language. The fascinatingly original title reflects the purpose of the book: it is intended as an official document of exactly what Python is. That is, not just what the current CPython implementation actually does, but what in general any implementation would need to do to be Python.
Another notable speaker I heard was Tim Bray, whom I'll be interviewing as well tomorrow morning. Tim spoke about the Atom syndication format. He seems to think it is likely to be "the next big thing", and indeed Atom clearly rationalizes the hodge-podge of RSS variants, and provides a consistent publishing and syndication format and protocol. Technically, the Atom Publishing Protocol is something a bit different from the Atom format itself. But APP really straightforwardly builds on HTTP in a rational and modular way; enough so that Bray can say with a straight face that "APP has no API".
I was rather charmed as well by Tim's vehement dismissal of the phrase "user generated content" (as if there were some other kind). For Tim, naturally, the whole point of Web-associated technologies is to let so-called users generate content. But then, this is no different from users in their more mellifluous description, "people", who have talked and written and argued and advocated long before there was a Web, or an Internet, or whatever. More with Tim tomorrow.
Jul 26 2006, 11:59:00 PM EDT
First day of attendance: Exhibition
It has been an eventful day. Perhaps most eventful of all because the scheduled co-presenter in my own panel wound up missing his airplane, and I presented individually instead. Plus my own rather ordinary adventures with my already late-night flight of last night being even further delayed. Still, anyone who travels has more interesting stories than mine. More on my own Open Source Voting panel in a later post.
At the beginning of the day, I talked with a variety of exhibitors. A lot of the hardware companies had booths: Intel, AMD, Dell. While I certainly understand that FOSS runs on machines from these hardware producers, their exhibits seemed somehow rather non-specific to this conference. Another line of exhibitors was the traditional well-known FOSS products: Zend, Mozilla, Novell, and some I know less well like Splunk, Simula, Scalix. Plus a number of book publishers. Sadly not my own Addison-Wesley, though they had the slightly oddball TeX User Group (TeX is a fine tool, but they mostly just had Knuth's books to sell; excellent books, but not exactly the stuff of evangelism). Of course, our own dear IBM spans all these spaces, but oddly, IBM did not maintain a booth at the exhibition. My list above is not complete, just a sample of booths I noticed.
What piqued my interest to the greatest degree, however, was the several organizations I might characterize as "copyright/license interests". The Apache Software Foundation had a booth, as did the Electronic Frontier Foundation and the Free Software Foundation; to an extent the Perl Foundation fits in this line, though I did not speak with them. There are several talks coming up related to copyright, either broadly or narrowly. If I have a chance to attend those, I'll expand on my thoughts here.
In a general sense, I was very interested to hear what representatives of various approaches to licensing of FOSS had to say about the benefits of their specific licenses and maintaining organizations. I probably went into the most detail with Jim Jagielski (wearing his ASF hat for this conversation). We talked about the ASF's goal of expanding the projects managed under the ASF aegis, while still striving for some coherence in the tool set among ASF's now extremely numerous projects (something like 200 separate Apache projects by my reckoning). I spoke with a few other conference attendees about Apache's efforts, such as the integration of Jakarta, Lucene, and other code bases such as the Incubator projects. Sentiments were mixed, with some natural disquiet about the rough edges and deep dependency trees. But overall, most developers seem to feel ASF has done a pretty good job of merging and integrating related projects wherever possible. Maven, Apache's dependency maintenance tool, was considered either a great simplifier or a crutch to get by with excessive dependencies, depending on whom you talk to.
Back to Jim Jagielski and ASF's attitudes about licensing policy. At first brush, ASF seems particularly wary of dual licensing approaches, and really values putting all their projects under the same ASF license. Pushed a little, Jim admitted to possibilities of code being maintained (i.e. dual licensed, though he did not exactly want to put it this way), if it started out as Public Domain or under a BSD-family license. Without quite saying it this way, ASF's rather reasonable concern is to make sure code comes from Apache-compatible code bases. At heart, this is in conflict with the more restrictive GPL and LGPL approaches, though it also seemed to weigh against inclusion of tools like mod_perl in the Apache family directly, despite the liberality of the Artistic License (not to say they are not still happy for the capabilities of such tools that interoperate with Apache itself, or other ASF projects; it's just a question of level of license and organizational integration).
I found the Free Software Foundation booth staff a little less on top of licensing issues than I might have hoped. Given the almost completely political mission of FSF (versus other more mixed-mission organizations), I would have thought that issues like what it means for GPL 3 to recast the LGPL terms as simple riders to the base GPL 3 license would be the stuff they were passionate about. But in the end, I pretty much explained these legal questions to them... and even what I perceived as Richard Stallman and Eben Moglen's motivations in the matter. Of course, this was just a few volunteers who worked a booth; obviously FSF's central staff lives and breathes this stuff.
Though not exactly a licensing issue, I got into a really nice conversation with Dru Lavigne at the FreeBSD booth. In my mind, I confess, BSD's main difference from Linux amounts to licensing philosophies. Both are perfectly wonderful modern Unix-like OSs with good installers and the same desktops. I realize the core developers of each get passionate about the ins-and-outs of scheduler design and micro-optimization of IPC, but that's lower level than the work I tend to do.
Dru, in our conversation, had some interesting ideas, and some passion, about the need for FreeBSD to develop a certification program, perhaps something like the LPI program that your reporter wrote a tutorial/training series on. I feel, despite my well-received tutorials, that ultimately certification does little to genuinely assure quality of programming staff, nor even bare knowledgeability. Cramming for a test doesn't reflect (negatively or positively) on genuine problem solving skills and flexibility in thinking, which are what you want in IT staff. Dru felt, however, that certification programs were important for FreeBSD's wider acceptance in corporate settings; I can't say I disagree with her on this. In the end, the problems she sees are twofold: setting up certification programs tends to incur large initial costs from the OS/tool developers who fund their creation; and the cost of test-taking tends to exclude many very fine developers in the developing world, where the US dollar cost is simply too much relative to their national income levels and currency exchange rates. I may talk with Dru further (via email or telephone) following the conference, and let readers here know more of her thoughts on this.
Jul 26 2006, 11:30:00 AM EDT
|