Truth in Researching

Just read a great paper about programming language design: The Origins of the BitC Programming Language. BitC wants to be a verifiable systems programming language, suitable for implementing OS kernels with provable safety guarantees and competitive performance. They share the lessons they have learned and opinions they have formed. They are commited to inventing something that is actually usable. It is all refreshingly honest and free of the obligatory posturing of academics, and therefore probably unpublishable. Some juicy quotes:

The process of programming language creation is a subject of too little reflection and retrospection. Newcomers to the field (including, in some measure, us) regularly propose new languages without quite understanding what they are getting into or how big the task is. Those of us who have already crawled down the language design rat-hole rarely have time to describe either the depth of the hole or the true majesty of the rats involved.

The only substantive tool and language for building high-confidence systems is the proprietary Spark/ADA tool set, which is syntactically very restricted. Ada does not effectively exploit the advances in programming language theory that have been made over the last two decades. None of the widely available general-purpose proof systems such as Isabelle/HOL, PVS, Coq, or ACL2 had ever been connected to any programming language that might be useful for building a real, general-purpose system — or at least not publicly.

the Ada subset supported by the SPARK Examiner is unnecessarily severe in its omitted features — enough so that we plainly could not express even the carefully restricted initialization practices of the EROS or Coyotos kernels

Perversely, a strong goal for BitC was not to innovate. [This sort of “let’s do a conservative, evaluable experiment” approach makes project funding nearly impossible to obtain in academic circles, which may help to explain why computer scientists tend to step on each other’s toes rather than standing on each other’s shoulders.]

We decided very early that the ML surface syntax should not survive. It is hopelessly ambiguous, which is more or less inexcusable, but after all it’s just a yucky pragmatics issue. The beauty of the ML family really does go more than skin deep, but given the skin, it needs to.

Something we didn’t want was an object-oriented language. OO languages remain a popular fad, but our experience using C++ in the EROS system was that it actively got in the way of understanding what was going on. We also think it’s a bad sign that the C++ language evolution has adopted a policy of “extension by ad hoc complexity.” When writing the first book on reusable C++ coding in 1991, Shapiro hoped that compilers might catch up with the C++ language specification within 5 to 8 years, but every time that threatens to happen somebody moves the language. Setting aside the resulting complexity of the C++ language and continuing instability of its implementations, this doesn’t result in a stable language semantics.

Through their insistence on a complete type system, the reviewers delayed the first publication on BitC and the project as a whole by almost two years. The combination of stubborn insistence from the reviewers with stubborn persistence on the part of Sridhar did result in a better language, and did allow us to repair two fairly bad problems later almost as quickly as we discovered them.

The ML module system is fully understood only by David MacQueen, and only on alternating weeks. The Scheme module system required the combined brilliance of Matt Flatt and Matthias Felleisen (and six revisions of the language standard) to achieve. From the perspective of a mere operating systems weenie, it’s all rather discouraging, but what is the reluctant programming language designer to do?

We have not “drunk the cool-aid” concerning pure programming and multiprocessing, because the “join” operation in real multiprocessing applications is inherently stateful and serializing (therefore eager). …

Whether our surmise about multiprocessing is correct or not, the origins and focus of BitC lie in core “systems” applications such as operating systems and databases. While pure programming has a significant role in producing more robust versions of these kinds of systems, they are entirely about state, and the proposition that state can be removed from a language designed to support these systems is something like the proposition that eliminating bones is the solution to bone disease.

The real reason we stuck with the LISP-ish syntax is that we didn’t want to be fiddling with the parser while designing the language, and (to a lesser degree) because we didn’t want to encourage programmers until we thought the language was ready. We are now preparing a more sensible surface syntax, whereupon we will receive rotten egg and tomato complaints from the LISP community. You can’t please everyone. Using the LISP syntax for so long let us focus on what was really important.

It is noteworthy that none of this effort was deemed fundable by the National Science Foundation (which is to say: by the academic programming languages community). The work is clearly innovative, and it is somewhat unusual to find a project of this sort that is accompanied by a validating experiment.

2 Replies to “Truth in Researching”

  1. @Perversely, a strong goal for BitC was not to innovate. [This sort of “let’s do a conservative, evaluable experiment” approach makes project funding nearly impossible to obtain in academic circles, which may help to explain why computer scientists tend to step on each other’s toes rather than standing on each other’s shoulders.]

    Bingo. This is also why there is so little research into the actual effectiveness of proposed methods. Waterfall model? “Let’s just assume that an analogy to building construction is the perfect analogy to building software.”

    I recently read a dozen papers on code review techniques, and was stunned to see *what the authors didn’t account for*.

Comments are closed.