Why Syntax Must Die

Roly Perera offers a critique of Subtext: Why syntax won’t go away. It’s great to get this kind of feedback. I like Roly’s visionary thinking – I hope grad school doesn’t pound it out of him. Overall, we are in violent agreement on many issues.

All I am saying is, let’s give programs the same respect we give all our other information artifacts. Let’s say you are designing a system to automate a video rental store. You will analyze what information needs to be recorded, like videos and customers. You will analyze what operations need to be performed, like renting and returning. You will then design data structures that encode the information state while making it easy to specify the necessary operations. You might choose to use a relational model or an object-oriented one, but you would never decide to encode everything into text strings with embedded keywords, and use repeated occurrences of unique names to represent all pointers and relationships. Why is it that programs, the most complex information artifacts known to man, are restricted to one of the weakest of all data models? One reason is that when programming languages were first invented we didn’t know very much about data structures, and the only UI was a keypunch.

Roly and I agree that metaprogramming is central to the future of programming, and that our current ad hoc refactoring techniques offer a clue to that future. That is exactly why syntax must die. Syntax makes machine manipulation of programs terribly difficult, often impossible. If we do a classical data analysis of programming as an application domain, we must surely treat refactorings as primary operations that should be straightforward to represent in the data model of a program. I think that is what Roly means when he says that refactorings should be as “formally reliable as the kind of batch-mode transformations that a … optimising compiler might implement”. The mother of all refactorings, Rename, is undecidable in a language with dynamic binding or reflection. Reductio ad absurdum. If Rename is undecidable, you are using the wrong model of programs. Subtext makes Rename trivial.
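
As a rough illustration of why Rename becomes trivial once references stop being names, here is a minimal sketch in Python. It is not Subtext's actual data model: the `Definition` and `Reference` classes and the `render` helper are hypothetical names invented for this example. The only assumption is that a reference links to its definition by identity, and that the label is stored once, as an annotation on the definition.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Definition:
    """A program element; the label is an annotation for humans, not a lookup key."""
    label: str


@dataclass(eq=False)
class Reference:
    """A use of a definition, held as a direct link rather than a name."""
    target: Definition


def render(refs):
    """Produce a partial textual rendering by reading labels through the links."""
    return [ref.target.label for ref in refs]


# Two unrelated definitions that happen to share a label.
outer_total = Definition("total")
inner_total = Definition("total")

uses = [Reference(outer_total), Reference(inner_total), Reference(outer_total)]

# Rename is a single field update: no search, no scope analysis,
# and no risk of accidentally capturing the other "total".
outer_total.label = "grand_total"

rendered = render(uses)
print(rendered)  # ['grand_total', 'total', 'grand_total']
```

In a textual representation the same rename would require finding every occurrence of the string `total` and deciding, by scope rules, which definition each one denotes. Here that question never arises, because each use already carries its answer.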

Roly rightly objects to the lack of a concise formal definition of most Visual Programming Languages, which impedes metaprogramming. Subtext does have a formal model of programs underlying it, and making that model as simple and as clean as possible has been a major goal. Unfortunately the model is still in a formative state, rapidly shifting with each iteration. My first attempt to present the key ideas of the model was rejected by OOPSLA last year, so I published it as a tech report.

Roly suggests that the textual comments attached to explicit linkages within Subtext might emerge as names in disguise, through the use of textual conventions to encode structure outside the program. I hope that Subtext will have sufficiently expressive annotation and reflection features to make that unnecessary.

Roly and I are largely in agreement if you take his use of the word ‘syntax’ to mean any constructive mathematical structure. I take syntax to mean only the use of a grammar to specify a formal language. Syntax is fundamentally about parsing strings. Syntax is a great way to represent trees. When you need a more complex topology you fall back on the universal hack of names. We invent subtle and mysterious name-binding semantics in order to encode structure beyond the parse tree. I hate names because they are semantic black holes that hide much of the interesting structure of our programs from human sight, and sometimes even from automated analysis because of undecidable run-time semantics. Roly objects to my ruling out names “a priori”. I am indeed taking an extreme position, but I justify it as a form of intellectual self-discipline. We are so well trained in the habitual syntactic tricks that it is sometimes difficult to see the alternatives. I find I need to banish syntax utterly in order to think clearly about what a program really is. I am just starting to feel confident enough to consider reintroducing textual representations for Subtext programs. But they would only be partial renderings, not a source artifact that can be parsed by a grammar into a program. Syntax must die!
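
The claim that a grammar yields a tree, while the remaining structure is encoded in names, can be seen in any conventional language. The sketch below uses Python's standard `ast` module on a small made-up snippet (the snippet and the variable names `SOURCE` and `names` are only illustrative). It lists every occurrence of a variable in the parse tree.

```python
import ast

SOURCE = """
x = 1
def f():
    x = 2
    return x
"""

tree = ast.parse(SOURCE)

# Every occurrence of a variable is a Name node that carries only a string,
# a load/store context, and a source position.
names = [
    (node.id, type(node.ctx).__name__, node.lineno)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
]
names.sort(key=lambda item: item[2])
print(names)  # [('x', 'Store', 2), ('x', 'Store', 4), ('x', 'Load', 5)]
```

All three nodes hold the same string `'x'`. Nothing in the tree says that the `x` read on line 5 refers to the assignment on line 4 rather than the one on line 2. That link exists only in the language's scoping rules, and a tool has to re-derive it by name resolution.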

I firmly believe that within 30 years the practice of encoding programs into text strings will be seen as a barbaric relic of the stone age of programming. My ambition is to hasten that day.


8 Comments

  1. John "Z-Bo" Zabroski
    Posted November 15, 2007 at 12:50 am | Permalink

    By formal, I assume you mean “unambiguous”. A lack of unambiguous definitions is common in programming and, based upon my experiences as a Teacher’s Assistant and a student, can often contribute to cognitive dissonance. I remember learning programming for the first time, when I was 11, and being greeted by the term “imperative programming”; I remember being told Visual Basic was a “fourth generation language”; I remember needing to understand the distinction between a Subroutine and a Function. Why a language requires such a distinction and the impact it can have on program architecture strategically and tactically can easily be lost by simply trying to draw circles that enclose the set of all things that are a Subroutine and the set of all things that are a Function. I always believed my programs should convey deeper decision-making than conceptual modeling using these fictitious barriers. I know when I am programming using illusions and when my deep thoughts are not being communicated.

    I was so engrossed by programming using illusions that by the time I took a Visual Basic course in high school I was spending far more time fiddling with whitespace than actually writing programs. Of course, since I was so far removed from thinking about my effort towards writing programs, very little effort was spent thinking about eliminating tasks. Not focusing on eliminating tasks is a clear indication of how far I was from programming competently, viz. figuring out what I want to say before figuring out how to say it. Programmers are often said to have laziness as a good quality, encoded as “why do something manually when you can write a program to do it for you.” Yet, in the act of programming, I was managing my own labor instead of looking to eliminate it. I was doing nothing creative. I was not doing design. I was just coding, coding, coding.

    When every act you perform is tied away from why you are performing it, something is amiss. Everyone has had a feeling where they’ve tried to communicate something edging on the unknowable and the incommunicable and received a blank, dull stare in response. Everyone has had the opportunity to look back upon something they’ve said in the past or written in the past and looked at their own thoughts with a blank, dull stare. Either the communication quality seems poor or the judgment used at the time was subjective, or some other factor, but the result is always that blank, dull stare. The result is always a disjointed paragraph where the sentences don’t seem to flow, the ideas don’t seem properly framed, and the conclusion seems tacked on incredulously.

    When we look at a resulting code solution after a passage of time, it’s not just about the ancient technology we’re seeing. It’s also a smell our sixth sense as programmers can detect; we can detect the ugliness in the code. Even when we don’t remember writing a function, we can sense something is wrong. Time and distance can fade memories, but instincts never go away. These days, when I look at old code, my instinct is, “I’ve got to clean this up”, but my pragmatic half knows I have to weigh the decision to clean it up to something “still ugly, but less so” versus working on something else “just as ugly internally, but outwardly enabling”. These days, I try to think about how I can eliminate labor when writing code instead of managing labor.

    Eliminating labor is the right way to go. It speaks to a programmer’s intrinsic lazy quality. The easy things should be easy, and the hard things should be decomposable into a few (and, this is important, not too many) easies. Some programmers have to be responsible for deciding what the hard stuff should be decomposed into. The thing that’s easy to fall prey to is decomposing the hard stuff into an infinite number of easy steps. The problem with an infinite number of easy steps is that stuff takes forever! The zen koan of a programmer who’s reached the ultimate state of laziness is one who’s found a path of least resistance.

    Finding a path of least resistance usually isn’t hard. In data-centric applications, for instance, I can tell you that once you have the database schema all the decisions are laid out for you to follow, from back-end all the way up to the front-end. Why do I need a radio button? Because there is a “dominant” relationship, that manifests as a yes/no declarative constraint. Why is the form field required? Because “null” or “missing” would violate some tacit fact about the user’s pure capabilities. Why do I need a dropdown box? Because I have a list of sortable options where one needs to be selected. Although these questions may not have one-to-one mapping with answers, they indicate that we can put ourselves in the direct pathway of the consequences of our actions. These questions are inspiration for laziness, because they capture accurately our intentions.

    The data entry form is a basic example of how visual cues empower entry clerks to enter information more accurately. I am reminded of this whenever I go to a restaurant and the drinks section is not on the menu. A restaurant menu is essentially a data entry form. Drinks missing from the menu is unacceptable even supposing there is a valid reason for this (e.g., drinks aren’t hard-coded into the menus because it enables restaurants to switch between Coca-Cola and Pepsi fountains more easily and therefore take advantage of better supply contracts and therefore decrease the total cost of food to the consumer). What happens when I ask the waitress or waiter what drinks they have? Orally, he or she lectures me about half a dozen or more options while I try to keep a ferociously acute antenna tuned to detail. Usually, the first fountain drink I like sticks in my mind and the rest of the options are moot; I aggressively made a choice because I didn’t want to keep my guests waiting or offend my server. Note that the same information is communicated either way, but consider the psychological impact the form of delivery had on my decision-making. In a restaurant, the role of the server should be as simple as taking orders and providing meal suggestions when asked (“How are the ribs here?”).

    Let’s ask, Why do we attempt to manage labor instead of eliminate it? What can we learn about ourselves and what knowledge we’ve collected about programming? What do we do on a daily basis that is perhaps completely rote and potentially limiting our ability to eliminate labor? Personally, I see a lot of terms and notions attached to programming that cause a great deal of cognitive dissonance. I saw it in college when tutoring Calculus I & Calculus II students how to program. It was like they had forgotten how to use a calculator, despite the fact Maple, Mathematica and MATLAB are no different from most graphing calculators. The troubling thing is, I rarely see discussion about the problems these ambiguous abstract ideas bring.

    Many computer scientists have glorified ideas like “abstraction” as being a foundation for problem solving. Personally, I don’t “get” abstraction advertised as a Holy Grail. In physics, the more abstract things are, the more obvious it is no one really understands what is being discussed. For instance, no physicist would claim to understand what a baryon number is, yet they often talk about it like it’s a tangible concept, even though it is just an abstraction. In truth, notions like baryon numbers are an indicator that something is missing, and that something must replace these abstract concepts. Physicists have a term for replacing these abstract concepts, and it’s known as “the general solution”. Abstract concepts in physics only serve to provide malaise to scientists eager to find a unifying theory. There isn’t a computer scientist alive who has matched the feats of Isaac Newton in this regard, and I believe a contributing factor is this study of abstraction. We should discard the abstract in the pursuit of the general. What is it we want to do?! As Christopher Strachey once said, “Figure out what you want to say before you figure out how to say it.”

    There are other terms I dislike that I feel shackle programmers unduly. Programmers, like my 11 year old self, need unleashing. With regard to programming, I’ve not progressed far from my 11 year old mind state other than accumulating a bunch of prejudices about “how to program”. I believe programmers are naturally creative people who unintentionally stymie themselves by rote acceptance of ideologies. If someone tells you an object is “data and methods” enough times, then you will believe them. If someone tells you an object is a “message-oriented model where All You Can Do Is Send A Message such that state-process is kept hidden and protected” enough times, then you will believe them. &c. These ideologies habituate thinking about problems in limited ways with the promise that it improves problem solving. However, today, this strikes me as weird: Limiting my freedom to think can’t possibly improve my problem solving, because problem solving involves thinking outside the box to see where the lines are drawn and to discover whether or not the problem I am solving fits within the lines I’ve been contained in. If it doesn’t, I need to hop into a new box.

    Also, Jonathan, your last paragraph can easily be interpreted a number of ways. So that it isn’t easily taken out of context, you should be clearer. For instance, is a compiler’s specification written in a textual programming language a “barbaric relic” or a “timeless artifact”? A concrete use case: Converting from one citation format to another citation format. This practical task is a fundamental motivation for learning about compilers. Here, studying compilers will establish some principles for how to avoid gaps and overlaps in source-to-source translation.

    Additionally, one thing you should have used to your advantage in “Why Syntax Must Die” is why it is not sufficient for a compiler to warn the programmer of overlapping logic, like in your OOPSLA demonstration. You aren’t raising this objection, but I am sure there are readers who have raised it in their heads. Always preemptively raise the objection.

    Sorry for the rant.

  2. gerel
    Posted November 16, 2007 at 10:51 pm | Permalink

    You said:
    ##
    I firmly believe that within 30 years the practice of encoding programs into text strings will be seen as a barbaric relic of the stone age of programming. My ambition is to hasten that day.
    ###

    If that happens, there will be no place for programmers writing down specifications in some other language, but more room for designers concerned with the important aspects of a program.

    I hope we get to that soon; syntax has already made us waste lots of time, decades actually.

    cheers

  3. Matt Hellige
    Posted November 20, 2007 at 1:26 am | Permalink

    Since this conversation is already spread across several blogs and the mailing list, I figured I would keep the trend alive, and have posted a response here: http://matt.immute.net/content/subtext-etc

    Thanks as usual for the thought-provoking post!

  4. Sean McDirmid
    Posted November 22, 2007 at 7:16 pm | Permalink

    Why not go in the other direction and completely get rid of formal syntax? The world is becoming increasingly Google-centric: type what you kind of want and use Google to find exactly what you want. What you are advocating goes in the opposite direction: encode using the UI exactly what you want so precise refactoring is always possible. I believe this is the strictest kind of formal syntax. With a naturalistic syntax, bindings could be resolved in a fuzzy way: type the synonym or misspelling of an entity and the computer could still bind the text to that entity. The computer might not be able to figure it out immediately, but the programmer could interactively add more text until the computer figures out the desired solution.

    I guess I’m advocating more parsing (natural language parsing is really hard), not less, and less support for refactoring, not more. We can square off in Nashville next year :)

    A naturalistic syntax, either textual or visual, bindings could be fuzzy: if you type the

  5. Arne Evertsson
    Posted November 25, 2007 at 4:08 am | Permalink

    Keep up the good work. Programming needs to evolve.

  6. Posted November 28, 2007 at 9:11 am | Permalink

    I totally agree that programming needs a paradigm shift. OOP was not it, and has probably set us back decades. Need to focus on productivity, not “beauty”. ;) I’ve been completely unable to try Ruby because of the over-hype. Personally, I’ve been thinking more along the lines of DSLs, but not sure where to go with it.

  7. Posted December 4, 2007 at 3:39 pm | Permalink

    I don’t think people are being fair about syntax. Syntax is more than strings and symbols. Syntax, I believe, is “the visual denotation of a concept”. This means that not only natural language, but also GUIs for data entry, is a kind of syntax–a kind of visualization of data (where code = data = information).

    By the way, Intentional Programming will let you use ANY kind of syntax–written, visual, UI, whatever, and will store the code in a structured way that, again, makes Rename trivial.

    Even a UI for data entry of code may not be the most productive way to code. Although, I will agree that the code must be stored in a more flexible format than strings–databases or better are a must. Perhaps a wysiwyg editor, or a combination of a navigational code visualization window (for reading your code) and a text input box (for writing code)–a GUI equivalent of an interactive interpreter, or something else maybe?

    The prime boon of the man behind the curtain’s work appears to be his attempt to minimize the write-debug cycle (which is basically an input-feedback cycle, just like two-way communication (between humans) ), through the “example-centric programming”. This is certainly a bottleneck for a programmer!

  8. Jay Araujo
    Posted January 23, 2008 at 1:47 am | Permalink

    “Syntax makes machine manipulation of programs terribly difficult, often impossible”. Really? What about XML? ;-)