Devolving Subtext to JavaScript

I have to rewrite Subtext in JavaScript. First, a quick update. I presented my prototype implementation at the working group this summer, to a tepid response. But in the course of preparing for that I thought of a radical simplification to the language. That inspired a whole rampage of brutal simplifications. The code is so much happier now. And when the code is happy, I’m happy.

I have also been spending a lot of time thinking about my “market”. I have an idea for targeting web/mobile apps with a new twist. More anon. In other news, I am going back to calling the language Subtext. The name is apropos to my new direction, and it seems to be my brand. People ask me if I’m the “subtext guy”.

Now I need to run inside the browser. But rewriting my evolving Java implementation in JavaScript fills me with dread. People who do serious large-scale programming in JavaScript are often deeply troubled by it, and resort to tools like Google’s Closure and GWT that treat the language as a bug to be worked around. The fact that Google felt the need to make big investments in these tools is an indictment of the language that cannot be easily dismissed. I have come up with the following three-step strategy:

  1. Keep the parser and interpreter in Java and do the UI as a thin shell in JavaScript doing fine-grained RPC to a web server running the interpreter. This is the fastest and safest way to get up and running. But it means there needs to be a locally running web server attached to the browser. An iPad would need a local PC as its server. This is really just a mock-up for demo purposes, but it is the easiest, and it sits squarely in the sweet spot of current tools.
  2. Use GWT to run the interpreter inside the browser. The parser can’t be GWT’d, so it needs to be rewritten in JavaScript, perhaps using PEG.js. The RPC work done in stage 1 is not wasted, as the JavaScript/Java interface in GWT needs to be handled similarly. Deploying and testing and debugging get more awkward in this approach, so it is best done once the code is already working in stage 1.
  3. Rewrite the interpreter in JavaScript. Hopefully by deferring this step there will be time for JavaScript tools and libraries to better mature, or for other options to emerge, like fully-tooled CoffeeScript, or JS.next, or Dart (my preference). Deferring this step would also be an opportunity to do performance work (likely needed for tablets) which would be premature right now.
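The thin-shell arrangement of stage 1 could be sketched roughly as follows. This is only an illustration, not the actual Subtext code: the `evaluate` method name, the JSON message shape, and the helpers `makeShell`, `httpTransport`, and `fakeInterpreter` are all assumptions. The transport is passed in as a function so the shell can be exercised without a running server.

```javascript
// Hypothetical thin UI shell: every operation is forwarded to an interpreter
// running elsewhere (for example a local Java web server).
function makeShell(transport) {
  let nextId = 1;
  return {
    // Send one fine-grained request and resolve with the interpreter's result.
    async call(method, params) {
      const request = { id: nextId++, method, params };
      const response = await transport(request);
      if (response.id !== request.id) {
        throw new Error("mismatched response id");
      }
      if (response.error) {
        throw new Error(response.error);
      }
      return response.result;
    },
  };
}

// A browser transport might POST JSON to the local server.
// The URL is supplied by the caller; no particular endpoint is assumed.
function httpTransport(url) {
  return async (request) => {
    const reply = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    });
    return reply.json();
  };
}

// Stand-in for the Java interpreter, used here only to exercise the shell.
async function fakeInterpreter(request) {
  if (request.method === "evaluate") {
    return { id: request.id, result: "evaluated: " + request.params.source };
  }
  return { id: request.id, error: "unknown method " + request.method };
}
```

Because the shell only sees a transport function, the same calling code could later talk to a GWT-compiled interpreter (stage 2) or a native JavaScript one (stage 3) by swapping the transport.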

43 Replies to “Devolving Subtext to JavaScript”

  1. I would really love to help, I can do both Java and JavaScript, and CSS3, etc. for elegant and simple web applications.

    I believe in literate programming and for the past year I’ve been re-reading Gelernter’s “Programming Linguistics”, de Souza’s “Semiotic Engineering for HCI” and Kumiko Tanaka-Ishii “Semiotics of Programming” to try to formulate an idea on how to expose and interact with a running program.

    1. Good reading list. Consider looking into Goguen’s treatment (algebraic semiotics) as well.

      My major problem with taking an algebraic structure and trying to make an interface for it is that it simply isn’t easy. You would think, “Oh, I have this algebraic structure, all I need is an algorithm to visualize it”. But that second step is quite hard. I generally put hacks in and limit what can actually be done in order to keep the user interface simple.

  2. I encourage you to look at CoffeeScript — it’s a beautiful JavaScript shorthand. I resisted using it for a long time because I thought debugging would be a pain. But CoffeeKup is such a compelling example of a beauty inexpressible in JavaScript that I decided to take the plunge.

    Debugging is still annoying sometimes (being worked on, see e.g. [1]) but I wouldn’t go back to JS.

    Give Coffee a try 🙂

    1. CoffeeKup is cool, thanks for pointing it out. I am kind of taking that idea to the limit, where the language is in a sense nothing more than a template engine. It’s templates all the way down. Higher-order, bidirectional templates.

      One of the reasons to defer doing a JS port is the hope that CoffeeScript will get some decent tooling in the meantime.

  3. Don’t fear a rewrite in JavaScript for lack of tools — I’d imagine that if you liked, you’d be able to do a straight port from the current Java implementation pretty easily. Unless you’re using some fancy parser generator library that has no JavaScript equivalent?

    1. Ironically, JavaScript might be better than Java for parsing, at least the kind of lightweight in-code parsing I favor. PEG.js looks sweet, and exploits the dynamic nature of the language to good effect. I hope it succeeds.

      But why rewrite in an ugly wart-covered language like JavaScript when CoffeeScript will probably have adequate tooling if I can hold out for another 6 months or so?

      1. I’d love to know what you’d consider to be “adequate tooling” in this case. Got a minute to spell it out?

          1. Actually, CoffeeScript support for RubyMine (and other JetBrains products) is getting pretty good. And refactoring should soon be possible.

            As for type annotations, there’s a variant of CoffeeScript supporting contracts: http://disnetdev.com/blog/2011/08/23/Contracts.coffee-Contracts-For-JavaScript-and-CoffeeScript/

            I also experimented with type annotations in CoffeeScript. It’s not that hard to add them: CoffeeScript provides a very nice intermediate AST that can be parsed, and type logic can be easily applied. The hardest part (and the reason I did not finish my implementation) is that you need a type database for the underlying platform (IE/FF/Opera/Chrome/Node.js/Rhino etc.). Without it you can do only the most basic stuff.

            If you’re going to reimplement Subtext in JS/CoffeeScript, please let me know. I’d be glad to help 🙂

  4. D’oh! New plan:

    1. Same as above: Ultra-thin JavaScript calling Java.
    2. Post cool screencast, upload to GitHub, ask for volunteers to port to JavaScript.
    3. Check in contributions

    Could that really work? Maybe worth an experiment.

    1. Hey Jonathan,

      Some amazing things can be done in GWT.
      http://code.google.com/p/quake2-gwt-port/

      I’m doing parsing in browser in GWT using ANTLR. You can find the gwtified ANTLR library at http://code.google.com/p/gwtified/.

      Are you still using XText?
      I would be happy to help with gwtifying Subtext as it exists now.

      How about this for a plan:

      1. Upload a Java version of Subtext to GitHub / Bitbucket.
      2. Post a cool screencast, ask for volunteers and let the forks fly.
      3. Merge contributions.

      1. Gary, GWT does look like a good option. I am using parboiled for parsing, which will definitely not port to GWT. I am going to focus first on bringing up a plain Java version, for which there is still quite a bit of work left to do.

    1. That’s a long story, and this isn’t the place for it. Net effect though is that the type & effect system I defined becomes irrelevant to the semantics of the language, so a naive interpreter can completely ignore it. Only becomes relevant for performance optimization and code analysis in the IDE.

  5. Do you have anything to share with us from your presentation to the working group last summer?

    I’m not sure what Subtext looks like anymore. It appears from the slides you presented in May of this year to bear a resemblance to the Subtext from the early days. Back then Subtext had a grammar but it was also tree-based and dynamic.

    Thanks,

    Peter

    1. I decided not to post the slides because they don’t stand on their own (which is how it should be). The language is currently textual, because that is easier both to implement and to explain. But I have been designing the language all along for structured editing, so the syntax is unfriendly for text editing, and a text editor won’t be usable for serious programming. I want to build a touch-centric structure editor for tablets, but that won’t be in the first release.

      The key issue is name binding, which is conventionally based on lexical scope. I am instead requiring the programmer or IDE to do all name binding at edit time. References are like relative file paths, e.g. ^^^foo.bar (where ^ is “up”). I expect this is going to be one of the most controversial features of the language. I think lexical binding leads to many problems, while structural editing will be able to do smart implicit refactoring. But maybe I am being too stubborn on this and will need to compromise to get people to use the language.
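To make the relative-path idea concrete, here is a minimal sketch of resolving a reference such as `^^^foo.bar` against a tree. It shows one plausible reading only: the node shape (`parent`, `children`) and the `node` and `resolve` helpers are assumptions for illustration, not Subtext’s actual representation or semantics.

```javascript
// Hypothetical tree node: a name, a parent link, and named children.
function node(name, parent = null) {
  const n = { name, parent, children: new Map() };
  if (parent) parent.children.set(name, n);
  return n;
}

// Resolve a path such as "^^foo.bar" relative to `start`:
// each leading "^" moves up one level, then dotted names move down.
function resolve(start, path) {
  let current = start;
  let i = 0;
  while (path[i] === "^") {
    if (!current.parent) throw new Error("path climbs above the root");
    current = current.parent;
    i++;
  }
  const rest = path.slice(i);
  const names = rest === "" ? [] : rest.split(".");
  for (const name of names) {
    const child = current.children.get(name);
    if (!child) throw new Error("no child named " + name);
    current = child;
  }
  return current;
}

// Example tree: root { foo { bar }, a { b { c } } }
const root = node("root");
const foo = node("foo", root);
const bar = node("bar", foo);
const a = node("a", root);
const b = node("b", a);
const c = node("c", b);

// From c, three "^" steps reach the root, then foo.bar is followed downward.
const target = resolve(c, "^^^foo.bar");
```

In this sketch a path is resolved on demand by walking the tree, so if the referring node is moved, the same path string can land on a different node. An editor would have to rewrite the path on each structural edit to keep the binding stable.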

      1. When implementing structured editors it wasn’t just the coding part that was hard, but even more so the interaction design. I couldn’t figure out a good way to navigate and edit with the keyboard in a structured editor. Touch tablets certainly make this a lot easier, but for the time being we are still with keyboards.

        On the one hand you can take a purist approach and navigate the syntax tree logically with commands for going up, down, left and right in the tree structure. But this is not very intuitive, though perhaps it just takes time to get accustomed to. There are questions like what do you do if the user presses the “left” command on a leftmost child?

        On the other hand you can take a textual approach and navigate the tree with a cursor like in a text editor. This is not very appealing either, it just doesn’t seem like a good way to edit a tree. It’s easy in the sense that we’re used to it from text editors, but that’s just an accident of history.

        Do you have any ideas on this? What are good navigation & editing commands?

        What problems do you think lexical binding leads to? I’ve found it quite natural in the context of structured editors. A node that introduces new names (like a variable declaration, or a function declaration with parameters) simply passes an environment structure to its children.
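The environment-passing scheme described above can be sketched as a small binder over a toy AST. The node types (`num`, `var`, `add`, `let`) and the `bind` function are hypothetical, chosen only to illustrate how a declaring node hands an extended environment to its children.

```javascript
// Hypothetical AST: {type:"num", value}, {type:"var", name},
// {type:"add", left, right}, {type:"let", name, value, body}.
// `env` maps each visible name to the node that declared it.
function bind(ast, env = new Map()) {
  switch (ast.type) {
    case "num":
      return;
    case "var":
      if (!env.has(ast.name)) throw new Error("unbound name " + ast.name);
      ast.binding = env.get(ast.name); // link the use to its declaration
      return;
    case "add":
      bind(ast.left, env);
      bind(ast.right, env);
      return;
    case "let": {
      bind(ast.value, env); // the value sees only the outer names
      const inner = new Map(env); // copy so sibling nodes are unaffected
      inner.set(ast.name, ast); // an inner declaration shadows an outer one
      bind(ast.body, inner);
      return;
    }
    default:
      throw new Error("unknown node type " + ast.type);
  }
}

// let x = 1 in (let x = 2 in x + x): both uses bind to the inner declaration.
const innerLet = {
  type: "let",
  name: "x",
  value: { type: "num", value: 2 },
  body: {
    type: "add",
    left: { type: "var", name: "x" },
    right: { type: "var", name: "x" },
  },
};
const outerLet = {
  type: "let",
  name: "x",
  value: { type: "num", value: 1 },
  body: innerLet,
};
bind(outerLet);
```

Note that the binding of each `var` node is recomputed from its position: moving a use under a different `let` and running `bind` again can silently change what it refers to.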

        1. One problem with lexical binding is that when you copy and paste a piece of text, the meaning of all its symbols potentially changes based on the context. Another problem is that defining or undefining a symbol can implicitly change the bindings of symbols in nested scopes. Most programmers are so used to these problems that it seems natural to them. I envision the Subtext environment as making bindings explicitly and persistently, with textual names being no more than search keys to assist. Name clashes and shadowing are irrelevant. Bindings, once made, do not change because the code moves, or other bindings change.

          1. Ah, I see what you mean. I agree that variable names should just be programmer aids, but that doesn’t mean that scoping isn’t lexical. For example in the De Bruijn representation of lambda calculus (http://en.wikipedia.org/wiki/De_Bruijn_index) there are no variable names, yet it is lambda calculus.

            Even if you have variables point directly to definitions instead of via textual names, that doesn’t solve the copy-paste problem yet. If you copy a subexpression inside a function that refers to the function’s parameters to another function, how are you going to do that in a meaningful way? If the variables keep pointing to the other function, then the code doesn’t have any meaning. If your representation is such that the variables now point to other things, which things do they point to? As far as I can see the most sensible thing to do is to flag ill-scoped variables in the IDE so that the programmer can fix them manually, instead of using some heuristic to try to do it automatically. Or do you have a way of giving the variables new meanings in their new context that still makes sense?
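For readers unfamiliar with the De Bruijn representation, here is a minimal sketch of converting named lambda terms to nameless form. The term shapes and the `toDeBruijn` function are assumptions for illustration; indices are 1-based, counting binders outward from the use.

```javascript
// Convert a named lambda term to De Bruijn form (1-based indices).
// Terms: {type:"var", name} | {type:"lam", param, body} | {type:"app", fn, arg}
function toDeBruijn(term, scope = []) {
  switch (term.type) {
    case "var": {
      // scope[0] is the innermost binder, so the first match is the nearest one.
      const depth = scope.indexOf(term.name);
      if (depth === -1) throw new Error("free variable " + term.name);
      return String(depth + 1);
    }
    case "lam":
      return "λ " + toDeBruijn(term.body, [term.param, ...scope]);
    case "app":
      return "(" + toDeBruijn(term.fn, scope) + " " + toDeBruijn(term.arg, scope) + ")";
    default:
      throw new Error("unknown term type " + term.type);
  }
}

const v = (name) => ({ type: "var", name });
const lam = (param, body) => ({ type: "lam", param, body });
const app = (fn, arg) => ({ type: "app", fn, arg });

// K = λx. λy. x
const K = lam("x", lam("y", v("x")));
// S = λx. λy. λz. x z (y z)
const S = lam("x", lam("y", lam("z", app(app(v("x"), v("z")), app(v("y"), v("z"))))));
```

The names are gone, but an index is still relative to the enclosing binders: the same index pasted under a different number of lambdas points at a different binder, which is the copy-paste question raised above.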

        2. Jules,

          Copying code out of a function body would be like taking a closure: the default values of the arguments would be captured. This is possible because bindings are arbitrary tree paths, not just upward references as with lexical scope. By providing a notion of binding that can be conserved across structural edits I can enable the sort of fancy IDE features you suggest. That cannot be done gracefully in an IDE based on character editing, nor would it be tolerated by many who have spent years learning how to exploit the properties of lexical scope while editing code. The proof will be in the pudding, and I must admit I am a long way from having that kind of IDE, as I have decided that I need to sell Subtext first based on its expressive power. So initially it is going to be a textual language with some strange features that may be awkward to use from a text editor.

          1. Suppose you have the following code:

            function foo(a,b) {
              return a+b*10;
            }
            function bar(c) {
              return c + 5;
            }

            And say you copy the subexpression b*10 and paste it inside bar. What is the analog to this in the language you’re envisioning? What is b pointing to when pasted in bar? If your language is so different that there isn’t an analog to this, can you give your own example?

        3. Copying “b*10” would produce a reference to the foo function prototype’s b argument. Often function arguments are given default values in the prototype, and that value would be bound to. In this case there is no default value, so execution would produce an undefined value error. It might look something like this (mixing your syntax with Subtext path notation):

          function foo(a,b) {
            return a+b*10;
          }
          function bar(c) {
            return ^foo.b * 10;
          }

          1. That’s an interesting approach. Do you think that it is often the case that you want the variables to point to the default values when you copy/paste code, or that it is more likely that the programmer will want to change the variable references to something else after pasting?

          2. Those bindings are not desirable. But neither are many bindings that lexical scope gives you after an edit. What is important is that there is a very simple rule that external bindings don’t change even when the code moves. And internal bindings are mapped isomorphically. That same rule provides closure semantics for free, instead of it being a subtle aspect of run-time semantics. Binding mapping is a foundational aspect of Subtext semantics, analogous to Lambda calculus. I believe it is also a better foundation for IDE support than lexical binding.
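The rule stated above (external bindings stay put, internal bindings are mapped isomorphically) can be sketched with bindings stored as direct node pointers. The node shape and the `copySubtree` function are hypothetical and are not taken from the Subtext implementation.

```javascript
// Hypothetical node: { label, children: [...], ref: node | null }.
// A binding is a direct pointer to the target node, not a name.
function copySubtree(root) {
  const clones = new Map(); // original node -> its copy

  // Pass 1: clone the structure, remembering which copy stands for which original.
  function clone(n) {
    const c = { label: n.label, children: [], ref: n.ref };
    clones.set(n, c);
    c.children = n.children.map((child) => clone(child));
    return c;
  }
  const copy = clone(root);

  // Pass 2: bindings into the copied subtree follow the copy;
  // bindings to anything outside it are left exactly as they were.
  for (const c of clones.values()) {
    if (c.ref && clones.has(c.ref)) c.ref = clones.get(c.ref);
  }
  return copy;
}

// An argument `b` declared outside the copied expression,
// and a temporary `tmp` declared inside it.
const argB = { label: "b", children: [], ref: null };
const tmp = { label: "tmp", children: [], ref: null };
const useTmp = { label: "use tmp", children: [], ref: tmp };
const useB = { label: "use b", children: [], ref: argB };
const expr = { label: "expr", children: [tmp, useTmp, useB], ref: null };

const pasted = copySubtree(expr);
```

After the copy, the pasted use of `tmp` points at the pasted `tmp`, while the pasted use of `b` still points at the original argument, much like the `^foo.b` reference in the example above.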

  6. I had more or less the same idea, (i.e. example based programming/instant results programming) and started to hunt around to see what else others had done. As a result I got here ….
    My basic observation was that more people are able to use spreadsheets than Java/C++ etc. So maybe we could double the number of programmers in the world by enhancing spreadsheet “programming”. But spreadsheets are limited: they don’t have loops/functions/classes/IO etc. I was wondering if we could make spreadsheets more powerful, and also whether it would be possible to auto-generate code from a spreadsheet-type interface.

    Anyway …. I have a devil’s advocate type question. What is the objective of Subtext? Put another way, what order-of-magnitude improvements are you aiming at? What is the value of your objective?
    Some common objectives include:
    1) reduce software development time. The normal way this is done is by reducing the number of lines of code a developer needs to write. Because of this rule, to some extent, assembler has more or less vanished.
    2) Quality control, i.e. reduce the number of bugs in the end system. The normal way this is done is using multiphase development (requirements/design/code/testing) and checking for mistakes in the documents created in each phase. Subtext is a good fit for the code phase.
    3) Teaching. Make it simpler for people to understand how a program works.
    4) Increase the number of people who can create programs.

    PS. I would like to know how subtext deals with Input/Output Streams. I have not thought of a clean way to handle them. e.g. user input/file IO/client server etc.

    1. Eddy, I/O is indeed the fundamental problem that breaks functional programming, which is the basis of most simple programming models like spreadsheets and earlier versions of Subtext. That is what I have been working on for the last few years. I think I have a solution now, but I repeatedly fail when trying to explain it to PL experts. So I am going to instead demonstrate it by example.

      1. I never really understood what the problem is with IO and functional programming.
        Why not treat IO as a source of events/function calls?

        1. That is fine if the system can be seen as a single threaded pipeline processing events. Real systems often aren’t. Events split up and hit different things defined in different places, triggering other events, which then join back up again with unpredictable timings. The plumbing and synchronization quickly become intractable. These are the familiar old problems of race conditions, lifted up a level of abstraction or two so they are even more infernal to understand.

          1. Wouldn’t the solution be to simply use immutable data structures + transactions, and restrict one event = one thread?
            Seems to work pretty well in database servers like PostgreSQL (transactions) and Rails (one call per thread, thus avoiding most multithreading and synchronization problems).

    2. Hi Eddy,

      We came to a similar conclusion, but from a different direction. I co-founded a startup, Sumwise.com, with the intention of making the spreadsheet paradigm more powerful by capturing the design patterns of expert modellers.

      It became obvious after a short while that what we were doing was programming language development. We have implemented/released a number of features including non-recursive functions, and have numerous others in the pipeline including loops (via goal-seek style optimisations, and recursive tree referencing), types (schema enforcement) and recursive functions.

      I still don’t have a great handle on the I/O problem. Would it be possible to treat input as an Actor and have it send mutation messages (structured edits, e.g. change cell value, insert row, etc.) to a Model (think of a sheet in a spreadsheet), which then changes its state and propagates changes to its dependants? For this to work, concurrency control of the structured edits needs to be in place. Any thoughts?

      I recently talked about some of the programming language aspects of Sumwise at SAPLING (Sydney Area Programming Languages INterest Group)
      Spreadsheets all the way down. I’ll get the slides online soon.

      1. Hi Gary,
        The I/O problem:
        Yes, I think you are right, the sheet within a sheet idea works. Allow cells to be added/removed/updated. This way, all I/O becomes state modification.
        For IO, you can then use database-style transactions (insert/delete/update). You can then control concurrency with transactions, in the same way a database does. (Google Docs looks like it controls concurrency this way, more or less.)
        A User Interface could look after events (Mouse click etc) and pass only state updates to the spreadsheet. I think this would be cleaner than passing all events as a stream to the spreadsheet.
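The idea of treating all I/O as transactional state edits on a sheet could be sketched like this. Everything here is an assumption for illustration: the `makeSheet` API, the edit shapes (`set`, `formula`, `delete`), and the choice to recompute formulas on read rather than push changes to dependants.

```javascript
// Hypothetical sheet: each cell is either a constant or a formula over other cells.
function makeSheet() {
  const cells = new Map(); // name -> { value } or { deps, compute }

  // Evaluate a cell, recomputing formulas from their dependencies on demand.
  function read(name, seen = new Set()) {
    if (seen.has(name)) throw new Error("cycle at " + name);
    const cell = cells.get(name);
    if (!cell) throw new Error("no cell " + name);
    if (!cell.compute) return cell.value;
    const inner = new Set(seen).add(name);
    return cell.compute(...cell.deps.map((d) => read(d, inner)));
  }

  return {
    // Apply a batch of structured edits atomically: all of them or none.
    transact(edits) {
      const backup = new Map(cells);
      try {
        for (const e of edits) {
          if (e.op === "set") cells.set(e.cell, { value: e.value });
          else if (e.op === "formula") cells.set(e.cell, { deps: e.deps, compute: e.compute });
          else if (e.op === "delete") cells.delete(e.cell);
          else throw new Error("unknown edit " + e.op);
        }
        for (const name of cells.keys()) read(name); // validate before committing
      } catch (err) {
        // Roll back to the state before the transaction.
        cells.clear();
        for (const [k, v] of backup) cells.set(k, v);
        throw err;
      }
    },
    get: (name) => read(name),
  };
}

const sheet = makeSheet();
sheet.transact([
  { op: "set", cell: "a", value: 1 },
  { op: "set", cell: "b", value: 2 },
  { op: "formula", cell: "sum", deps: ["a", "b"], compute: (x, y) => x + y },
]);
```

A UI layer would translate mouse clicks and keystrokes into such edit batches, so the sheet itself only ever sees state updates. Concurrency control between several writers is not addressed in this sketch.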

  7. Jonathan,

    If you’re looking for a JS parser, may I suggest OMeta.js? It’s not the “in code” parser you’re looking for, but it’s one tick away from being one, and metacircular to boot! Not to mention that it simplifies the entire lex/parse/AST gen cycle to a single conceptual process.

    And while I’m at it, can I try to entice you away from the Java-like JS UI libraries, towards a much more open architecture? See http://concrete-editor.org/ and its backstory presentation (http://ruby-gen.org/page_attachments/0000/0007/codegeneration10_concrete.pdf) for a simpler way of a) building an intuitive yet structural web UI b) representing a structural language in it. Then head over to the screencast for a demo of how easily one projectional view can be changed to another (and a tabular one at that).

    I’m not suggesting you use the exact code, but the technique IMO has merit especially in Subtext’s case as the Subtext 2.0 vision of Schematic tables seems much more tractable with such an approach than being wrapped up in a standard gui library’s assumptions of component propriety.

    Finally, as others have offered to help; so do I.

  8. Have you thought about Dart? It compiles to JavaScript and should make a translation to Java a bit easier. It’s very young of course….

    1. Thanks for the link Philip, I hadn’t seen that. The demo shows it can do some fancy stuff, but with some high-powered abstractions and a lot of plumbing. Typical hazard of functional programming, in my opinion. Note the two levels of templating required, in the render function and the XML.

      I too am not sold on pure functional programming, and I hope to sell Subtext as an alternative.

  9. Hi Jonathan, this is a very encouraging development. Found the Subtext videos quite fascinating and I’m a big fan of your work. I’m curious: the Subtext 2 video demonstrates the use of schematic tables as a notation for functional semantics. Did you also develop a tabular notation for expressing imperative effects, i.e. assignment and iteration?

  10. Yes Lach, it took me four years, but I finally figured out how to represent imperative effects in a simple linear fashion. Subtext 3 is all about demonstrating and exploring this new semantics, though in a textual language. The next step will be to go meta and use Subtext to represent itself and build its own graphical UI, which I hope will allow me to come full circle back to Schematic Tables and Example-centric Programming.

    1. Sweet. I’ll be watching with interest. Keen to learn more about the new semantics of Subtext 3. Is there a paper or some other resource where you’re spilling the beans?
