They love me not

My GPCE paper was rejected, and rather harshly. You can read the reviews below. My analysis is that I was rejected for not being incremental to prior work in the field.

——————– review 1 ——————–

PAPER: 41
TITLE: Modular Generation and Customization

OVERALL RATING: 1 (weak accept)
REVIEWER’S CONFIDENCE: 2 (medium)
———————– REVIEW ——————–

This paper addresses an interesting problem: generating a variety of
representations of some data, even when that data evolves over time.
It distills the problem into an interesting challenge problem in order
to raise awareness of this problem and provide a common problem for
use in evaluating different solutions. It also proposes a solution to
this problem in the form of “differential trees” that can be used to
describe data and transformations on that data.

I think that the problem and the proposed solution are quite
interesting and would be right for the GPCE audience. The field trees
and differential trees overcome many of the problems in dealing with
unique names and positions in sequences that are found in program,
data, and model transformations.

There are, however, some problems with the presentation of the paper.
Although the paper claims to solve this problem of dealing with the
evolution of data, and the transformations that will still be
applicable, without change, to the modified data, not enough space is
devoted to explaining this in the solution. In Section 4.2, on page
8, column 1, we have the sentence “As we saw in the previous section,
it is this positional correspondence between Data and Form that allows
Customize to survive the evolution of Data.” But there is no further
explanation of why this is. The reader can figure it out on his or her
own, but I would expect some explanation of why this all works out.

The preceding description of transformations in differential trees is
also not always so clear. The semantics of this language are
relegated to a PDF file on the Subtext web site. This might be OK if
we had a better description of the example in 4.2. For example, it
seems that “body” is the instantiated body of the function given to
map but this isn’t made clear.

I think that there is room for some discussion of semantics since the
author is using a larger font size than is used in most of the other
papers.

The discussion of transformations also suffers in section 3. The
“Field tree summary” claims “Field trees solve the challenge problem
by providing stable identities and positions so that transformations
expressed in those terms can be composed modularly.” There is no way
to evaluate the validity of this claim since we do not, at this point,
see any such transformations – only their results. On
page 5, column 2, we read “Fig 9 shows how Customize transforms Form
into CustomForm”, but we only see the result of this transformation.
In order to see that the field tree is a good data structure over
which transformations can be applied we really need to see more
clearly how one expresses transformations over them. A little
pseudo-code or something would be helpful here. Granted, the
differential trees provide this solution, but on the first reading of
the paper one is left wondering how transformations on field trees
would behave.
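To make the reviewer's request concrete, here is a hypothetical Python sketch (not the paper's differential-tree notation, and not Subtext code) of what a transformation "expressed in terms of stable identities" might look like. The integer field IDs, `generate`, `customize`, and the sample fields are all illustrative assumptions. The point it shows: a customization written against IDs rather than names still applies after the data is renamed and extended.

```python
# Hypothetical sketch: field IDs are stable integers, allocated once and never reused.

def generate(data):
    """Generation: one form entry per data field, keyed by the same stable ID."""
    return {fid: {"label": name.title(), "input": name} for fid, name in data.items()}


def customize(form, overrides, hidden):
    """Customization expressed as differences keyed by stable ID.

    `overrides` maps an ID to replacement attributes; `hidden` is a set of IDs
    to drop. Overrides for IDs absent from the form are never looked up, and
    new IDs pass through uncustomized.
    """
    custom = {}
    for fid, entry in form.items():
        if fid in hidden:
            continue
        custom[fid] = {**entry, **overrides.get(fid, {})}
    return custom


# Customizations are written once, against IDs rather than field names.
overrides = {2: {"label": "Telephone"}}
hidden = {3}

data = {1: "name", 2: "phone", 3: "email"}
# Evolution: field 2 is renamed and field 4 is added; existing IDs keep their identity.
evolved = {1: "name", 2: "mobile", 3: "email", 4: "company"}

before = customize(generate(data), overrides, hidden)
after = customize(generate(evolved), overrides, hidden)
```

Had `overrides` been keyed by the name "phone", the relabeling would silently stop applying after the rename. The sketch deliberately ignores ordering (Python dicts simply keep insertion order), which is exactly the part the reviewers ask positions to explain.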

Overall I really like the ideas presented in this paper, but the
presentation is sometimes rather hard to follow.

Minor concerns:

abstract, item 2 in column 2 of page 1: It isn’t clear what you mean
by “unshifting positions” at this point.

page 4, column 2: The discussion of trees to represent positions is
not really needed. There are other more important things that should
be said and this discussion could be removed.

Other parts are also not so clear in the presentation.
I found the description of field trees in column 1 of page 5
confusing. A little more description about how field trees are
displayed in the figures would help before getting into the contents
of the trees. Even just a statement that fields with no values, only
positions, are shown with no colon would be helpful.
Also, the fact that positions are reused – as described in the third
paragraph of 3.3 – could be moved up. After we understand the
structure and visualization of the trees, we can then get into the encoding
of XML in field trees.

——————– review 2 ——————–

PAPER: 41
TITLE: Modular Generation and Customization

OVERALL RATING: 3 (strong accept)
REVIEWER’S CONFIDENCE: 2 (medium)
———————– REVIEW ——————–

This well-written paper presents a challenge problem that captures the
tradeoff between modularity and flexibility that often occurs when
transforming programs in a source language (such as a database schema)
to programs in a target language (such as an HTML form): how to preserve
manual customizations made to the result of automatic generation. In
a well-motivated progression towards a satisfying solution of this
challenge problem, the author presents the concepts of positions (which
stably identify keys in a sequence), field trees (which organize
expressions in a hierarchy of positions), and differential trees (which
generalize spreadsheets to declarative specifications of field trees and
computations on them). These concepts are situated within the author’s
larger project of so-called transformative programming.

I find these concepts compelling as a programming model and convincing
as a solution to the challenge problem. The relation between the paper
and the general tradeoff between modularity and flexibility would be
strengthened if the author could mention (without details, as space
is limited) the same machinery solving more instances of the tradeoff
besides the single concrete challenge problem. I also wonder how this
work relates to Benjamin Pierce et al.’s Harmony/Boomerang project.

Minor comments:

On page 4, “The current implementation … this notional tree”, in
particular “ordered before it, by ascending serial number”, is unclear
because two orderings have been mentioned: one on positions defined
by the sequence abstraction and one on serial numbers defined by this
particular implementation.

Also on page 4, “objects ID’s” -> “object ID’s”

On page 6, the top of the left column discusses storing positions in a
field tree whereas the bottom discusses storing paths of positions in a
field tree, if I understand correctly. This distinction is a bit subtle
so I would appreciate the text confirming or denying it.

On page 7, the discussion of the := mode (the third paragraph) is
unclear. I can’t figure out: what goes wrong if I replace := by :: in
Figure 10(e)? For that matter, why distinguish between : and :: at all?

——————– review 3 ——————–

PAPER: 41
TITLE: Modular Generation and Customization

OVERALL RATING: -3 (strong reject)
REVIEWER’S CONFIDENCE: 3 (high)
———————– REVIEW ——————–

The author presents a solution in the context of subtext for the
problem of modular customization of generators. The example problem is
a system where a database model changes and the application code has
to be updated to take new, removed, or reordered fields into
account. The author proposes an example challenge and builds towards
a solution by introducing a sequential data structure where positions
don’t change on inserts or deletions, such that positions are a stable
identity for an element of the data structure. Next, the sequences are
included in trees to allow structured objects / records /
fields. Finally, the complete solution in subtext is presented, called
differential trees. Differential trees unify data with the
customization/transformation. The tree describes the customization of
references to other definitions using overriding differences.

+ the underlying proposed model of subtext seems simple and elegant

– however the paper is rather confusing and vague, perhaps due to
being premature

– the argument that the proposed challenge reflects the actual problem
of generation and customization in practice is missing

– a good story about how this works with multi-language programming is
missing

– insufficient discussion of related work in customization (in
particular virtual classes).

The writing is unnecessarily vague. Maybe this is due to prematurity
of the work. The structure of the paper, from positional sequences to
field trees to differential trees is not helpful to me. Actually, the
part I found most comprehensible are the differential trees. Maybe
reversing the paper would help: first explain the complete solution of
differential trees, then dissect its ingredients.

I advise you to be much more explicit about your definitions of positional
sequences, field trees, and differential trees. Including some
formalization or pseudo code for some of the operations might be
helpful. More concrete examples would help as well.

The proposed challenge is too limited. I’m not convinced that the
challenge you introduce reflects the complexity of the problem.
Please justify the change from practice (structured HTML generation,
integration of the database in the language/editor). You simplify the
problem enormously by introducing a universal representation and
complete integration. However, complete integration frequently solves
all problems. Yet, in many cases complete integration is not
desirable. I admit that it might actually be the only complete
solution, but this should still be very clearly motivated. Also,
please clearly identify that this is the key to the solution.

Large parts of the paper revolve around positioning of
fields. However, I would argue that the order of fields in the
database and data model should be insignificant altogether (according
to good design principles). Therefore, I don’t get the significance of
the position sequences to the generation and customization
problem. After that, two main themes remain: 1) avoiding textual
identifiers for stable identity and 2) customization by describing
only overriding differences.

For 1), you employ a new development environment with a new,
structured/visual form of editing code. This solves the issue, but
it’s hardly a contribution to explain that in this paper: it’s
completely obvious that this solves the unstable nature of textual
identifiers.

For 2), you employ differential trees. That is definitely an
attractive solution, but similar solutions exist that are not even
referred to. In particular, this kind of customization is highly
related to virtual classes and in particular the simplicity of your
approach reminds me of the essential operators of the DEEP calculus
(“Eliminating Distinctions of Class: Using prototypes to Model Virtual
Classes”). This paper is largely about typing, but the kind of
customization that is possible is very similar to yours. I suggest
at least comparing your work to prototypes and virtual classes.

The complete story about how this will be the core of a multi-language
system is missing. I get the idea, but there is no discussion at all
about the mapping of multiple languages to the proposed differential
tree core of subtext. How are customizations going to be specified?

The reader is currently a bit overwhelmed by the introduction of a
series of unusual names, which obscure the actual underlying
techniques. I would suggest sticking to more conventional
terminology. Too much significance is attached to the introduction of
names, which are also used to list the contributions.

The discussion of position sequences in 3.1 is not clear enough. The
paper describes the solution of using serial numbers only very
briefly.
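One way to picture positions that survive insertion and deletion (and one answer to Reviewer 4's question about the domain and range of a "sorted map") is the minimal Python sketch below. It is an assumption-laden simplification, not the paper's tree-path encoding: positions are opaque serial numbers from a counter, values can be anything, and the order is kept in a separate list. `PositionalSequence` and its methods are hypothetical names.

```python
import itertools


class PositionalSequence:
    """Sketch of an ordered map from stable positions to values.

    Assumption: a position is an opaque serial number that is never reused.
    Order lives in a separate list, so inserting or deleting one element
    never changes the position of any other element.
    """

    def __init__(self):
        self._serials = itertools.count(1)  # hypothetical source of fresh positions
        self._order = []                    # positions, in sequence order
        self._values = {}                   # position -> value

    def insert_after(self, pos, value):
        """Insert value after position `pos` (None means at the front); return its new position."""
        new_pos = next(self._serials)
        index = 0 if pos is None else self._order.index(pos) + 1
        self._order.insert(index, new_pos)
        self._values[new_pos] = value
        return new_pos

    def delete(self, pos):
        self._order.remove(pos)
        del self._values[pos]

    def __getitem__(self, pos):
        return self._values[pos]

    def items(self):
        """(position, value) pairs in sequence order."""
        return [(p, self._values[p]) for p in self._order]


seq = PositionalSequence()
name = seq.insert_after(None, "name")
phone = seq.insert_after(name, "phone")
company = seq.insert_after(name, "company")  # lands between name and phone
seq.delete(phone)
```

Here `company` keeps position 3 whatever is later inserted before it, while its list index shifts; that distinction between a stable position and a shifting index is the one the reviewers found underexplained.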

Inline examples would make 4.1 more clear. In particular, on the first
read I had problems understanding:

* “The assignments of values to fields are seen as a set of
definitions whose implications are worked out, a process called
integration”

* “The arrows on the right indicate references between fields, and
serve as non-textual identifiers for anonymous fields.” ~ This is
confusing because the fields actually appear to have names in the
example.

* “‘live’ execution” ~ please provide a bit more context of how the
trees will be used in subtext to understand this. Currently, you
assume that the reader is familiar with what subtext is.

* “Every definition expresses an overriding difference between its
containers and their definitions” ~ It’s completely clear later, but
an inline example would clear this up completely.

In general, it seems to me that the part on differential trees does
not describe the solution to the challenge in full detail. You don’t
go back to the challenge to show step by step how the introduction of
a company and the removal of a phone number works. Figure 11: for
differential trees, it’s unclear what determines the ordering of the
elements when the contents of the CustomForm gets evaluated. How is a
position assigned to a new definition, for example what controls the
position of the new id2 definition? What mechanism makes phone come
last in the result, as required?

Some more minor details:

* page 5, at the top of the right column please refer to Figure 3 when
you are discussing Figure 9.

* page 5: “Note that these new positions are allocated once when the
transformations are written, not every time they are executed” ~
This is unclear. I don’t get this note at this point.

* page 5: “the name of the position can be changed”. I assume a name
is a purely symbolic name for presentation purpose, but could you be
a bit more specific about the role of the name.

* page 6: “But we also don’t want to hard-code hex strings for
internal position IDs” ~ I can see what you want to say here, but
readers will wonder where hex strings suddenly come in. Please
rephrase.

* page 6, second paragraph (“The problem could be solved”). After
reading the rest of the paper I understand this part, but at this
point it is rather unclear. It would help a lot if you would give
the reader more context information earlier in the paper.

* page 8: “These approaches handle customization through roundtrip
engineering”. That statement is a bit too general.

——————– review 4 ——————–

PAPER: 41
TITLE: Modular Generation and Customization

OVERALL RATING: -3 (strong reject)
REVIEWER’S CONFIDENCE: 4 (expert)
———————– REVIEW ——————–

Summary of the Paper:

A method, “field trees”, for labelling places in code is described.
A generalization called “differential trees” is built on
the field tree foundation. These schemes are used to
“solve” the problem of propagating information to
multiple targets in generated software artifacts.

Comments for the Authors

I found your paper impossible to read, because you did not use terms
that I understood, and your key terms were never defined crisply.
Consequently, I did not understand except in the vaguest sense what
you were trying to tell me.

You define “positional sequence to be a sorted map from a domain of
identifiers”. Usually when one defines a map (you mean this in
mathematical sense, right?), one defines a domain and a range. What’s
the range of a “positional sequence”? What does “sorted map” mean?
You don’t provide a decent definition, and you don’t provide an example.
OK, I’m already lost, I have no intuition and no examples.

“Field trees are nested positional sequences”. OK, I already don’t
know what positional sequences are. So now I don’t know what field
trees are. If I believe that positional sequences are a mathematical
“map”, I don’t know what “nested maps” are. You show an example
(Figure 8) and claim some kind of connection between id for Data
containing 1234 and value 1234 but I don’t see how this relationship
is justified. Offering me an implementation in an appendix which
isn’t provided in the paper is pointless.

“Positions are encoded as a path … from the root of this tree”.
Like, child1.child3.child9.child2? And what if a subtree gets
deleted? Gets moved?

“Differential trees declaratively specify field tree transformations”.
OK, how? The answer, “…by example” isn’t adequate for a technical
paper in which this is supposed to be the primary result.

Regarding the problem definition: I would have characterized
it as:
    Data ---Gen1--> Form1
     |   |
     |   +--Gen2--> Form2
     |
   evolve
     |
     V
    Data’ ---Gen1--> Form1
      |
      +--Gen2--> Form2

which gives you a completely different view of the world.
Now what I need is: given that I’ve computed Gen1(Data) (==> Form1),
how do I compute Gen1(Data+delta), where Data+delta == Data’?
Ideally, it is some function
Update(Data,delta)=Gen1(Data)+Mods(Data,delta)
so I can take the original result Gen1(Data) and patch (“+”)
it. This gives you a completely different view of the problem.
Check out finite differencing.
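Reviewer 4's formulation can be made concrete with a small Python sketch of incremental, finite-differencing-style regeneration. Everything here is hypothetical: `gen_field`, the edit tuples `("ins", i, x)` and `("del", i)`, `apply_delta`, and `mods` are illustrative names. For a generator that handles each field independently, Mods needs only the delta, not Data, and patching the old output gives the same result as regenerating from Data’.

```python
def gen_field(name):
    """Hypothetical per-field generator: one HTML input per data field."""
    return f'<input name="{name}">'


def gen1(data):
    """Gen1: generate a form (a list of lines) from a list of field names."""
    return [gen_field(name) for name in data]


def apply_delta(seq, delta):
    """Apply edits ('ins', i, x) and ('del', i), in order, to a copy of seq."""
    out = list(seq)
    for op in delta:
        if op[0] == "ins":
            _, i, x = op
            out.insert(i, x)
        elif op[0] == "del":
            _, i = op
            del out[i]
        else:
            raise ValueError(f"unknown edit {op!r}")
    return out


def mods(delta):
    """Translate a data delta into a form delta.

    Because gen1 handles each field independently, an edit at index i of
    Data becomes the same kind of edit at index i of the form.
    """
    return [("ins", op[1], gen_field(op[2])) if op[0] == "ins" else op for op in delta]


data = ["name", "phone"]
delta = [("ins", 1, "company"), ("del", 2)]  # add company, then drop phone

form1 = gen1(data)
patched = apply_delta(form1, mods(delta))         # Gen1(Data) + Mods(delta)
regenerated = gen1(apply_delta(data, delta))      # Gen1(Data')
```

As the author's reply to comment 9 points out, this view covers generation and evolution only; a manual customization of Form1 is a third transformation that the formulation does not account for.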

Now, most of your argument seems to be how to label a place in source
code. You don’t like comments containing generated symbols “because
they are invisible to many tools”. So your solution is propose some
*one* tool that stores a database of “persistent internal IDs”. I
don’t see how this is different than the generated symbols, and I
don’t see how this makes getting tools any easier, now I have to have
*your* specific tool.

You claim to somehow use differential trees to “generate” the code?
How? Where does a differential tree encode all the crud that really
makes up an HTML page?

Finally, your whole scheme seems focused on simply reshuffling
of a few fields. Changes in real programs are often more interesting
than that. What if the problem is fields move from one page to
another? If they are added together to get the new result?
Ultimately the problem is one of traceability: one has to be able to
say how one element of code was derived from underlying structures
(e.g., this HTML output field was derived from the database). So a
trace that somehow links the database slot to the HTML code that
depends on it, along with an indication of the dependency type seems
fundamental. I don’t see how a position numbering scheme can provide that
traceability.

That’s the major stuff.

Minor stuff:

What on earth does PHP have to do with this paper? All of
your points could have been made with just the HTML.

But if the HTML form had been embedded in a PHP script,
I don’t see how you would have generated whatever PHP code
from whatever the examples you had in the paper.

Points in favor or against (sent to authors)

– Incoherent definitions of key concepts and results

11 Replies to “They love me not”

  1. My OOPSLA paper rejection was also unnecessarily harsh. I took a deep breath and wrote a nice author response, yet none of the reviewers changed the content of their reviews, despite my addressing some of their concerns.

    One reviewer I got was a lot like your #4. He even criticized my use of a hyphen and called the paper “poorly written.” One of them suggested that English wasn’t my native language! Talk about adding insult to injury.

  2. You’re being too hard on yourself. They love you, but only as a friend. It sounded like the real problem was vagueness in the definitions you used. Even the two reviewers that liked your paper thought a few things were vague and unclear. Similarly, the detractors seemed to find your proposal interesting, even if they found the details lacking.

  3. reviewer #3 says:

    *”insufficient discussion of related work in customization (in particular virtual classes)”
    (he really just wants you to cite his own paper on virtual classes)

    *”paper is rather confusing and vague”
    (people with some common sense will understand, but phds will find it difficult)

    reviewer #4 (“expert”) says:
    “I did not understand except in the vaguest sense what
    you were trying to tell me.”
    (well, we already knew (reviewer #3 told us) that this guy won’t understand)

    other than #4, the reviewers seem to like it and just give constructive criticism. they do love you!

  4. #4 was harsh but i think #1 was pretty much spot-on. he likes the problem, he likes the solution, but some of the paper was a little hard to follow, and more space could have been spent on the solution/example at the end. i have followed subtext for a few years and even then i was left scratching my head at the end about how exactly differential trees would fit into a workflow. don’t take #4 too hard, sometimes reviewers go overboard trying to get some point across.

  5. GPCE is pretty narrowly focused on macros and the like, so it’s not too surprising they didn’t accept your paper. Even if your paper did get in, you might not find a receptive audience. It’s a bit bizarre, but this kind of paper would have a better chance at ECOOP or OOPSLA with the right spin.

    I’ve done the PC thing once (will try it once again soon), and you know…it’s very hit or miss. Everyone on the PC is very smart and informed, they really do read these papers carefully, but lots of good papers get rejected while lots of so-so papers get accepted based on content appeal to the reviewer.

    If I were you, I’d try to do more Onward papers. They are surely more difficult to get in, but they are more interesting/fun to write and read, and the impact is much broader! I actually can’t find many papers in the mainline of OOPSLA these days to get very excited about, and it’s even more difficult in the narrower conferences (GPCE).

  6. Reviewer #3: “You simplify the
    problem enormously by introducing a universal representation and
    complete integration. However, complete integration frequently solves
    all problems. Yet, in many cases complete integration is not
    desirable. I admit that it might actually be the only complete
    solution, but this should still be very clearly motivated.”

    I wish reviewer #3 would have given more elaboration of cases where complete integration is not desirable. Sounds like he/she doesn’t know what he/she is talking about, especially when he/she says “it might be the only complete solution”. Might? Huh? Either there ARE cases where complete integration is not desirable, or there aren’t. I really hate vague statements that say something is “not desirable”. That’s similar to people saying certain types of inheritance are “dangerous”, “more dangerous” or “especially dangerous” than others. Or professors saying that “abstraction is fundamental to computer science” (as though it were a first principle rather than a first smell of err). When confronted, the best I’ve seen Ph.D.’s do is fall back on wristy gestures and handy poses: they give you their own contrived explanation of why something is “fundamental”. My experience is that once those abstractions have to be maintained, the desire is frequently to obliterate them and begin anew.

    I saw your paper as a way to limit the number of abstractions those of us down in the trenches have to keep track of. This is an emotional issue, I guess, because it means I don’t have to demand as much explanation from my coworkers about how a change affects the system as a whole. In the real world, when you are maintaining some legacy system and preparing to bring up a new system that replaces it, it is your job as the maintenance programmer to be one demanding prick when it comes to understanding the system. Whenever I think about this problem, one picture springs to mind: those side-by-side Mythical Man-Month vs. Real-World Man-Month graph comparisons that Brooks made.

    Also, Reviewer #3:
    “The example problem is
    a system where a database model changes and the application code has
    to be updated to take new, removed, or reordered fields into
    account.”

    The real problem pattern is where A SCHEMA CHANGES, AND FUNGIBLE, DERIVED WORKS MUST BE UPDATED TO BE KEPT IN SYNC. We’re talking about complex systems that should be able to explain themselves to their users here. You shouldn’t need five people on a maintenance team to push through a damn application update. We’re dealing with “One Fact, One Place, One Time”; so why not let the schema be the Single Source of Truth and use it to explain what fungible, derived works need to be updated?

    The other reviews didn’t really have much in the way of interesting feedback (unless you consider demanding more rigor to be ‘interesting’), although Reviewer #4 seems to indicate that there is a potential “problem pattern” for transformative programming that needs to be written on this subject. The fact he/she re-casts the problem differently is what makes me think that. Also, the discussion in previous posts in this blog indicate that others have tried handling similar problems with great difficulty. Perhaps a patterns paper should be published to raise awareness to a broader community, with “no best practice / no known solution / this is an open problem” as the present day answer.

  7. The important thing about feedback is to listen to it, and not get defensive. The common theme struck in most of the reviews is that the paper is hard to read (and they’re right: it is.)

    A trick I’ve used in the past (and I’ve published plenty, so I know this works): hand your paper to a colleague you trust who happens to be a better writer than you. Ask them to give your paper a proof-reading. I know it is a lot of work, but that’s what mentors are for.

    And sure, you can’t please everyone all the time (and you shouldn’t strive for that), but if sympathetic experts in your field can’t even understand what you’re saying (as some of the people in this comment stream are hinting at), then it doesn’t matter at *all* how good the ideas are.

    Not the end of the world. You’ve got great ideas. Hammering the language into shape is *also* hard work, but that’s the nature of pitching new ideas.

    Don’t give up. And good luck!

  8. Following up on what Matt said:

    Way better than handing a paper to a colleague you trust is to organize meetings with your research group where you present either (a) your summary of someone else’s work unrelated to your current focus [stuff you ‘starred’ in your newsreader] (b) your summary of someone else’s work related to your current focus (c) your summary of your current work-in-progress.

    Meet once a week (Wednesday at 4 p.m. is a good way to end the Hump Day). One person presents on (a), another on (b), another on (c). For (c), others ask questions and provide suggestions. Whoever isn’t presenting gets to buy snacks for the meeting.

    Meeting once a week is way more proactive and makes it easier to put together your ideas when paper-writing time comes.

    This is the approach I learned while working at BNL, and it is an approach that greatly improved my research skills. While I was there, it also led to some great inspiration. I found that after a while I could look at the headlines in my newsreader from Nature, Science, PNAS, ACM’s reviews.com, etc., and automatically know whether a paper was a “must read” or not, simply for the edifying value. After enough practice, you recognize author names, what projects they are working on, etc. It’s way better than Googling for academic value.

    Jonathan, I know the group you are in at MIT is small compared to the overall CSAIL, but do you meet once a week like I described above? If not, then give it a shot and blog about it “blogademics”-style 😉 If people resist, then consider starting off with an internal tumblelog where people can post stuff at will. Some people prefer this because it means less human interaction – that’s important when you’re busy meeting deadlines, etc. 🙂

    You should still grab a great editor and let him/her go to town. In the past, I’ve actually used the services of two retired high school English teachers; retired English teachers seem to be the best freelance editors. I can give you the contact information for one of them if you’re interested. He can perform magic with words, and probably would’ve caught some of the clarity issues targeted by your reviewers.

    Having said this, I don’t think Jonathan was being defensive so much as having some angst (“they love me not”) over the fact he can’t yet continue to innovate until people understand his present ideas.

  9. Maybe I should have posted this before you submitted the paper but the thing I found most lacking is a good definition of the problem. I wasn’t entirely convinced that the problem is the right problem to focus on.

    If I understand you correctly, the problem is this:

    Artifact B depends on Artifact A
    How do we change artifact A while automatically keeping artifact B semantically valid, assuming that the semantic change in A should prompt a similar semantic change in B?

    My solution is one of
    a) Accept that the semantics of A and B can be out of sync and try to weaken the dependency between A and B so B won’t break when A changes.
    or
    b) Strengthen the dependency so that B breaks visibly for the person trying to change A, or otherwise stops A from being changed without B being updated.

    Why is your solution preferred over these two?

    [Well, no, that is not the problem. There are three different transformations that need to be reconciled: generation, customization, and evolution. See Figure 7. But one of the reviewers, and supposedly the most expert, did not understand this either, so I must not have made it clear enough – Jonathan]

  10. I’m not expressing an opinion one way or the other on the reviews of your GPCE paper, but I thought you might find this Radio 4 programme from yesterday interesting:

    http://www.bbc.co.uk/iplayer/console/b00ctk01

    “Peer Review in the Dock

    Monday 4 August 2008 21:00-21:30 (Radio 4 FM)

    Mark Whitaker investigates the tarnished image of a flawed process. Peer Review is supposed to be the keystone of quality control for research projects and academic studies, yet evidence of its many deficiencies has been building up for over 20 years.

    American lawyers have started challenging expert witnesses on the basis that peer review no longer guarantees their expertise. Yet accurate peer review in fields such as medicine can be a matter of life and death.”
