Translating an Idea into a Paper

Posted on June 30, 2018 by Gabe Parmer

An important and necessary part of learning to do research is learning how to effectively convey your ideas to others. One of the main mechanisms we use for this is academic papers, and each PhD student spends a lot of time learning the conventions of writing a paper in their area, and learning effective means of conveying their ideas to others and convincing them of those ideas. Learning to write papers effectively is one of the important skills that you will acquire on the path to being an expert in your field, and receiving a Doctorate. You should go into the process taking this skill seriously, and with a commitment to master it. Especially if you believe you’re interested in working in research (academia or a lab), this is a required skill. This post discusses one version of the process for going from idea to published paper. There are many different successful models for this, so look at this as a default option from which you can deviate with intention.

Idea \(\to\) Outline

Presumably you’ve been working with your advisor on the technical aspects of your work. However, when discussing and working on the “how” and “what” of your research, you should also back away and make sure not to lose the “why”. This section discusses the process to mature your research idea into something that can be the basis for your paper. You’ll find that the first time you do this, it will take a lot of time, and you’ll need to work with your advisor a lot to flesh it out. As you progress, and have done a few papers, this process is still hard, but becomes more methodical and efficient.

The goal of the outline is to establish

what the fundamental motivation for the work is,
how it can be justified to the world,
what experiments make the strongest argument for the work, and
a shared plan that keeps you and your collaborators on the same page for the rest of the research and paper.

Make an outline.md¹ document in your paper’s directory in the publications/ repository (remember: making the paper’s directory makes you choose a conference and deadline). In it, focus separately on four main sections:

the motivation with a focus on key aspects of current systems that are insufficient,
the story, or the logic surrounding a justification for your research,
the contributions, stated tersely, and
the evaluation, which includes a sub-section per experiment, and a focus on the key questions each evaluation is answering.

The motivation and the story are related, but I break them into separate concerns because I’ve noticed that researchers have a tendency to not pay sufficient attention to the shortcomings of current systems. The motivation zeros in on this aspect, while the story provides the surrounding context that captures the logical argument for the work.

While you’re writing an outline, please ensure that your text is grammatical, and makes sense. This is the time to focus on terseness, and cutting to the core issue. If you’re writing long-form paragraphs, you must do a few editing passes to cut to the core of the argument.

The outline should include one section for each of these four concerns.
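As a rough starting point, the skeleton below sketches how the four sections might be laid out in outline.md. The heading names follow the four sections listed above; the heading levels, the TODO placeholders, and the bullet layout are assumptions for illustration, not a required template.

```markdown
# Motivation

- TODO: what is wrong with current systems, and why that motivates the design
- TODO: an experiment or qualitative factor that shows the deficiency

# Story

- TODO: the factors people care about, and why
- TODO: why existing systems aren't optimal for those factors
- TODO: the value proposition of this work
- TODO: the rough design, in a few bullets

# Contributions

- TODO: one contribution, stated tersely

# Evaluation

TODO: one sub-section per experiment, each with the key question it answers
```

Keeping every entry to a single bullet at first makes it easier to see whether the motivation and story fit on a page before any long-form text is written.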

Motivation

Focus on answering two questions:

What is wrong with current systems, and why does this motivate your design?
What experiments would clearly show the deficiency of current systems?

This might require a description of existing systems, to provide context. You should try to make sure your motivation is to the point and as concise as possible. In case it isn’t clear from the description above, this means that you have to understand other systems well enough to properly characterize them.

Try to succinctly describe a small number of experiments or qualitative factors (at most 2-3) that capture what is wrong with current systems, and why that motivates your design.

Story

What is the arc of the story for the paper? How do you go from

factors that people care about (speed, parallelism, costs, reliability) that are properly justified, to
an argument that existing systems aren’t optimal for these factors (this might be redundant with motivation), to
a value proposition for your work that it can improve systems for those factors, and finally to
a justification for your rough design that is simple and can be summarized in a small number of bullet points (e.g. four).

This will get translated into the introduction and motivation sections, and will impact how you organize the description of the design, and the evaluation. It is important to make sure that your story is clear and as concise as possible. If you end up writing a ton of text, then make sure to go through it again, and simplify. I find it is often best to simply have your story be a bullet-list of items that logically take you through the story as this forces you to be a little more terse.

If your motivation and story don’t fit onto a page when spelled out tersely, then there is likely little hope they can fit into a paper, so focus on being concise. This focus will pay off when you’re writing the paper, and already know the most important aspects on which you need to focus.

Contributions

Now that you’ve tersely stated the motivation and story, you should be prepared to equally tersely state the contributions of the work. The list of contributions is very important for a number of reasons:

it is your stated list of advances to human knowledge and capability,
it is what your audience should walk away from your paper understanding and agreeing with, and
it sets the stage for your evaluation, which is meant to validate those contributions.

When reviewing a paper, the main sequence of questions I often ask is

What are the contributions of the paper?
Are these contributions compelling and of sufficient interest for the bar of the conference?
Are these contributions accurate compared to related and previous research in the publication record?
Does the design accurately encompass these contributions?
Does the evaluation properly and completely validate these contributions (within reason)?

The contributions are, in many ways, the skeleton that binds the paper together. Unfortunately, I do find it is hard to come up with them without properly diving into the motivation and story first.

No contribution should be more than three sentences, and you should likely not have more than five in total. Some “scoping” of contributions is required, and can be discussed with your collaborators.

Evaluation

This is the section that is often the most difficult and will consume the majority of your time and of the outline’s space. The point of this in the outline is to flesh out a lot of the details:

What are the results, and how are they depicted?
What systems are being compared?
What are the relevant aspects of their setup and how are they configured?
What are the structures of any tables, and the x- and y-axes, along with the different lines/bars, for each graph?

If you have a “dimension problem” where too many variables exist in the system, this section will force you to figure out a way to represent an abstraction of those. Getting to this level of detail is difficult, and requires aiming to finish this outline two months before the deadline. Most importantly, the outline will give you strong guidance for what results you’re shooting for, and thus good input into how you should spend your time.

Create a subsection within “Experiments” for each of the classes of experiments. There are often at least:

Motivation - why are current systems insufficient? (Perhaps this is redundant with your previous work on the work’s motivation.) Please start out with the motivation by asking the question, “what do existing systems lack, or what is wrong with existing systems?”, and from there create experiments that fairly demonstrate those shortcomings. First and foremost, you’re trying to answer the question “what value does your system add?”, and should focus on results that fairly showcase that. Only after you establish value do you look at experiments to characterize all aspects of the system (where it is weak, where it is strong, workload characterizations, etc…).
Applications - what higher-level applications can benefit from your system? These are important as they show the “final” justification for your work – the “proof’s in the pudding”. If your system doesn’t work for applications, then it will be difficult for your audience to completely believe your work’s value.
Parameter studies - what are the parameters that impact the benefit or the properties of your system? For example, things like the working set, rate of arrivals, size of requests, memory allocations, etc… These are often the easiest to figure out as they are very much intertwined with the technical details you’ve spent months working on!
Microbenchmarks - these study the atomic costs of the system with the intent of giving the reader an understanding of the underlying bounds of the system and compare, where possible, against comparable operations on existing systems.

Please note that this is not the order in which these often appear in the paper. Often the most natural order in a paper is motivation, microbenchmarks, parameter studies, then applications. However, for the outline, I’d like you to focus on this order as it forces you to consider the most difficult first².

Specific Experiments

Create a sub-sub-section (that’s three “###”s) within each of the types of experiments, for each experiment. The first thing you should do is understand and write down the question you’re trying to answer with the experiment. For example:

Question: What is the maximum interference on high-priority tasks that systems can experience from IPIs?

Only after you clearly identify what the experiment is answering, should you then list the following:

System setups - which systems are you comparing, how are they set up?
Workloads - what is the workload on each system?
Graph/Table details - the axes and parameters of the graphs/tables. This is the hardest part, and will take a lot of thought. The dominant question is how to best represent the phenomenon you’re studying, and how to best answer the question (from above).
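For illustration, a single experiment entry in outline.md might look like the sketch below. It reuses the example question above and assumes, hypothetically, that the experiment belongs under a “Microbenchmarks” subsection; the experiment title is made up, and every TODO is a placeholder to fill in, not a recommended setup or an expected result.

```markdown
## Microbenchmarks

### IPI interference on high-priority tasks

**Question:** What is the maximum interference on high-priority tasks that systems can experience from IPIs?

- **System setups:** TODO: which systems are compared, and how each is configured
- **Workloads:** TODO: the workload run on each system
- **Graph/Table details:** TODO: x-axis, y-axis, and one line/bar per system
```

Writing the question first, before any setup detail, keeps the entry honest: if the graph described in the last bullet cannot answer the question, the experiment needs rethinking.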

If you have trouble putting any of this together, get your advisor involved, and talk to your peers! You’ll iterate on this document quite a bit, so don’t get too attached to anything. The story of a paper can change quite a bit as you get results, but you have to have an initial plan, and a goal.

The Headline Result

The headline result is a single graph that, alone, shows both why other systems are insufficient, and why your system is necessary. From another perspective, if you gave a presentation of your work to a company, which result would you show? I went for over ten years in ignorance of this, not knowing how important it was. The alternative to having a headline result is having to go through many of your results to convince people of the contribution. For scientific completeness, we require a thorough evaluation, but if someone trusted that you did your research in an intellectually honest way, the headline result would almost be sufficient.

So the last thing to consider is which of these results is your headline. If it is hard to identify, it is worth thinking about whether your results can be reconfigured to bring one out.

Collaboration and First Draft

After a sufficient number of rounds of discussion and revision with your collaborators, it is time to make the real paper’s outline. Keep the outline.md around, and don’t update it with changes in the real paper. This will be useful for a post-mortem for yourself (i.e. what did you do wrong in the outline phase?). Your initial outline should include sections, subsections, and figures (often not populated at this point, but with descriptions)³. At this point, you should be very intentional with your collaborators, and come up with a plan for who is going to work on what, and when. The goal should be to get a first draft of the paper done a couple of weeks before the deadline. This will give some time to send the paper out to friendly researchers for feedback, and time to iterate⁴.

Figures

The text is often wrapped around, and driven by the figures, especially in the design sections. Thus, it is important to plan the figures before diving into the text.

Headline figures:

Motivational figures. What figures can capture at a high level the difference between your system and existing systems? This might take a number of shapes: a graph with N dimensions, showing points for different systems (e.g. scalability, consistency models, isolation modes) (see the BI paper); a diagram showing different system organizations, and implying different isolation and performance characteristics (see the TCap paper); or a “context” diagram showing how the systems fit into a broader context (the system in hardware, the system in an IoT environment, etc…). These are hard, and very important. They will capture the audience very early on in the introduction, and help transition into the system design. Note that some papers don’t have a natural motivational figure, but you should still put effort into coming up with one.
High-level design figures. If you had to give a lecture on your system, and you had one figure to use to guide the discussion, what would it be? Is there a way to summarize the design of the system in a single figure? When reading through papers, ask yourself which are the headline figures. They often aren’t hard to find.

Aside from these, you will likely require figures to explain different subsystems, and experimental setups. Take some time to think through what additional figures are necessary. Add them into the outline (with at least a description).

Paper Organization Options

There are nearly endless options for how to organize a paper. There are a few high-level things you want to focus on:

Separate out the story of your work from the system design.
Separate the description of the high-level design of your work, from the implementation details.
Figure out how to properly justify the work (story), and integrate the motivation.

Paper organization should be done intentionally to manage the intellectual dependencies inherent in your research. These dependencies capture the ideas that are necessary to introduce a new idea. You can’t explain a new IPC mechanism without a previous introduction to protection domains, or to the architectures and shortcomings of existing approaches. You can’t describe a new RTOS without introducing the concepts of predictability and interrupt management. At a finer granularity, you can’t introduce TCaps without first introducing user-level scheduling, interrupt abstraction, reservations, and IPC⁵. The coarse-grained organization of a paper gives some indication of how these dependencies are managed.

Here are three organizations that we’ve used that all differ on the last point:

Conventional: This is what I consider a more “classical” organization that relies on a strong intro to motivate the work. The design section (or sections) is separated from the implementation to explicitly manage dependencies. In design, you introduce the high-level ideas behind your system, while the implementation includes details that often require an understanding of the high-level architecture. The related work is after all of the paper’s details, so you can leverage the full intellectual description of your system, and of the results, in relating your work to others’.

Introduction
Design
Implementation
Evaluation
Related Work
Conclusions

Note that in all options, Design and Implementation might have more research-specific names, and likely have many subsections and figures.

Motivation Section: This organization has an explicit Motivation section and might be appropriate if you have strong results that can go into the motivation to justify the need for your system before you’ve introduced the system. Note that the motivation section has very few intellectual dependencies it can rely on, so it has to introduce the motivation for your system, without any detailed descriptions of your system. This means that the motivation is often primarily based on quantitative properties of existing systems.

Introduction
Motivation
Design
Implementation
Evaluation
Related Work
Conclusions

Related Work as Motivation: The related work section can be placed early in the paper where each previous system can be used as a foil to introduce high-level aspects of your system. From my perspective, this is generally the most difficult to write, but some people do it very well. This is because the related work has to place your work with respect to others’ while relying only on the high-level description of your work from the introduction.

Introduction
Related Work
Design
Implementation
Evaluation
Conclusions

I’m using monolithic bullets for each of these sections. Each of them often requires an intentional break-down into smaller constituent parts. A few examples follow:

Notes on Design. The design section of a paper is very important as it lays out the abstractions and means for the reader to think about your work. I find it helpful to think about this section as constituting at least two parts: 1. the design goals that enable your design to achieve your contributions, and 2. your design that achieves those goals.

Notes on Evaluation. Your evaluation must include an analysis of the results. You want to cleanly decouple a description of the evaluation setup and what the graphs show from how that should be interpreted. In the past, we’ve done this by adding a “Discussion.” heading for a paragraph after the experiment’s description. I’ve read many successful papers that have added subsections that summarize key take-aways from results as well. Regardless, make sure that you intentionally separate description from interpretation.

Writing the Paper

I’m not going to go into this now, but you can find some links below that are decent. I’ll be involved in this process, so never hesitate to ping me if you have doubts or concerns.

Throw This All Away

Once you feel comfortable navigating the difficult territory of paper construction, then you can throw away a lot of this advice, and do what’s comfortable. However, you still need to keep your collaborators in the loop, and organize yourself long before the deadline for the paper. This is how I organize my thought process, so it is unlikely it will perfectly match what is best for anyone else. However, I’ve found that new students need guidance, and to understand a model for doing this.

Resources

I’ve provided copies of the following books in the library area. Please make them available to others if you aren’t actively reading them.

The Elements of Style, Fourth Edition by Strunk Jr., William. This is a classic book, and still provides good advice on how to maintain simplicity of writing, while effectively maintaining your point.
Edit Yourself: A Manual for Everyone Who Works with Words by Bruce Ross-Larson. A more detailed treatment of how to manage your own writing. How can you simplify sentences, manage their complexity, and be careful about their grammar?
A Writer’s Guide to Transitional Words and Expressions by Pellegrino, Victor C. This book is especially useful if English is not your first language. How do you move your thoughts from one topic to another, while providing proper transitions? What are some of these transitions, and what is “good taste” around using them?

I highly suggest you buy a copy of each of these.

Some advice on systems paper construction:

Updates

August 7, 2018 - Updated the Paper Organization Options section to 1. fix a typo, and 2. add more information about managing paper dependencies.

¹ If you don’t know markdown, please learn it. It will take about 2 minutes. Please don’t get fancy with it. It is meant to be read in rendered form, so use each feature in markdown as it is meant to be used. You can use pandoc to render, but I often prefer simply directly using github.
² Microbenchmarks and parameter studies are much easier to come up with and understand when you’re in the midst of system implementation.
³ See the skeleton repo, and the style in the publications/ repo.
⁴ You need to step away from the paper, then come back to it a couple days later with an editing eye, if you want to refine it.
⁵ The long list of dependencies that TCaps have is one of the challenges of understanding and using them, and made writing the paper very difficult.