CS 6907, Fall 2016: Special Topics in Operating Systems

Professor: Gabriel Parmer (gparmer at gwu)
Research interests: Embedded systems, component-based OSes (see Composite), multi-core OSes

Class Schedule: Tuesday 6:10-8:45
Office Hours: TBD

This course covers the fundamental concepts of operating systems, focusing on resource management and abstraction. This includes OS structure, process and thread management, communication with peripherals (I/O), synchronization, deadlocks, memory management, virtual machines, and embedded and real-time systems.

The course focuses on cutting-edge research, and many of the classes will be devoted to presenting, discussing, and investigating papers recently published in some of the top conferences in Operating Systems.

The workload for this class is heavy and programming intensive.

News:
  • First day of class. Your TODO list:
    1. Sign up for the Piazza message board.
    2. Do you remember/know C?
      If you don't, you are far behind in this class. You must work hard in the first weeks to get up to speed.
      For example: spend a few minutes writing a linked list that supports:
      void add(struct linked_list *ll, void *value),
      int length(struct linked_list *ll),
      void *remove_first(struct linked_list *ll), and
      int contains(struct linked_list *ll, void *value).

      If you have any problems with this (it should be trivial), you need to take action now. See the Essential C PDFs in the Course Material section below (esp. the practice problems).

      If you don't become comfortable with C soon, you will not do well in this class. You will profit later from having a firm grasp on C!
    3. Consider installing VMware Player on your personal machine. We will use virtual machines for all development in this class.
    4. Form a group of up to 2 members for your semester-long project.
    5. Read about the course's main topics (see Schedule below), and when they are posted, choose a set of papers you want to read, and be prepared to post them on a spreadsheet to be provided by the Professor soon.
    6. See the homework due on Sept 14th in the Schedule section!
    7. Gear up for a challenging, but satisfying semester! Get ready to demystify how computers work!
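
As a self-check for the linked-list exercise in item 2, here is one possible sketch, not a reference solution. The four function signatures come from the exercise; everything else is an assumption made for illustration: the struct layout, the helper ll_init (hypothetical, not part of the exercise), the choice that add pushes onto the front of the list, and the choice that contains compares stored pointers rather than the data they point to.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per stored value; the list does not own what `value` points to. */
struct node {
    void        *value;
    struct node *next;
};

/* Head pointer plus a cached count so that length() is O(1). */
struct linked_list {
    struct node *head;
    int          count;
};

/* Hypothetical helper (not in the exercise): make an empty list. */
void ll_init(struct linked_list *ll)
{
    assert(ll);
    ll->head  = NULL;
    ll->count = 0;
}

/* Push value onto the front. On allocation failure the list is unchanged. */
void add(struct linked_list *ll, void *value)
{
    struct node *n;

    assert(ll);
    n = malloc(sizeof(*n));
    if (!n) return;
    n->value = value;
    n->next  = ll->head;
    ll->head = n;
    ll->count++;
}

int length(struct linked_list *ll)
{
    assert(ll);
    return ll->count;
}

/* Remove and return the value at the front, or NULL if the list is empty. */
void *remove_first(struct linked_list *ll)
{
    struct node *n;
    void *value;

    assert(ll);
    n = ll->head;
    if (!n) return NULL;
    value    = n->value;
    ll->head = n->next;
    ll->count--;
    free(n);

    return value;
}

/* Return 1 if some node stores the same pointer as value, 0 otherwise. */
int contains(struct linked_list *ll, void *value)
{
    struct node *n;

    assert(ll);
    for (n = ll->head; n; n = n->next) {
        if (n->value == value) return 1;
    }
    return 0;
}
```

If writing something like this from scratch takes you much more than a few minutes, treat that as the signal to start on the Essential C material.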

Objectives and Structure

Objectives - In completing this class, students will...
  • understand key concepts involving system resource management, organization, and abstraction
  • understand how an OS manages and interfaces with hardware
  • understand fundamental trade-offs integral to system design
  • experience both development and experimentation in a real OS
  • understand how to read a research paper
  • understand how to write a review for a research paper
  • understand how to give a research presentation, and to discuss the presentation of research

Course Prerequisites and Student Responsibilities

Prerequisites:
  • Computer Architecture or Operating Systems
  • Assignments for this class will be done in C, and will require kernel-level programming. Though we do not assume that you have kernel experience, if you aren't somewhat comfortable with C, you will have a very difficult time. Thus, a familiarity with C or a willingness to quickly learn and practice it is required.

    If you are not comfortable with C, at least go through Essential C and the exercises in the Course Material Section.

Responsibilities - Students must
  1. Attend all classes unless you are sick or there is an emergency, in which case you must email the Professor.
  2. Interact, ask questions, and generally participate in class discussions.
  3. Complete programming problems assigned.
  4. Devote significant amounts of time to a large implementation project on low-level systems that will require a significant amount of debugging. You cannot procrastinate on this.
  5. Give presentations throughout the class on research papers. You must prepare thoroughly for this. See the item below on how to present research.
  6. Be proactive in learning on your own. You will encounter many ideas you are unaccustomed to, or unaware of. You must do your own research online to understand the concepts well enough to explain and discuss the research papers.
  7. Write quality code. You must read the Composite Style Guide to understand one perspective on this.

The work for this course includes:

  • An initial programming assignment to give you a taste of what's required for the semester project.
  • Weekly, you will either 1) present research papers, or 2) read two of the research papers being presented on that day. For the former, it is your responsibility to email me a draft version of your presentation 72 hours before your presentation (Saturday at 6pm), and to be available to correct any mistakes and integrate my feedback. Send your email with the subject exactly matching CSCI6907:Presentation Draft. If you do not send this 72 hours early, or don't name the email correctly (I use email filters to view these emails), 20% will be deducted from your grade for the presentation. Your job is to best present the main topics of your chosen paper(s). See the tips below for how to present research papers. For the latter (reading papers), it is your responsibility to send me an email by 5pm before the class with your paper summaries. They must be included in the body of the email and cannot appear as attachments. You must exactly name your email CSCI6907:Paper Review. Again, your grade will suffer if these guidelines are not followed.
  • You will work in a group with a maximum size of 2 members on a semester-long project. I will circulate a list of project ideas soon. You are free to choose one of them, or create your own, pending my approval. If you create your own, you must discuss the idea with me via email before the selection deadline. The project will include an interim presentation and report halfway through the semester in which your group presents its progress, and a final presentation and report at the end of the semester that presents the final state of your project. Lastly, you will hand in your code with documentation and a README file describing how to build and test the code. This project will be the largest component of your grade.

Reading Research Papers

Reading and understanding research papers is a skill that you will develop throughout the course. The most important question to answer when you're done reading a paper is "what was the purpose of the research?" In answering this, you will define the contributions of the paper. You should have a general idea of what the contributions are after reading the abstract and introduction, but keep them in mind throughout the entire paper, as they provide motivation for how the system is implemented, and how it is evaluated (which tests are done). Often you will change your understanding of the contributions as you learn more about the techniques used, and the previous related work.

An analysis of the paper is an important aspect of understanding a piece of research. It forces you to think critically about the research. When analyzing a paper, you want to answer: what did you like about the research and the techniques, and what are its limitations? These limitations often revolve around assumptions the approach makes, the generality (or lack thereof) of the approach, etc. Each piece of research has limited applicability, has limitations to its technical approaches, and has limits to its motivation. An important piece of paper analysis is to identify these, as they define possible future research.

As you read through a paper, I highly recommend that you take notes (I often write in the margins of the paper). Lastly, you should note any questions that you have about the paper. These can range from technical aspects of the paper up to high-level questions about its motivation. Anything you don't understand should go into this section.

The summary must include three short sections (title them in your email with the corresponding titles below):

  1. Contributions. A 1 to 3 sentence summary of the purpose of the paper (i.e. what is its contribution?).
  2. Analysis. A paragraph or two including an analysis of the paper.
  3. Questions. A list of questions you have about the paper that is unlimited in length. You are not expected to understand 100% of each paper, and the questions you may have can be part of the class discussion.

These summaries should not be long, so please try to be concise. In fact, the more concise they are, the better. The contents of the analysis and questions should often follow directly from notes you made while reading the paper. You should bring a copy of each paper to class.

Presenting Research Papers

Making presentations for research papers can be difficult, but keeping a few things in mind should help organize your approach. First, you should aim for a presentation that is about 30 minutes without interruption. You must practice your presentation and time it. Second, it is important to convey in your presentation the motivation for the approach taken in the paper. Here you will answer the question, "what is the problem being addressed by the paper?" This might include how other systems have insufficiently approached the same problem, how some specific application scenarios necessitate the research, etc. After the motivation, the implementation details should be presented. You should not include all details here. Instead, you should focus on the most important details essential for understanding the empirical evaluation. If minute details are important, we can look at the paper directly. Last, you want to discuss how the authors justify their system through empirical evaluation. What tests do they do, why, and what are the results? You will not have enough time to go through all of the results, so choose representative results that reinforce the main contributions of the paper.

Often presentations already exist online for some papers (especially the more recent ones). You can feel free to use content from these presentations (I encourage it) but you must include attributions and credits appropriately. If you do not do this, you are plagiarizing (see Academic Honesty below).

General advice on giving good presentations and writing good papers can be found here.

Course Material

If you're behind and having trouble with C, here is a list of references. You must go through these on your own, and very early in the semester.

It is your responsibility to get up to speed in C, so please use these resources.

Once you are more advanced in C, it will be useful to read the style guide for Composite, our research OS here at GWU.

Schedule

The course is structured around discussions on persistency, parallelism, and consistency: three closely related topics. More information to follow.

Homework 1 due Sept 14th, 2016, midnight: For this homework, you will implement a slab allocator. Find the original paper here. This paper is not very clear, and you will want to seek out other resources to learn more about slab allocators. Remember that you cannot copy others' code, and if you get ideas from other code, you must cite it accordingly. However, you must still implement the API and general design introduced in this paper. Through this homework, you will learn:

  • The importance of slowly writing and thinking about code.
  • The utility of assertion statements.
  • The importance of paying attention to performance in code design.
  • How to write systems code in C, if you didn't already have experience with it.

Please see the Academic Honesty section below. You are not allowed to copy others' code, either from classmates or from the Internet. All of the code must be your own. For this assignment, you are not allowed to work in groups. You must implement this in C, and provide your own thorough test cases. You must use proper structuring of your project into *.c and *.h files. The quality of your code matters. Please focus on simplicity, and if a section of your code is becoming complex, do back up and simplify. That said, the fundamental operations of allocation and deallocation should not take more than around 100 cycles, so performance matters too.
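
To get a rough feel for the performance target above, a small timing harness can help. The sketch below is only one way to do it, and the function name avg_ns_per_pair and its parameters are hypothetical. It uses the standard C clock() function, so it measures CPU time in nanoseconds rather than cycles, and the figure includes loop overhead. To compare against a cycle budget, multiply by your clock rate: assuming a 3 GHz machine, for example, 100 cycles is roughly 33 ns.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical helper: average CPU nanoseconds per allocate/free pair.
 * alloc_fn and free_fn stand in for whatever allocator is under test;
 * malloc and free have matching types and can be passed directly.
 */
double avg_ns_per_pair(void *(*alloc_fn)(size_t), void (*free_fn)(void *),
                       size_t size, long iters)
{
    clock_t start, end;
    long i;

    assert(alloc_fn && free_fn && size > 0 && iters > 0);

    start = clock();
    for (i = 0; i < iters; i++) {
        void *p = alloc_fn(size);

        assert(p);
        /* Touch the memory through a volatile pointer so the
         * compiler cannot optimize the pair away. */
        *(volatile char *)p = 0;
        free_fn(p);
    }
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling avg_ns_per_pair(malloc, free, 64, 10000000) gives a baseline to compare your own allocator against. The resolution of clock() is coarse, so use a large iteration count.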

Details:

  • Implement the API from Section 2.3 with the following exception: You don't need to implement constructors and destructors, so they don't need to appear in your cache allocation function.
  • kmem_cache_grow can simply use malloc, and kmem_cache_reap can use free (described in 3.1).
  • You are free to ignore what the paper calls "memory pressure" which is essentially the system asking for memory back. Instead, and unlike what section 3.4 suggests, you can free a slab immediately when no objects in it are allocated.
  • It is essential to understand what a freelist is, and how it is used in the slab allocator. For example, you should understand that in the common case where there are cached objects available, allocation and free should be O(1) operations.
  • You should only use malloc and free when allocating or freeing a slab, and not when you allocate or free an object. To do this, you must understand that the freelists can be held within object memory itself, as, by definition, that object memory is unused if it is on a freelist.
  • You can ignore section 4.
  • It will help to read about malloc implementations, and understand how memory allocators are implemented in general. One reasonable implementation is dlmalloc, which is easy to find online.
  • A concise, fully functional implementation is possible in around 200 lines of code. If you are going much beyond that, then ask if you are making it more difficult than it needs to be.
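
The idea of holding a freelist inside the free objects themselves can be sketched in isolation. The code below is not the slab allocator and deliberately does not use the paper's API: it is a hypothetical fixed-size pool (all pool_* names are made up for illustration) backed by a single malloc'd region, with no caches, no multiple slabs, and no reaping. It only shows how allocation and free become O(1) pointer operations when each unused object stores the link to the next unused object, and how assertions can guard the invariants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A free object's first bytes are reused as the link to the next free object. */
struct free_obj {
    struct free_obj *next;
};

/* Hypothetical single-region pool; a real slab allocator manages many slabs. */
struct pool {
    void            *mem;      /* backing memory from one malloc call */
    struct free_obj *freelist; /* head of the list of unused objects */
    size_t           obj_size; /* rounded-up size of each object */
    size_t           nobjs;    /* total objects in the region */
    size_t           nfree;    /* objects currently on the freelist */
};

int pool_init(struct pool *p, size_t obj_size, size_t nobjs)
{
    char *mem;
    size_t i;

    assert(p && nobjs > 0);
    /* Every object must be large enough, and aligned, to hold a link. */
    if (obj_size < sizeof(struct free_obj)) obj_size = sizeof(struct free_obj);
    obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);

    mem = malloc(obj_size * nobjs);
    if (!mem) return -1;

    p->mem      = mem;
    p->freelist = NULL;
    p->obj_size = obj_size;
    p->nobjs    = nobjs;
    /* Thread every object onto the freelist, last object first. */
    for (i = nobjs; i > 0; i--) {
        struct free_obj *o = (struct free_obj *)(mem + (i - 1) * obj_size);

        o->next     = p->freelist;
        p->freelist = o;
    }
    p->nfree = nobjs;

    return 0;
}

/* O(1): pop the head of the freelist; NULL when the pool is exhausted. */
void *pool_alloc(struct pool *p)
{
    struct free_obj *o;

    assert(p);
    o = p->freelist;
    if (!o) return NULL;
    p->freelist = o->next;
    p->nfree--;

    return o;
}

/* O(1): push the object back; its memory becomes the freelist link. */
void pool_free(struct pool *p, void *obj)
{
    struct free_obj *o = obj;
    uintptr_t base, addr;

    assert(p && obj);
    base = (uintptr_t)p->mem;
    addr = (uintptr_t)obj;
    /* The pointer must be an object boundary inside this pool. */
    assert(addr >= base && addr < base + p->obj_size * p->nobjs);
    assert((addr - base) % p->obj_size == 0);
    assert(p->nfree < p->nobjs);

    o->next     = p->freelist;
    p->freelist = o;
    p->nfree++;
}

void pool_destroy(struct pool *p)
{
    assert(p && p->nfree == p->nobjs); /* everything must have been freed */
    free(p->mem);
    p->mem      = NULL;
    p->freelist = NULL;
}
```

Note that malloc and free appear only in pool_init and pool_destroy, never on the per-object path. The sketch does not detect double frees or guard against size overflow; those are left out to keep it short.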

Project and Group Selection -- Deadline Sept 21st, midnight: Please send me one email per group that includes 1) a list of group members (either one or two members permitted per team), and 2) a prioritized list of projects you're interested in. Your list must include all projects, properly prioritized. The subject line of the email must be exactly CSCI6907: Project Selection.

The suggested projects all use our Composite component-based OS.

  • Satellite middleware support in a component-based OS. Let's get our software running on satellites! From here: "The Core Flight Executive is a portable, platform independent embedded system framework developed by NASA Goddard Space Flight Center. This framework is used as the basis for the flight software for satellite data systems and instruments, but can be used on other embedded systems."

    NASA has shown interest in running GWU's operating system on its satellites! Your job, should you choose to accept it, is to pair two operating systems with each other! On the one hand, we have CFE, which is the flagship software that is used to run most of the applications on Goddard's satellites. On the other, we have Composite, which has fault tolerance properties (like micro-rebooting subsets of the system) that NASA doesn't currently have. Both of these systems are somewhat primitive and simple. For example, find a sample of much of Composite's kernel functionality and API. CFE supports primitive operations such as event management, timing, messaging, and configuration management.

    More information about the core flight executive can be found:

    Please look through all of this information, and come up with a plan for how to approach this task, and send me an email with a list of questions you have. I can do a tutorial on how to execute and debug the system using QEmu (virtualization).

  • SQLite OS. SQLite is some seriously awesome software. A small, "simple" data-base that does great with concurrent read requests. There is some information about building SQLite in a custom OS.

    I strongly suggest that you first rely on SQLite for mutex and malloc support, and provide your own versions of VFS support as a library with a memory-backed "disk". Then progress to provide your own mutex implementation to enable concurrency/parallelism.

    We'd like a significant portion of the SQLite tests to pass.

    See the links above in the CFE for information about Composite.
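
For the SQLite project's memory-backed "disk" suggestion, the core data structure can be quite small. The sketch below is an illustration under stated assumptions, not SQLite's VFS interface: the memdisk_* names and return codes are hypothetical, and a real port would wrap something like this behind the methods SQLite's VFS layer expects. One detail is modeled on SQLite's documented read behavior: a read that extends past the end of the file zero-fills the unread portion of the buffer and reports a short read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable in-memory "file". */
struct memdisk {
    unsigned char *data; /* backing buffer, NULL while empty */
    size_t         size; /* logical file size in bytes */
    size_t         cap;  /* allocated capacity in bytes */
};

void memdisk_init(struct memdisk *d)
{
    assert(d);
    d->data = NULL;
    d->size = 0;
    d->cap  = 0;
}

/* Write amt bytes at offset off, growing the file as needed.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int memdisk_write(struct memdisk *d, const void *buf, size_t amt, size_t off)
{
    size_t end = off + amt;

    assert(d && buf);
    if (amt == 0) return 0;

    if (end > d->cap) {
        size_t ncap = d->cap ? d->cap : 512;
        unsigned char *ndata;

        while (ncap < end) ncap *= 2;
        ndata = realloc(d->data, ncap);
        if (!ndata) return -1;
        d->data = ndata;
        d->cap  = ncap;
    }
    /* A write past the current end leaves a hole; fill it with zeros. */
    if (off > d->size) memset(d->data + d->size, 0, off - d->size);
    memcpy(d->data + off, buf, amt);
    if (end > d->size) d->size = end;

    return 0;
}

/* Read amt bytes at offset off. Returns 0 on a full read, or 1 on a
 * short read, in which case the rest of buf is filled with zeros. */
int memdisk_read(const struct memdisk *d, void *buf, size_t amt, size_t off)
{
    size_t avail, n;

    assert(d && buf);
    avail = off < d->size ? d->size - off : 0;
    n     = amt < avail ? amt : avail;

    if (n > 0) memcpy(buf, d->data + off, n);
    if (n < amt) {
        memset((unsigned char *)buf + n, 0, amt - n);
        return 1;
    }
    return 0;
}

void memdisk_free(struct memdisk *d)
{
    assert(d);
    free(d->data);
    memdisk_init(d);
}
```

This version is not thread-safe, which fits the suggested first step of relying on SQLite's own mutex support before adding your own.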

Successful projects will hopefully be integrated into the main Composite repo! Please ask questions about each of these on Piazza. You want to assess which are the most interesting to you, so ask questions to help with that.

Paper schedule (TBD)

Please sign up for presentations using the link published on Piazza when it is posted. First come, first served.

Grading

Grades will be assigned using the following breakdown:
  • 15%: Project - interim report and presentation
  • 20%: Project - final report and presentation
  • 20%: Project - code quality and functionality
  • 15%: Presentations - research paper presentation
  • 15%: Paper summaries - Sufficient, yet concise summary of research papers throughout the class
  • 15%: Participation - You must participate in class. You will get these points mainly for that participation, but also for participating on Piazza.

You cannot get credit for any component of the class if you hand it in late.

Academic Honesty

Just as you can do a google search for code online, it is trivial for us to do the same. We have caught numerous people cheating in the past in this way. If you feel pressured about an assignment, please come see me instead of cheating.

You are not allowed to collaborate on the homeworks and the lab assignments. The group projects require collaboration within each group, but no collaboration between teams is permitted. Clearly, you cannot share the contents of your paper summaries with each other, though you can discuss papers. Please refer to the academic integrity policy linked from the course web page. This policy will be strictly enforced. If you're having significant trouble with an assignment, please contact me.

Additional Material

If you're interested in Composite, the OS we are developing here at GW, don't hesitate to contact me.

In addition to the contents of the class, what follows is a list of resources that will allow you to go beyond what you've learned. This material is not mandatory for the class and is meant to give you a springboard if you wish to pursue the ideas further. Many of these are blog posts or articles for easy reading. However, because of the informal format, please take the contents with a grain of salt. I can point you to more thorough content if you want it. If you find a website/article/tool that you think is worthy of being in this list, let me know.

Websites geared toward systems topics (or that have subsections on systems):

  • Linux Weekly News: Free content is that which is at least a week old (see the Archives).
  • Two brilliant classes on microkernel construction.
  • Arstechnica: geek news, often with a high-level overview of systems/architectural topics.
  • OKLabs videos. A microkernel company. Specifically, see the series titled, "Advanced OS with Gernot Heiser".
  • OSDev Wiki: the resource on practical OS construction and hardware manipulation.
  • Agner's blog: What's happening in those processors of ours? See more architecture discussions at real world tech.
  • From low-level embedded system hacking, to robotics, to hacking! Hackaday!

Blogs and articles (remember that these are not authoritative sources!):

Really. cool. tools:

  • DynamoRIO: Because it rocks to modify a program as it's executing.
  • SQLite: a small, well written, data-base. Use it, read it, and be merry.
  • Lua: a tiny language with a runtime of about 10K LOC. If you want to know what a language does under the wraps, here's a good place to find out (just ignore some of the syntax oddities).
  • Valgrind: Ever wonder what your program's been doing in its spare time? This will tell you. Required for debugging large C/C++ programs.
  • LLVM: A beautiful compiler backend. If you want to write a language, this will make your life so much easier.
  • FUSE: Want to write a file-system, but aren't sure about all of that kernel programming? And what if you need to use high-level libraries? No problem; do it in user-space!
  • Daikon: Because who doesn't want to use machine learning to infer their own program's behavior?
  • More powerful link-time optimization!
  • A great C library for simple machine learning operations such as classification.