CSCI 3411 - Operating Systems

Posted on August 1, 2019 by Gabriel Parmer

This course covers the fundamental concepts of operating systems, focusing on resource management and abstraction. This includes OS structure, processes and thread management, communication with peripherals (I/O), synchronization, deadlocks, memory management, Virtual Machines, cloud infrastructures, and abstractions for cloud computation. The workload for this class is heavy and programming intensive.

Course Info

Instructors Contact info:

Fall 2018 Class Time/Location:

Office Hours:

TODO

  1. Register for the class on Piazza. I’m opening this early, over the summer so that you can ask any questions you may have in preparation for the class.
  2. Once the class starts, on Piazza you’ll find a link to the course survey. You cannot receive any grades until you fill this out; please do so immediately.
  3. On Piazza, you should also find a link to the first homework (HW0) on github (which requires you to make/have a github account).

Course Overview

Learning Outcomes

Objectives - In completing this class, students will…

Structure - This class is broken into two main activities: lectures and lab.

Direct Instruction and Independent Learning

Each semester, you’re expected to spend at least:

Course Prerequisites and Student Responsibilities

Prerequisites:

Responsibilities - Students must

Professor and Instructional Staff Responsibilities

You should depend on the instructional staff (including Gabe) for the class to

These can be remembered as “we’re here to make it clear what you should do, what you should know, and what to do if you don’t feel like you know what you’re supposed to do or know”. If at any point, you have concerns about any of these, of if we’re dropping the ball on any of them, please let us know and we’ll do better. There are trade-offs made in the class (for example, homeworks are not formally specified as I don’t want them to be 50 pages each), but you always encouraged to ask questions and clarify.

Above all, it is important that this class provides an environment for learning and self improvement. Gabe takes this very seriously, so if it is not adequately serving that purpose, don’t hesitate to let him know why.

Experiments

Part of keeping a course “fresh” and successful is trying new things, and keeping the things that work. Given this, the experiments I’m running this year include:

Online office hours. I really want to make office hours work this year. Historically a very small minority of the class attends office hours. There are many reasons for this, but two of them are inconvenience/timing and the fact that office hours can be intimidating. Thus we’re going to try a few things:

Organized class clarifications. I’d like to encourage a strong feedback loop for answering questions and concerns about the material in lecture and lab. The instructional staff make mistakes and might not convey everything perfectly, and everyone at some point in the class will get confused by the material. You must take the student responsibility (above) seriously that requires you to ask questions in class if something isn’t clear. However, some of these confusions only surface after you have some time to process/absorb the material.

Thus, after each class, we will post a specific Piazza note asking for areas in which you don’t feel confident. If I forget to post this, please do so with the title “Class clarifications for y xx/xx” where y is the day of the week (Tuesday or Thursday), and xx/xx is the month/day, and use the Piazza topic lecture_clarifications. I will look at these topics, and try and cover them as much as possible at the start of the next class. This is your opportunity to tell me what needs more coverage in the class.

In class, I’ll be calling on random students. I’m sorry.

This class requires active discussion, and for a broad cross-section of the class must be involved. Historically fewer than 10 students have fielded the majority of the questions, which isn’t a healthy dynamic, nor does it lead to a discussion that will help the majority. I completely understand that answering questions can be very stressful, and the constant possibility of being called on next can keep you on your toes. Know a few things:

So I completely sympathize that answering in-class questions can be quite challenging. Because of this, my expectations are not that you correctly answer every question. Often the most intelligent way to answer a question is with another question. So I expect that if you cannot take a shot at a question, you try and identify where the confusion is and point me in the right direction to better explain the material. However, please stay alert, and do your best.

Everyone gets a single day that they are guaranteed not to be called on. Though attendance is mandatory, if you know you aren’t going to be able pay attention (personal complications, multiple exams, no sleep the night before, you know, college 😉), then you can get a pass. You must fill out the form at least 15 minutes before class, and can only fill it out once. Double check that you’re choosing the correct day.

Course Schedule

Topic xv6 Book Chapters/Additional Reading1 Silberschatz Chapters2
OS Fundamentals: Hardware and Structure 0, 1, 3 and this post 1, 2
Processes, IPC, Threads, Isolation this post 3, 4
Scheduling and Critical Sections 5, 4 5, 6, 19
Mutexes, Condition Vars, Deadlock N/A 6, 7
Midterm all of the things…
Memory Management and Allocation N/A 8
Memory Protection 2 9
File Systems and Storage 6 10, 11, 12, 13
Security N/A 15
Final all of the things…

You can find the lectures on Piazza after they are delivered. By default, I’ll provide these after they are fully covered in class. I’ve found that people read ahead which can harm the learning processes (as quite a bit of information is not on the slides) as it removes the problem solving processes. If this policy hinders your learning process, talk to me and we can discuss options. For your reference, all of the lecture’s pdfs are on the following topics:

Course Material

Strongly encouraged (but optional) Text:

If you have trouble understanding the xv6 source code, I recommend the following resources:

You can fork the xv6 source on github. When you do, test that you can run it with make, and make qemu.

In the end, you have to get comfortable walking through the code, and understanding it. Using a code indexing system will make this easier. ggtags or ctags for emacs does great, but there are corresponding technologies for nearly all editors.

If you’re having trouble with C, here is a list of references:

It is your responsibility to quickly get up to speed in C, so please use these resources.

Assignment Submission

All of your homeworks for this class will be submitted via github. The last commit that you make to the repo will be graded, and the lateness penalties of that commit (see “Late Policy” below) will apply. Please do not make pushes to your repo after the version you want graded. Please also not that it is necessary to issue a git pull up to the github repo for submission. Local commits to your own repo do not update github, thus cannot be counted as submission.

Grading

Grades will be assigned with the following proportions:

Homeworks include smaller programming assignment, written responses, and a long final programming project.

Participation will be graded not only on attendance, but also on interactivity and involvement during class and lab.

If needs be, there will be short quizzes at the beginning of classes. These will only happen if people make it a habit of not coming to class, or coming late.

Homework Grade Modifications:

The following modifies apply to your grade on all homeworks (but not the project):

Your grade can be minimum 0%, but your maximum grade can be higher than 100%. There is one exception:

I cannot emphasize this point enough: The only way people have failed this class is by not trying or by cheating.

Late Policy:

The only acceptable excuses for late submissions are medical, family, and related issues. For these, you will require a note (from a doctor, parent, etc…) to justify an extension. Unacceptable excuses include:

You get 0 credit for late work that is unexcused. However, once the grades are released, you can resubmit your code within seven days, rounded up to the closest midnight. For example, if the grades are released at 2:30pm on the 8th, resubmissions are due at midnight on the 15th (i.e. \(8 + 7 = 15\)). You will get 0 credit for submissions after this date.

We will grade the last commit before the resubmission deadline, and you will receive 50% of the credit on top of the original grade, rounded up to the nearest percent. For example, if you got a 43% on the initial deadline, and resubmit and get a 68%, your final grade (\(g\)) will be \(g = \lceil 43 + \frac{(68 - 43)}{2} \rceil = 56\).

Do not come to rely on the resubmission process. Homeworks are due every week or every other week, so it is very harmful to get behind on them, or to add more work to your schedule. Thus, our late policy is quite strict to discourage procrastination.

Please note that in terms of academic honesty, you cannot share your code with your peers even after it is due.

Testing your implementation:

When you are required to test your code, please note that our test cases will best every edge case in the implementation. If you do not write code to test the edge-cases of the specifications, then you will lose credit. This is the largest reason why students get lower grades than they believe they deserve. If you believe that an error was made in grading, then please explicitly address why you believe your test cases are sufficient to test all cases. See the discussion about testing and debugging below.

Style and Code Craftsmanship:

Any homework graded by demo (i.e. the project) must adhere to style guidelines, and you must curate your code. It must be simple, readable, and clean. See this post For some thoughts on how to make your code more readable, see the Composite Style Guide.

Academic Honesty

Just as you can do a google search for code online, it is trivial for us to do the same. We have caught numerous people cheating in the past in this way. If you feel pressured about an assignment, please come see me instead of cheating.

You are not allowed to collaborate on the homeworks and the lab assignments. The group projects require collaboration within each group, but no collaboration between teams is permitted. Please refer to the academic integrity policy linked from the course web page. This policy will be strictly enforced. If you’re having significant trouble with an assignment, please contact me. If you have not signed the academic integrity statement for homework 0, you will not receive a passing grade for the class.

Summary: I want to be very clear:

In the past, the only way that people have failed this class is to either not try (not do homeworks), or to cheat. There is almost no reason to cheat. Come to me, and we can discuss where you’re at, and what you have to do to make it through this difficult class!

Justification: Cheating hurts your ability to compete for CS jobs. Everyone came to college to learn. You learn by doing work to reinforce class ideas. Understanding ideas from class is nowhere near as effective at learning as applying those ideas in an implementation. Thus, you need to do your own implementation to be a competitive CS student in the real-world!

Cheating hurts your peers. Courses that are graded on a curve have the side-effect that if someone inflates their grade not by hard work, but by cheating, your good grade (should the Professor not catch you) will hurt the grade of all of your colleagues that put the work in.

Cheating also hurts the quality of the OS class in the long term, thus hurts all future generations of GW CS undergrads. If you cheat, especially using previous year student’s work, to the point where the Professor believes that they must develop new assignments every year, then those assignments will not be as strong. They will not meet learning objectives in a course like this. All future CS students will be less prepared for competing for the best jobs.

Attending Office Hours

Please come see Gabe to address any of your concerns. You should not wait until a problem comes to a head before meeting. Gabe can help you find a path to avoid bigger problems. Of course, the main use for office hours is to discuss the homeworks, or help prepare for exams.

If you have a particularly horrible debugging problem, then Gabe might be your best bet. Regardless, who you talk to, if you want them to look at code then you must adhere to style conventions. See style conventions below. At the most basic level, you must

If you don’t demonstrate to us that you actively tried to understand your own code by making it as easy to understand as possible, then it isn’t reasonable for us to dive into your code. Note that you really should take all of these steps when developing, debugging, and testing your own code, regardless. In the “real world” it is expected that you generate code that adheres to these guidelines.

For online office hours, you don’t need to prepare at all. Show up, and see what happens. If there aren’t questions, we’ll talk about the material, technical skills, or do code walk throughs.

Additional Material

xv6

To run xv6, you need qemu installed. If you aren’t running Linux, then you might execute Linux in a virtual machine (e.g. using Virtual Box or VMWare Workstation), and qemu within Linux. I believe others have gotten qemu executing in OSX as well (but it takes installing a cross-compiled ELF-version of gcc), so discuss on Piazza if you’re interested in that.

$ git clone https://github.com/gparmer/gwu-xv6.git xv6
$ cd xv6
$ make qemu

Simple!

Note that the xv6 book has a great overview of the design of the entire system. Please use it as a resource if you need it. Note that since you don’t own this repo, you cannot git push to it. You have to fork it on github to become the owner of your own fork. For this class, all assignments will be completed on repos that we provide for you.

Some hints for understanding the code-base. All code that runs in user-level is in the files that correspond to the entries of UPROGS, and all of the header files that they include. The rest of the code executes in the kernel. Thus to add a new program, you only need to provide the .c file, and add it in a similar manner to UPROGS.

When you’re looking at a .c or .h file, always make sure you understand if it is executing in user-space (thus must make a system call to enter the kernel), or is in the kernel (thus can directly call functions within the kernel). Using a code indexing system will make it easier to walk through the source code. ggtags or ctags for emacs does great, but there are corresponding technologies for nearly all editors.

git and github

Using github. github provides a wonderful set of repository management features online. You can go through the github bootcamp if you don’t know how to use it.

You can use github to coordinate between team members in many different ways. I’d strongly recommend that when working in a team, you use the feature branch workflow pattern of interaction. At the highest-level, this means that each team member will be working on their features on separate branches, and will integrate them into the mainline to coordinate with everyone else.

A few of the ways to interact with github:

Using git. I’m not going to do a deep dive into using git here, and expect that you can figure out the main features online. Some quick git tips can be found at the bottom of software engineering guide. An incomplete list of commands that you can use with git include:

Submitting assignments. When you submit an assignment, I suggest that you follow this procedure:

  1. git status to make sure that you have git added all of the intended files.
  2. git diff and git diff --stat to make sure that you’re committing what you think you are. Of note, check to see if you’re committing any printfs that are intended for debugging, not for program output.
  3. git commit and git push to upload your code to github.
  4. Go to the associated webpage for the repository, and make sure that the most recent commit is present in your repository.
  5. Use git clone ... to get a fresh version of the repository, build it, and run your tests. Students who omit this step get bad grades for very avoidable reasons. Always make sure that your submission matches what you think you’re submitting.

Style

Please document in your repository (e.g. in the README.md) which coding style you’re using. By default, the xv6 repo we use is formatted in accordance with the Composite Style Guide. See that description to understand why style is important, and why we choose the style we do.

It is important, when you’re working on a team to use a consistent style. It is more important to choose one and stick with it, than to choose the “perfect style”. You can find many different styles online, including

Though not relevant for this class, many modern languages have prescribed styles such as go, and Rust.

Style requirements for this class. For the purposes of this class, never bring code to the instructional staff that isn’t properly indented. That is the minimum bar that humans need to be able to understand code. If you bring code that isn’t indented, then we will think that you haven’t put a sincere effort into understanding your code as you haven’t invested care into making it understandable.

Debugging

Debugging is a skill. Just like any other skill, practicing doing it well will make you much better at it. Most students are so rushed by deadlines (often due to procrastination) that they focus only on “making it work” rather than on refining their debugging skills to become better computer scientists. Finding a fix to any given issue is important, but you must focus on developing your long-term skills, or else you’ll find your “ceiling” of capability much sooner than you’d like.

To deliberately and continuously improve your debugging skills, I implore you to focus on the following:

They are each briefly discussed below.

List and Check Your Assumptions

When something goes wrong with your code, it is because your mental model of how everything should fit together does not match the instructions you’re providing the computer. Your mental model is build of many assumptions about what your code is doing at any point in time. One of the most important learned skills is recognizing when you’re making an assumption, and identifying what the assumptions are. Examples of these assumptions include:

Most your debugging time should be going to coming up with creative ways to check your assumptions. When something goes wrong, (at least) one of your assumptions is wrong. Your job is to figure out which one(s). This changes the emphasis of debugging from “why so wrong!?!?” to “which specific part of my understanding of this is incorrect?”. This is how you change debugging from what feels like a random process, to something that is methodical, deliberate, and, most importantly, productive.

It is very common for people to get stuck with no obvious path forward. The program is simply not doing what they think it should, and it makes no sense! If you’re at this point, then you’ve already made too many assumptions (at least one of them being wrong), and have built your entire mental model upon them. The solution is to go back, and start testing not your code, but the assumptions made throughout the lines of your code. Where is your understanding of your code going wrong?

Scientific Method

The scientific method is based around formulation hypothesis about how some phenomenon will behave, designing tests for checking the hypothesis, running the experiments/tests, and either confirming the hypothesis, or starting the entire process again. An important aspect of this process is that it is focused on knowledge creation. Deriving a hypothesis takes a deliberate process that focuses you to apply your brain, and think about the workings of the entire system. If it is confirmed, you often know not only that the hypothesis is true, but also why. If the hypothesis is wrong, you learn that there is a bug in its formulation, and you might go back to more foundational tests to figure out how and why it is wrong. This enables you to understand the system better, and made a better hypothesis next time around.

You should treat your assumptions about your own code with the same care. First, acknowledge that they are assumptions – never just assume that a part of your code works without testing – and then treat the assumptions as hypothesis about the code that need to be tested. Most importantly treat your assumptions as something that you refine through intentional testing. It is through this process that you learn about how your code works.

Planning your Approach

When you sit down with the goal of solving some programming task, it is natural to try and jump in and start developing. This is a fine approach if the problem that you’re trying to solve is below some complexity threshold. If it is a very comfortable task for you, then just “diving in” can be a perfectly viable approach. As you get more and more skilled as a programmer, this threshold will increase. However, as you get more and more skilled, you’re responsibilities will likely increase faster than this threshold. Thus, it is necessary to learn to be productive on complex projects that you cannot easily just “sit down and implement”.

It is necessary to do work that is beyond this threshold for your own personal development. Much of the work in this class is valuable because it is, by design, too complex to implement productively without a plan. This forces you to practice the act of planning out your approach. I help with this in the homeworks by providing a set of “levels” that need to be completed. You must complete the levels one at a time as they provide a path to incrementally add complexity only after you’ve learned about the problem from the previous levels.

Your development plan is simply a sequence of implementation goals that you implement, and test one at a time. You don’t need to have a complete plan mapping from the start of the implementation to the end, but you do want the first N steps so that you have a plan. This allows you to implement you code in a way that has a little bit more of a long-term design behind it.

Rewriting Code

When we write code, we often don’t understand the problem domain well enough to provide an ideal implementation. The effect of this is that our code is far too complex. As we learn about different aspects of the problem domain – “ohhh, I have to worry about i > N as well”, “yikes, what if the head is NULL?”, etc.. – we add more and more conditions and loops. Keep this in mind: at any point in your code look at the level of indentation; that is a good proxy for the number of things you need to keep in your head to understand a given line of code. Each condition and loop encodes variables and changes that you need to remember to understand the code in their bodies. Given that human short-term memory holds only 7$$2, code simply becomes too complex to keep in your head.

It is necessary to revisit and rewrite your code when it becomes to complex. Debugging code that is too difficult to understand and too challenging to even formulate assumptions about will take orders of magnitude more time to debug than to rewrite your code with a simpler structure and debug that. I very frequently find that I cannot make sense about how to modify my code to achieve some goal, and the only anecdote that I know is rewriting the code to constrain and abstract away complexity. You’d be amazing how much a few well-named functions containing well-abstracted code can make your code easier to understand. Additionally, the act of trying to simplify your code makes your code easier to debug and forces you to understand the fundamental problems, which also makes debugging (and assumption formulation) easier. In short, real-world programming involves a lot of code rewriting. For much more information about how to write C code with an eye toward constraining the complexity, see the composite style guide.

More Resources

Testing

Untested code must be assumed wrong, simply put. You can never make the assumption that any part of your code works that you haven’t tested, no matter how simple. “It should work” is something that the instructional staff will never believe, and should be an indication of hard debugging ahead if you find yourself uttering it. “It should work” is a huge assumption with a builtin recognition that you aren’t going to test the assumption. Put another way, it is a recognition that you’re about to waste a lot of your own time. So I want to make it clear that you must test all parts of your own code. If your grade is lower because a branch doesn’t quite do what it is supposed to, “but it is just a small error” not only is completely incorrect, but demonstrates a complete lack of understanding of the fundamental importance and purpose of testing.

The job of a library, an operating system, or a piece of utility code is to make the programmer able to perform some operation, or to make the operation easier. In such situations, you have to assume that programmers don’t know what they are doing, and that, to your greatest ability, you’ll catch their errors and return a useful indication of where they went wrong. In the case of an OS, you have to check that all parameters/inputs are valid as if they are not, they might cause a bug that could crash the entire system, or allow the user to take over the system. This is accomplished in a similar way in every language, on every platform: write the code to explicitly check the validity of arguments, write the code to check for edge cases, and return the proper result. In the case (as in this class) where you’re given the specification for each operation, you simply need to ensure that you encode all of these erroneous inputs into conditions, and return the proper value.

Next, since code that isn’t tested doesn’t work, you have to write tests to make sure that all inputs result in the correct output. Properly testing your code is simple: codify your assumptions about how your implementation behaves into code. This absolutely means that you’ll spend time writing your testing code to generate all the edge cases, but this is absolutely necessary.

Most submissions that lose points, lose them because:

  1. Conditions aren’t there for specified inputs.
  2. Conditions are there, but aren’t properly tested, thus produce incorrect behavior.
  3. Implementations are incomplete, or have known bugs.

There is almost no reason to lose points for 1. and 2. Testing is an essential part of being a developer. Thus, the instructional staff will not be lenient if you don’t do proper testing.


  1. Find the book online

  2. Silberschatz chapters are specified with respect to the 9th edition