Topics in Modelling, Simulation and Optimization

Prisoner's Dilemma

In this assignment, we will have some fun with a classic game called the Prisoner's Dilemma. You will write strategies for this game. We will hold a class-wide competition and play off all strategies against each other in an iterated fashion to see which strategy performs best in the long run. In addition to the standard 2-person game, we will introduce a variation and play 3-person games as well.

Let's start by understanding how the game is played a single time between two players:

There are two players (prisoners). Let's call them A and B.

Together, they have robbed a bank and buried the loot in a location known only to the two of them.

They get caught by the police. The police know that these are wily criminals and that they would concoct a mutually-corroborative story if the two prisoners are placed into the same room. So, they place each prisoner in a separate room and try to extract a betrayal from each prisoner: they try to get each prisoner to rat on the other. In the ideal case for the police, both prisoners would implicate the other and the case would be made for the police. If that didn't work out, if prisoner A implicated prisoner B, then at least they would have evidence against prisoner B (whom they could convict). Unfortunately, if neither sold out the other (that is, both stay silent), then the police would have to let them go. The prisoners know all of this.

So, what should a prisoner do? Prisoner A thinks: "If we both remain silent and don't betray, then we can both go free and share the loot. On the other hand, if I stay silent but B betrays, then I'm stuck in jail while B gets all the loot. If I betray and B is silent, then I get all the loot (hmmm!). Lastly, if we both betray, we are both going to be in jail. Can I trust Prisoner B, who I know is as evil as I am?"

Clearly, it doesn't make sense to stay silent: whatever B does, A does at least as well by betraying. Indeed, for a single-shot version of this game, it is best to betray. The outcome in which both players betray is called a Nash equilibrium (for a single-shot game).

We are going to be interested in the Iterated Prisoner's Dilemma, in which such games are played repeatedly and in which one player can "learn" how the other reacts. The objective is to maximize winnings totalled across many games. How should such a player play? One can reason as follows: "If I only betray, and do this repeatedly, I'll never win anything if I play against players like me". So, it might make sense to occasionally remain silent.

We will do more than play an iterated game. We will play all players pairwise against each other multiple times. That is, A plays B N times; then A plays C N times; then B plays C N times, and so on. In each game played, each player will come away with zero or more in winnings. The player with the maximum winnings after all games is the overall champion. Your goal is to play to win overall.

Here's how we will describe the winnings for a particular game:

Consider a two-player game between A and B. There are only two actions, "betray" and "silent", which we will represent by the integers 0 and 1 respectively. Thus, in each game, each player's "move" will be either 0 or 1.

There are only four combinations:

         0 0       Both A and B betray.
         0 1       A betrays, B stays silent.
         1 0       A is silent, B betrays.
         1 1       Both are silent.

Each combination results in some winnings for A and some for B. For example, suppose the total loot was $4 (million).

         0 0       Both get $0
         0 1       A gets $4, B gets $0
         1 0       A gets $0, B gets $4
         1 1       A gets $2, B gets $2 (they share the loot)

We will represent this information in a so-called "payoff matrix" in Java as follows:
```
public static final int[][] TWO_PLAYER_PAYOFF_MATRIX = {
    {0,0,  0,0},
    {0,1,  4,0},
    {1,0,  0,4},
    {1,1,  2,2}
};
```
Here, the actual payoff information is in the third and fourth columns. Just for completeness, we also provide the combinations in the first two columns. Thus, if you know the matrix, then you know the rules of the game.
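To make the row layout concrete, here is a minimal sketch of how a strategy might read payoffs out of the matrix. The class PayoffLookup and its methods payoffs and bestReplyForA are hypothetical helpers written for this illustration; they are not part of the simulator.

```java
/** Hypothetical helper for illustration only; not part of the course simulator. */
final class PayoffLookup {
    // Example matrix from the text: columns 0-1 are the moves, columns 2-3 the payoffs.
    static final int[][] TWO_PLAYER_PAYOFF_MATRIX = {
        {0, 0,  0, 0},
        {0, 1,  4, 0},
        {1, 0,  0, 4},
        {1, 1,  2, 2}
    };

    /** Returns {payoffA, payoffB} for the given moves (0 = betray, 1 = silent). */
    static int[] payoffs(int[][] matrix, int moveA, int moveB) {
        for (int[] row : matrix) {
            if (row[0] == moveA && row[1] == moveB) {
                return new int[] {row[2], row[3]};
            }
        }
        throw new IllegalArgumentException("No row for moves " + moveA + ", " + moveB);
    }

    /** A's best single-game reply to a fixed move by B; ties are resolved as betray (0). */
    static int bestReplyForA(int[][] matrix, int moveB) {
        int ifBetray = payoffs(matrix, 0, moveB)[0];
        int ifSilent = payoffs(matrix, 1, moveB)[0];
        return ifSilent > ifBetray ? 1 : 0;
    }
}
```

With the example matrix, betraying is never worse for A in a single game: against a betrayer both replies pay $0, and against a silent opponent betraying pays $4 while silence pays $2.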

To play, you will need to write a strategy as a Java class. For each game played, you will get the following as input:

Who you are playing against (the ID of your opponent). This is so that your strategy can depend on who you are playing against.
The payoff matrix. This will actually remain the same for all 2-person games so, strictly speaking, we don't have to provide this as a parameter in each game. However, for completeness, we'll provide this information.

Of course, you will also want to know what your opponent did in a game. Thus, after you've played, we will provide you with the action taken by your opponent. We will also provide you with your payoff and your total current score.

We have just described in detail the method interfaces you need to implement:

Your player must implement the Prisoner interface.
This interface has three methods, the most important of which is the playGame method in which you implement your strategy. Since there are only two moves, you must return either 0 (betray) or 1 (silent).
Since you will be provided with the ID of the other player (among the many in class), you can let your strategy depend on past actions by other players. Thus, if you have learned that Player #5 is very evil, you can plan accordingly. On the other hand if you are playing the nice Player #3, you can risk being silent.
Now, to test your code, you will really need to implement two strategies so that you can play them off against each other. Let's use the following naming convention. If your username is Rahul, for example, you will name your two strategies RahulPrisoner1 and RahulPrisoner2.
Note that, to get started, your implementations can be very simple. For example, the "Always Betray" rule is simply one line of code, and will take you only a couple of minutes.
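As a rough illustration of the points above, here are two tiny strategies. The real Prisoner interface ships in the jar file and its exact method names and signatures are not reproduced here, so SketchPrisoner below is a hypothetical stand-in with assumed methods playGame and reportOpponentMove. The second strategy uses the well-known "tit-for-tat" rule, chosen only as an example of remembering each opponent by ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the course's Prisoner interface; the real signatures may differ. */
interface SketchPrisoner {
    int playGame(int opponentId, int[][] payoffMatrix);        // returns 0 (betray) or 1 (silent)
    void reportOpponentMove(int opponentId, int opponentMove); // assumed callback after each game
}

/** "Always Betray": the strategy itself is the single return statement. */
final class AlwaysBetraySketch implements SketchPrisoner {
    public int playGame(int opponentId, int[][] payoffMatrix) { return 0; }
    public void reportOpponentMove(int opponentId, int opponentMove) { }
}

/** Tit-for-tat per opponent: stay silent at first, then copy that opponent's last move. */
final class TitForTatSketch implements SketchPrisoner {
    private final Map<Integer, Integer> lastMoveByOpponent = new HashMap<>();

    public int playGame(int opponentId, int[][] payoffMatrix) {
        // Default to 1 (silent) for an opponent we have not met yet.
        return lastMoveByOpponent.getOrDefault(opponentId, 1);
    }

    public void reportOpponentMove(int opponentId, int opponentMove) {
        lastMoveByOpponent.put(opponentId, opponentMove);
    }
}
```

Keying the memory by opponent ID means that a betrayal by one player does not change how the strategy treats anyone else.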

You will also need the game simulator. Download this jar file, which has the simulator and related classes. When you unpack, you can run PrisonerGame, which will bring up a frame. To run the simulator:

In the "Game-type" menu, set the game type to 2-player (We'll describe the 3-player version later).
In the "File" menu, load your players.
Then, click "Reset".
You can then play games by repeatedly clicking "Play one round". Alternatively, you can click "Go" and watch the scores updated as games are played automatically.
You can increase or decrease the number of rounds played by entering an integer in the text field and clicking on "Change rounds".

At this point, you have enough to start implementing your 2-player strategies. If you like, you may examine the code in PrisonerGame.java to see how the rounds are conducted. Essentially, one tournament round consists of every player playing every other player exactly once. This is then repeated for as many rounds as there are.
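The round structure just described can be sketched as follows. This is not the actual PrisonerGame code; RoundRobinSketch and its play method are hypothetical, and each strategy is simplified to a function from the opponent's previous move (-1 if there is none yet) to the next move.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch of the round structure; not the actual PrisonerGame code. */
final class RoundRobinSketch {
    // Example 2-player matrix from the text; rows are ordered 00, 01, 10, 11.
    static final int[][] MATRIX = {
        {0, 0,  0, 0},
        {0, 1,  4, 0},
        {1, 0,  0, 4},
        {1, 1,  2, 2}
    };

    /** Returns each player's total winnings after the given number of rounds. */
    static int[] play(IntUnaryOperator[] players, int rounds) {
        int n = players.length;
        int[] totals = new int[n];
        int[][] lastMove = new int[n][n];   // lastMove[i][j]: what i last played against j
        for (int[] row : lastMove) {
            Arrays.fill(row, -1);           // -1 means "no game played yet"
        }
        for (int r = 0; r < rounds; r++) {
            for (int a = 0; a < n; a++) {
                for (int b = a + 1; b < n; b++) {   // every pair meets once per round
                    int moveA = players[a].applyAsInt(lastMove[b][a]);
                    int moveB = players[b].applyAsInt(lastMove[a][b]);
                    int[] row = MATRIX[2 * moveA + moveB];
                    totals[a] += row[2];
                    totals[b] += row[3];
                    lastMove[a][b] = moveA;
                    lastMove[b][a] = moveB;
                }
            }
        }
        return totals;
    }
}
```

For instance, passing two players that always return 0 leaves both totals at zero no matter how many rounds are played, which matches the "players like me" reasoning earlier in the text.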

Finally, let's describe the 3-person game:

A 3-person game is played as follows. Each game has three players called, say, A, B and C. All three players make their "move" (which is either 0 or 1). Then, each player receives a payoff.

Thus, there are 8 combinations of moves:

       0 0 0     All three betray.
       0 0 1     Only C is silent.
       0 1 0     Only B is silent.
       0 1 1     Only A betrays.
       1 0 0     Only A is silent.
       1 0 1     Only B betrays.
       1 1 0     Only C betrays.
       1 1 1     All three are silent.

There are two variations of the 3-person game: the majority 3-person game and the minority 3-person game, and thus two different payoff matrices.

Let's consider the payoff matrix for the minority game first, assuming the "loot" is $6 million.

       0 0 0     A,B,C all get $0
       0 0 1     A and B escape and share ($3 each), C gets $0
       0 1 0     A and C get $3 each, B gets $0
       0 1 1     A escapes and gets $6, whereas B and C get $0
       1 0 0     A gets $0, B and C get $3 each
       1 0 1     B gets $6, A and C get $0
       1 1 0     C gets $6, A and B get $0
       1 1 1     All escape and get $2 each.

Thus, in Java, we will represent the above payoff matrix as follows:

```
public static final int[][] THREE_PLAYER_MINORITY = {
    {0,0,0, 0,0,0 },
    {0,0,1, 3,3,0 },
    {0,1,0, 3,0,3 },
    {0,1,1, 6,0,0 },
    {1,0,0, 0,3,3 },
    {1,0,1, 0,6,0 },
    {1,1,0, 0,0,6 },
    {1,1,1, 2,2,2 }
};
```

Notice that in the minority game, a player can win a lot by being the only one to betray: for example, A takes the whole $6 when only A betrays. In other words, this variation corresponds to a higher standard of evidence required by the police. If we instead only require majority concurrence (go with the majority), we get a slightly different payoff matrix:
```
public static final int[][] THREE_PLAYER_MAJORITY = {
    {0,0,0, 0,0,0 },
    {0,0,1, 3,3,0 },
    {0,1,0, 3,0,3 },
    {0,1,1, 0,3,3 },
    {1,0,0, 0,3,3 },
    {1,0,1, 3,0,3 },
    {1,1,0, 3,3,0 },
    {1,1,1, 2,2,2 }
};
```

Thus, depending on the game-type, you will get one of three matrices when the simulator calls your playGame method.
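Since a single class must handle all three matrices, one possible approach is to inspect the matrix that is passed in. The sketch below assumes only the row layout shown above (moves first, then payoffs); GameTypeSketch and its methods are hypothetical names and are not provided by the simulator.

```java
/** Hypothetical helper for telling the three matrices apart; not part of the simulator. */
final class GameTypeSketch {
    /** Each row holds one move and one payoff per player, so players = row length / 2. */
    static int numPlayers(int[][] matrix) {
        return matrix[0].length / 2;
    }

    /** Returns the payoff to the player at index me (0 = A, 1 = B, ...) for the given moves. */
    static int payoffFor(int[][] matrix, int[] moves, int me) {
        int n = numPlayers(matrix);
        for (int[] row : matrix) {
            boolean match = true;
            for (int i = 0; i < n; i++) {
                if (row[i] != moves[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return row[n + me];
            }
        }
        throw new IllegalArgumentException("No matching row");
    }

    /** A lone betrayer is paid in the minority matrix but gets 0 in the majority matrix. */
    static boolean isMinorityGame(int[][] threePlayerMatrix) {
        return payoffFor(threePlayerMatrix, new int[] {0, 1, 1}, 0) > 0;
    }
}
```

A strategy could call numPlayers first and only consult isMinorityGame when the answer is 3, which keeps the 2-person and 3-person logic in one class.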

What to submit:

Your source code for your two strategies, appropriately named. Note that each class must implement all the methods, and must contain code for both 2-person and 3-person games. Do NOT use separate files for 2-person and 3-person games.
Results (on paper) of playing your strategies against each other.

We will find some time in class to play the full competition. It should be exciting.