Dylan K. answered 11/19/22
Seasoned Software Developer - Data Driven - Python - Comp Science
For this solution, the first thing you are going to want to do is write the simulation, so we can populate both dictionaries with wins or losses respectively.
Let's define a function that will simulate a game of craps. First let's get a handle on the rules.
- If the sum is a 7 or 11, you win and the game is over.
- If the sum is a 2, 3, or 12, you lose and the game is over.
- If you roll a 4, 5, 6, 8, 9, or 10, that value becomes your "point" and you continue to roll until you re-roll your point or a 7. If you roll your point, you win; if you roll a 7, you lose.
So to write this simulation, the first thing we will want to do is simulate a dice roll. That is, make a function that generates two random numbers between 1 and 6 and returns their sum. We will call this function dice_roll().
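A minimal sketch of that helper, using Python's built-in random module:

```python
import random

def dice_roll():
    """Roll two six-sided dice and return their sum (2-12)."""
    return random.randint(1, 6) + random.randint(1, 6)
```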
With that, we can write a simulation of a single craps game. We want two things out of this function: first, the number of rolls made, and second, whether we won or lost. In Python, you can simply return multiple values from a function; see the sketch following the next two paragraphs.
Following the ruleset, we will initially roll the dice (incrementing num_rolls and calling our dice_roll function), and check against the first two bullet points to see if we win, lose, or keep playing.
It's simple if we won or lost at this point: we just return True or False, along with 1 for num_rolls. If not, we save that initial value as our "point" and keep rolling. After each roll, we compare the value against the point and against 7 to see if we won, lost, or keep rolling. This is a good place to use a loop. Make sure to run the code at this point to sanity-check your logic and confirm it returns what you expect. You might add some print statements in the function to view its state as it runs through the simulation, or use a debugger to inspect the state.
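Here is one way that function might look; the name play_craps is just a placeholder, but the logic follows the rules above:

```python
def play_craps():
    """Simulate one game of craps.

    Returns (won, num_rolls): won is True/False, num_rolls is
    how many times the dice were rolled.
    """
    num_rolls = 1
    first = dice_roll()
    if first in (7, 11):       # win on the come-out roll
        return True, num_rolls
    if first in (2, 3, 12):    # lose on the come-out roll
        return False, num_rolls
    point = first              # 4, 5, 6, 8, 9, or 10 becomes the point
    while True:
        num_rolls += 1
        roll = dice_roll()
        if roll == point:      # made the point: win
            return True, num_rolls
        if roll == 7:          # seven out: lose
            return False, num_rolls
```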
Now that we have the simulation, the question becomes how to best output this summary.
Overall, we know we will have some sort of loop that runs 1,000,000 times. Inside that loop we will call our simulation function and get the result. We will also need two dictionaries, defined before the loop, that get updated contextually. The num_rolls value will be the key into these dictionaries, and the value will be the count of games seen so far that were won (or lost) in that many rolls. The first time we see a given roll count, we set its entry to 1; every time we see it again, we increment that value by 1.
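As a sketch, assuming the play_craps function above, that loop might look like this:

```python
NUM_GAMES = 1_000_000

wins = {}    # num_rolls -> count of games won in exactly that many rolls
losses = {}  # num_rolls -> count of games lost in exactly that many rolls

for _ in range(NUM_GAMES):
    won, num_rolls = play_craps()
    counts = wins if won else losses
    # get() returns 0 the first time we see a roll count, so this line
    # creates a new entry at 1 and increments an existing one.
    counts[num_rolls] = counts.get(num_rolls, 0) + 1
```

You could also use collections.defaultdict(int) here to avoid the get() call entirely.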
We are left with two dictionaries that represent the data we will need to summarize.
The first two points, the percentages of total games won or lost, will at this point require us to iterate through all keys of both dictionaries, summing the values in each to get the total wins and total losses. Since we know we ran 1 million games, the percentages are then simple division.
An important thing to note here is that we can do this more efficiently, since we know what the summary will need. Instead of counting both dictionaries at the end, we can keep track of the total wins and losses as we iterate through the 1 million simulations.
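Either way it's only a line or two. Counting after the fact might look like this; the running-counter version would just increment total_wins or total_losses inside the simulation loop instead:

```python
# Counting after the fact: sum the values in each dictionary.
total_wins = sum(wins.values())
total_losses = sum(losses.values())

print(f"Won  {total_wins / NUM_GAMES:.2%} of games")
print(f"Lost {total_losses / NUM_GAMES:.2%} of games")
```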
Getting the total games played that were won or lost for a particular number of rolls is easy: we look that number up in both dictionaries and add the two values together.
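For example, using dict.get with a default of 0 so that roll counts we never saw don't raise a KeyError (the helper name here is mine):

```python
def games_with_rolls(n):
    """Total games (won + lost) that took exactly n rolls."""
    return wins.get(n, 0) + losses.get(n, 0)
```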
Where it gets interesting is the fourth point: the cumulative percentage of total games played that were won or lost up to a given number of rolls. It's similar to the third point, but just as we could more efficiently get the percentage of games won or lost by keeping running totals during the simulation loop, we can do a similar thing here with a single pass and running sums.
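One linear-pass sketch, carrying running totals up to a chosen number of rolls (again, the function name is just a placeholder):

```python
def print_cumulative(max_rolls):
    """Cumulative percentage of games won or lost within max_rolls rolls."""
    won_so_far = lost_so_far = 0
    for n in range(1, max_rolls + 1):  # single pass; the running sums do the work
        won_so_far += wins.get(n, 0)
        lost_so_far += losses.get(n, 0)
        print(f"<= {n:2d} rolls: won {won_so_far / NUM_GAMES:7.2%}, "
              f"lost {lost_so_far / NUM_GAMES:7.2%}")
```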
In the end, these optimizations aren't going to save us a lot of time. Whether we save the relevant summary data while we run the simulation, or go back afterwards and iterate through the dictionaries once or twice, we are still looking at the same Big O complexity of O(N); that is, we still have linear time complexity.
Do note this time complexity and the number of passes you make over the data. If, for example, when computing the cumulative percentages, you iterate through each key and re-check every number of rolls less than that key on every iteration, you are doing O(n) work per key, for O(n²) work overall. On larger data sets that quickly becomes inefficient compared to a linear O(N) solution. We really want to restrict how many times we pass through the data.