Asher B. answered 06/22/22
Masters in Math with 12+ Years Teaching (& love Finite Math!)
I'm gonna start with what I see as a more-intuitive breakdown of the situation, and then at the end I'll try to help build that up to the formula you're probably working to be able to use & understand more generally.
So ok, we know we pulled a red marble! What are the different ways that can happen?
- We could have flipped heads first - the probability of that is 0.3 - and in that situation the probability of getting the red marble is 7/10 (since there's 7 red + 3 blue = 10 marbles total). The chance both these things happen is 0.3*7/10 = 0.21
- Or, we instead could have flipped tails first - with a probability of 0.7 - in which case the probability of getting the red marble is 3/8 (since this time there's 3 red + 5 blue = 8 marbles in the urn). The chance both of these things happen is 0.7*3/8 = 0.2625
The total probability of pulling a red marble in any of these different ways can be found by adding up the probabilities of all the different ways: 0.21+0.2625 = 0.4725, and I like to think of this as the proportion of the time that you'll go through this whole thing and get a red marble.
So now, we want to know not what proportion of the time the first thing we computed will happen - that you flipped heads and then pulled a red marble - but instead the proportion, out of all the times we pulled a red marble, that we will have flipped heads. That means we want to look at that 0.21 not out of all 100% of things that can happen, but just out of the proportion of times when we know we chose a red marble: 0.21/0.4725 = 0.44444... = 4/9.
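If you'd like to double-check that arithmetic, here's a small sketch using exact fractions, so nothing gets lost to rounding. The variable names are just my own labels for the numbers above, not anything from the original problem.

```python
from fractions import Fraction

# Numbers from the problem: heads has probability 0.3,
# the heads urn holds 7 red + 3 blue, the tails urn holds 3 red + 5 blue.
p_heads = Fraction(3, 10)
p_tails = 1 - p_heads
p_red_given_heads = Fraction(7, 10)
p_red_given_tails = Fraction(3, 8)

# The two different ways to end up holding a red marble.
p_heads_and_red = p_heads * p_red_given_heads   # 21/100 = 0.21
p_tails_and_red = p_tails * p_red_given_tails   # 21/80  = 0.2625

# Add them up to get the total probability of a red marble.
p_red = p_heads_and_red + p_tails_and_red       # 189/400 = 0.4725

# Out of all the red-marble outcomes, what share started with heads?
p_heads_given_red = p_heads_and_red / p_red     # 4/9

print(p_heads_and_red, p_tails_and_red, p_red, p_heads_given_red)
```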
It involves a little more computation than a simpler probability problem, but see how it's still just the same setup of,
- count how many ways the thing we want can happen
- count how many things total can happen
- divide the number of ways the thing we want can happen by the total number of things that can happen
Really it's just kind of a multi-step version of that, where we had to use this same principle to find the probability of pulling a red marble from each of those urns on our way toward getting the things we actually cared about comparing.
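That "proportion of the time" idea can also be checked by brute force: play the game a lot of times, keep only the rounds where a red marble came out, and see what share of those started with heads. This is only a rough sketch; the function name, the number of trials, and the seed are my own choices, and the results will land near 0.4725 and 4/9 rather than exactly on them.

```python
import random

def simulate(trials=200_000, seed=0):
    """Play the coin-then-urn game many times and count outcomes."""
    rng = random.Random(seed)
    red_count = 0
    heads_and_red_count = 0
    for _ in range(trials):
        # Flip the biased coin: heads with probability 0.3.
        heads = rng.random() < 0.3
        # Heads urn: 7 red out of 10. Tails urn: 3 red out of 8.
        p_red = 7 / 10 if heads else 3 / 8
        red = rng.random() < p_red
        if red:
            red_count += 1
            if heads:
                heads_and_red_count += 1
    # First value: share of all rounds that gave a red marble.
    # Second value: share of the red-marble rounds that started with heads.
    return red_count / trials, heads_and_red_count / red_count

share_red, share_heads_among_red = simulate()
print(share_red, share_heads_among_red)
```

Notice the second number divides by red_count, not by trials: that's the "smaller universe" of rounds where we know a red marble was pulled.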
----------------------------------------
So ok, our answer is 4/9 but we haven't mentioned the elephant in the room: Bayes' Theorem, and the reason I'm not saying it until now is because folks think of it as this big scary thing... when it's just what we did above. It just looks scary when you see all the notation. Let's see if we can dig into that now, though, looking at what we did above.
You were given that the probability of flipping heads was Pr(H) = 0.3
Assuming you got heads, we computed the probability of pulling a red marble out of the first urn was Pr(R|H) = 7/10. Here's our first bit of scarier-looking notation, but we just pronounce it as "the probability of R given H" - the vertical line is read as 'given' in this context - so it's just the symbols to communicate exactly what we said earlier in this paragraph: the probability that you get a Red marble, assuming you had flipped Heads.
We can set up a similar statement for the other option: Pr(R|T) means "the probability we got a Red marble, assuming we flipped Tails", and is pronounced "the probability of R given T". We computed Pr(R|T) = 3/8.
We knew how to find the probability of two things along a chain like this both happening: the probability of getting Heads and then a Red marble, for example, came from multiplying the probability of getting Heads by the probability of choosing a Red marble from the urn you're presented if you get heads. Symbolically, we're saying Pr(H∧R) = Pr(H)*Pr(R|H) - the probability of getting heads and a red marble is the probability of getting heads, multiplied by the probability we already computed of getting a red marble assuming we got that heads. We found Pr(H∧R) = 0.3*7/10 = 0.21, since those are the probabilities of the two events we need to multiply.
Likewise we can maybe now understand something like Pr(T∧R) = Pr(T)*Pr(R|T) = 0.7*3/8 = 0.2625 as the second of the two bullet-point computations at the start of this whole post.
Our final bit of work, then, was summing up all those ways to end up with a Red marble: Pr(R) = Pr(H∧R) + Pr(T∧R) = Pr(H)*Pr(R|H) + Pr(T)*Pr(R|T)
See how that long formula at the end of the chain of equalities is just made up of the little pieces we've notated along the way? And it all is just a formal way of listing those various ways we can end up having pulled a Red marble.
Then our final step was really just computing Pr(H|R) = Pr(H∧R) / Pr(R)
Why do we have H∧R in the numerator here, and not just H? Well, when we're computing the probability that we got heads assuming we got a red marble, that limits the "universe" of things we're willing to consider having happened. We're not looking at all things that could have happened after getting heads, we know we got a red marble and so need to only consider the times we can get heads and a red marble, not all possible ways of getting heads. Both the numerator and denominator account for this: it's (probability of getting Heads and then choosing a Red marble)/(probability of choosing a Red marble) - we're working in a consistent universe where all possible outcomes include getting the Red marble we're given as a condition of the probability we want to compute. So long as the H∧R part feels ok, the rest of it hopefully seems pretty straightforward still.
Which is actually really cool because we're at the point where we can whip out Bayes' Formula:
Pr(H∧R) = Pr(H)*Pr(R|H), and Pr(R) is that long sum we found above of Pr(H)*Pr(R|H) + Pr(T)*Pr(R|T)
(it should probably feel further affirming that, writing it all out longform like this, the bit of the formula that's in the numerator is one of the little bits adding up to make the denominator, since that's exactly what we wanted: the proportion of cases within the universe of getting a Red marble where we also end up getting Heads!)
Well then Pr(H|R) = Pr(H∧R) / Pr(R) = Pr(H)*Pr(R|H) / [Pr(H)*Pr(R|H) + Pr(T)*Pr(R|T)]
... and that's all Bayes' Theorem is. Just a longform way of saying "add up all the ways the limiting condition can happen, and then take the proportion of those times the thing we want happens".
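To see that recipe as a reusable procedure, here's a minimal sketch. The function name `posterior` and the dictionary layout are my own invention for illustration, not a standard library API; the numbers are the ones from this problem.

```python
def posterior(priors, likelihoods, target):
    """Bayes' Theorem for one observed condition.

    priors[k]      -> Pr(k), e.g. Pr(H) and Pr(T)
    likelihoods[k] -> Pr(condition | k), e.g. Pr(R|H) and Pr(R|T)
    target         -> the case whose probability we want, given the condition
    """
    # Each way the condition can happen: Pr(k) * Pr(condition | k).
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    # Add up all the ways the limiting condition can happen.
    total = sum(joint.values())
    # Take the proportion of that total belonging to the case we want.
    return joint[target] / total

priors = {"H": 0.3, "T": 0.7}
likelihoods = {"H": 7 / 10, "T": 3 / 8}

p = posterior(priors, likelihoods, "H")
print(round(p, 4))  # 0.4444
```

The numerator, joint[target], is one of the pieces being summed into total, which is exactly the "one little bit of the denominator" observation from earlier.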
The only additional complexity that might show up is there could be situations where there are more than two ways for the limiting condition to happen. Maybe instead of flipping a coin, we roll a die, and there are 6 different urns depending on which number we roll. This isn't conceptually any harder, though, just more of the same: our denominator will just have 6 different little scenarios added up instead of 2.
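Here's a quick sketch of what that die version could look like. The urn contents are completely made up for illustration (I'm assuming a fair die, and that urn k holds k red marbles and 7 - k blue ones), so treat the specific numbers as hypothetical.

```python
from fractions import Fraction

# Hypothetical setup: a fair six-sided die picks the urn,
# and urn k holds k red marbles and 7 - k blue marbles (7 total).
priors = {k: Fraction(1, 6) for k in range(1, 7)}
likelihoods = {k: Fraction(k, 7) for k in range(1, 7)}

# Six little scenarios instead of two: Pr(roll k) * Pr(R | roll k).
joint = {k: priors[k] * likelihoods[k] for k in priors}

# The denominator is the sum of all six.
p_red = sum(joint.values())

# Probability of each roll, given that we pulled a red marble.
posteriors = {k: joint[k] / p_red for k in joint}

print(p_red)          # 1/2
print(posteriors[6])  # 2/7
```

Same two steps as before: add up every way to get a red marble, then take the share belonging to the roll you care about.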
I hope that helps give some context so that you're more confident approaching further problems like this! Thanks for your question, I'm awfully fond of Bayes' Theorem :)