Barbara S.

asked • 04/02/16

Survey for 1500 people. Probability of more than one person having the same unique ID code

I am sending out a survey to 1500 people. They will each have a unique identifier. The unique identifier is the first two letters of your mother's maiden name, the day of the month you were born, and the first two letters of the city you were born in. For example: if your mother's maiden name is Smith and you were born on the 6th day of the month in Chicago, then your code would be SM06CH. My math question is: What is the probability of more than one person having the same unique ID code?

Mark M.

How many maiden names begin with SM?
How many city names begin with CH?

04/02/16

Barbara S.

Unless you have the statistics on beginning letters of names, assume a uniform distribution.

04/02/16

Mark M.

Then P(X = x_k) = 1/k

04/02/16

Barbara S.

Thank you, Mark. That is the definition of a uniform distribution. My main question is a bit more complex. It's about the probability of duplicates in a proposed "UID" scheme applied to 1500 people. (We were OK with assuming a uniform distribution of letters to keep it simpler. The distribution of first letters is easy to find on the web. The UID scheme uses the first two letters, which lowers the odds of a duplicate by some amount.) -- Barbara

04/03/16

1 Expert Answer

By:

Marilyn L. answered • 04/03/16


Barbara S.

Marilyn,
 
Thanks for the effort. I have forgotten or never learned a lot of math, but I think this is not quite right.  For starters, although there are indeed 10 digits and 100 combinations of 2 digits, there are only 31 possible days in a month.
 
I think the answer has the same form as the more famous puzzle that asks, for a class of 30 or 31 students, what the probability is that two of them share a birthday. There is a YouTube video deriving the solution, and the often surprising answer is around 70%. The calculation in that case involves factorials: first compute the probability p that no two students have the same birthday (about 29%), then compute 1-p. So I think the solution in this case involves computing the probability that no two survey participants have the same UID and then computing 1-p. What do you think?

04/03/16
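
The 1-p approach described above can be sketched in a few lines of Python. This is only an illustration under the thread's simplifying assumption that every code is equally likely: 26 x 26 letter pairs for the maiden name, 31 days, and 26 x 26 letter pairs for the city, giving 14,166,256 possible codes. The function name and constants are made up for this example. Real names and cities are far from uniformly distributed, so the true chance of a duplicate would be higher than this figure.

```python
import math


def collision_probability(n_people: int, n_codes: int) -> float:
    """Probability that at least two of n_people share a code.

    Assumes each person's code is drawn independently and uniformly
    from n_codes possibilities (the simplification used in the thread).
    """
    if n_people > n_codes:
        return 1.0  # pigeonhole: a duplicate is certain
    # P(no duplicate) = product over i of (1 - i / n_codes), i = 0..n_people-1.
    # Summing logs avoids the huge factorials of the textbook formula.
    log_p_unique = sum(math.log1p(-i / n_codes) for i in range(n_people))
    # 1 - exp(x), computed accurately even when x is close to 0.
    return -math.expm1(log_p_unique)


# Hypothetical constants for the uniform model of the SM06CH-style code.
LETTER_PAIRS = 26 ** 2  # two letters, A-Z each
DAYS = 31               # day of the month
N_CODES = LETTER_PAIRS * DAYS * LETTER_PAIRS

p_birthday = collision_probability(30, 365)    # classic birthday puzzle
p_survey = collision_probability(1500, N_CODES)

print(f"possible codes: {N_CODES}")
print(f"30 students, 365 birthdays: {p_birthday:.3f}")
print(f"1500 people, uniform codes: {p_survey:.3f}")
```

Under these assumptions the birthday case comes out at about 71%, matching the figure quoted above, and the survey case at about 7.6%. That 7.6% should be read as a lower bound: any skew toward common names, cities, or letter pairs pushes the duplicate probability up.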
