Robert H. answered 03/04/23
Software Professional specializing in Math, Computer Science
If I understand this question, we're looking at two files, each containing a sequence of 4096 16 bit numbers. (Each number is composed of two consecutive bytes from a string of 4097 bytes. We'll number the bytes 0..4096). The first 16 bit number would be formed from bytes 0 and 1, the second from bytes 1 and 2, and the last from bytes 4095 and 4096. We'll be comparing corresponding integers (byte pairs) from the two files.
The problem asks for the chance that there is at least one match among these 4096 pairs of 16 bit numbers In these types of problems, it is sometimes easier to find the probability of the opposite, i.e., that no matches are found among the 4096 pairs. The chance of finding one or more matches is then just 1 - the probability that all are unique.
Since each 16 bit number can have 2^16 (= 65536) values, the chance of any two numbers being different is 65535 / 65536 = 0.99998474.
The chance of two pairs being unique is simply that number squared, and in general the chance for successfully finding a sequence of n pairs all unique is (65535 / 65536)n.
So given the two lists of 16 bit numbers, the chance of finding no matches would be
0.999984744096 = 0.9394126 .
This means the chance of one or more matches is approximately 0.06 or 6%.