Zekka N. answered • 03/13/19

Java and Python Tutor, Also Knows Obscure Programming Languages

Hello! Unfortunately, GetHashCode can't do what you want.

To simplify the math, I'll explain why you can't get what you want using a distance of 10 instead of 0.001. (The exact argument still applies, it's just easier to type.)

Your condition is this: if Distance(x, y) < 10, then GetHashCode(x) == GetHashCode(y). Then, in this world, because Distance(85, 90) < 10, GetHashCode(85) must equal GetHashCode(90). Fine so far.

However, because Distance(90, 95) < 10 and Distance(95, 100) < 10, all of those have to have the same GetHashCode() as 85, too. (because 95 must have the same GetHashCode() as 90, and 100 must have the same GetHashCode() as 95...)

By now you probably realize the problem. By the same argument, every number in existence has to have the same HashCode!

There's a data structure designed to solve this problem called a disjoint set. In its initial state, the disjoint set of 10 values represents 10 separate sets, each containing a single one of the values. However, it provides an option that allows you to take any two values and merge their sets. Here's a comprehensive example:

This neatly handles the problem of merging "neighborhoods" of related points that all satisfy Distance(x, y) < 0.0001. An article on this data structure is here: https://en.wikipedia.org/wiki/Disjoint-set_data_structure .

As for identifying if any two points are too close to each other in the first place -- you have a lot of data structure options. A particularly good option is to sort the original points by each axis -- then, for each point, look at all neighbors in either directions along that axis until Abs(neighbor.X - current.X) > 0.0001. Then, if Distance(neighbor, current) < 0.0001, merge their sets.

This has worst-case O(n^{2}) performance (when all the points are aligned on a single axis), but on average it's going to hit a small constant number of points, so it has O(n log n) performance from doing the sorts.