Generate identical Hashcodes for approximately-similar numbers?

Question

I'm creating an application in C# 3.5 that uses the AutoCAD API to read a 2D AutoCAD drawing, make changes to the drawing using defined business logic, then adjust it back in AutoCAD. Due to the nature of the logic, the shape of the drawing has to be re-constructed - e.g. a rectangle is made up of 4 connecting straight lines.

I'm creating these shapes using the start and end co-ordinates of each line from AutoCAD, but some of the co-ordinates don't exactly match up. For example, one point could be at 0.69912839 (on one axis), but a line starting from the same point could be at 0.69990821. These are in mm so the distance is minute (0.00078mm!)

I've created my own class (call it MyPoint, similar to PointF) because I've needed to add some additional logic to it. In that class I've created a method that takes two doubles and returns true or false depending on whether they are within 0.001mm of each other. I've then overridden the Equals method and the == and != operators so I can do (point1 == point2 or point1.Equals(point2)), which checks if all axes are within 0.001mm of each other - if they are, I class it as being the same point.

That's fine and working brilliantly. Now, I need to check a collection of these point classes to get rid of all duplicates, so I'm using LINQ's Distinct() method on my collection. However, this method uses GetHashCode(), not Equals(), to determine if the instances are equal. So, I've overridden GetHashCode(), which uses the GetHashCode of the double class.

But the above example fails because obviously they're different values and therefore generate different hashcodes. Is there any way that two numbers that are within 0.001 of each other can generate the same hashcode? (Note the numbers don't know about each other as GetHashCode is called separately on different class instances.) I've tried numerous ways which work for some examples but not for others.

One example is truncating the number to 3dp (multiply it by 10^3, then truncate it) and creating the hashcode on the result - which works for the above example (699 == 699.) But this doesn't work for 0.69990821 and 0.70000120 (699 != 700.) I've tried rounding, which works for the second set of numbers (0.700 == 0.700) but not for the first (0.699 != 0.700.) I've even tried truncating the number to 3dp then adjusting it up to the next even number, which works for both the previous examples, but not for 12.9809 and 12.9818 (12980 != 12982.)
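For reference, the truncation and rounding attempts described above can be reproduced with a small sketch. The class and method names here (BucketKeys, Truncate3, Round3) are hypothetical and only illustrate the two schemes; they are not part of MyPoint.

```csharp
using System;

// Hypothetical helper reproducing the two bucketing attempts from the question.
public static class BucketKeys
{
    // Truncate to 3 decimal places, e.g. 0.69912839 -> 699.
    public static long Truncate3(double v)
    {
        return (long)Math.Truncate(v * 1000.0);
    }

    // Round to 3 decimal places, e.g. 0.69990821 -> 700.
    public static long Round3(double v)
    {
        return (long)Math.Round(v * 1000.0);
    }
}
```

With these, Truncate3 gives 699 for both 0.69912839 and 0.69990821 but 700 for 0.70000120, while Round3 gives 700 for 0.69990821 and 0.70000120 but 699 for 0.69912839. Whichever fixed grid is chosen, some pair of values closer than 0.001 will sit on opposite sides of a bucket boundary.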

Is there another way, or should I scrap the Equals, ==, != and GetHashCode overrides, and create my own MyPoint.IsEqualTo() and MyPointCollection.Distinct() methods?

Zekka N. · Accepted Answer

Hello! Unfortunately, GetHashCode can't do what you want.

To simplify the math, I'll explain why using a distance of 10 instead of 0.001. (The exact argument still applies, it's just easier to type.)

Your condition is this: if Distance(x, y) < 10, then GetHashCode(x) == GetHashCode(y). Then, in this world, because Distance(85, 90) < 10, GetHashCode(85) must equal GetHashCode(90). Fine so far.

However, because Distance(90, 95) < 10 and Distance(95, 100) < 10, all of those have to have the same GetHashCode() as 85, too (because 95 must have the same GetHashCode() as 90, and 100 must have the same GetHashCode() as 95...).

By now you probably realize the problem. By the same argument, every number in existence has to have the same hash code!

There's a data structure designed to solve this problem called a disjoint set. In its initial state, the disjoint set of 10 values represents 10 separate sets, each containing a single one of the values. However, it provides an operation that allows you to take any two values and merge their sets. Here's a worked example:
[[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]

# merge(1, 3)
[[1, 3], [2], [4], [5], [6], [7], [8], [9], [10]]

# merge(2, 4); merge(4, 5)
[[1, 3], [2, 4, 5], [6], [7], [8], [9], [10]]

# merge(1, 4)
[[1, 2, 3, 4, 5], [6], [7], [8], [9], [10]]
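A minimal sketch of such a disjoint set in C# might look like the following. The class name DisjointSet and its members are assumptions made for illustration (there is no such type in the .NET base class library), and elements are identified by array index rather than by value.

```csharp
using System;

// Illustrative disjoint set (union-find) over the indices 0..count-1.
public sealed class DisjointSet
{
    private readonly int[] parent; // parent[i] == i means i is the root of its set
    private readonly int[] size;   // number of elements in the set, valid at roots

    public DisjointSet(int count)
    {
        parent = new int[count];
        size = new int[count];
        for (int i = 0; i < count; i++)
        {
            parent[i] = i; // every element starts in its own set
            size[i] = 1;
        }
    }

    // Returns the representative (root) of the set containing x.
    public int Find(int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]]; // path halving keeps trees shallow
            x = parent[x];
        }
        return x;
    }

    // Merges the sets containing a and b; returns false if already merged.
    public bool Merge(int a, int b)
    {
        int ra = Find(a), rb = Find(b);
        if (ra == rb) return false;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra; // attach the smaller tree under the larger one
        size[ra] += size[rb];
        return true;
    }
}
```

To replay the merges above, one could create `new DisjointSet(11)` (ignoring index 0) and call Merge(1, 3), Merge(2, 4), Merge(4, 5) and Merge(1, 4); afterwards Find returns the same root for 1 through 5, and each of 6 through 10 is still its own root.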

This neatly handles the problem of merging "neighborhoods" of related points that are connected through pairs satisfying Distance(x, y) < 0.001. An article on this data structure is here: https://en.wikipedia.org/wiki/Disjoint-set_data_structure .

As for identifying if any two points are too close to each other in the first place: you have a lot of data structure options. A particularly good option is to sort the original points along one axis, say X. Then, for each point, look at its neighbors in both directions along that axis until Abs(neighbor.X - current.X) > 0.001. Then, if Distance(neighbor, current) < 0.001, merge their sets.

This has worst-case O(n^2) performance (when all the points have nearly the same coordinate on the sorted axis), but on average it's going to hit a small constant number of points, so it has O(n log n) performance from doing the sort.
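The sort-and-scan approach described above could be sketched roughly as follows. Everything named here is an assumption for illustration: Pt is a stand-in for MyPoint, PointDeduper.DistinctWithin is a hypothetical method, closeness is measured as Euclidean distance (the question's Equals compares each axis separately), and the scan only looks forward in the sorted order, which still visits every close pair once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MyPoint.
public struct Pt
{
    public double X, Y;
    public Pt(double x, double y) { X = x; Y = y; }
}

public static class PointDeduper
{
    // Returns one representative per cluster of points chained together
    // by pairwise distances below tol.
    public static List<Pt> DistinctWithin(IList<Pt> points, double tol)
    {
        int n = points.Count;

        // order[k] is the index of the k-th point when sorted by X.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Array.Sort(order, (a, b) => points[a].X.CompareTo(points[b].X));

        // Minimal union-find over positions in the sorted order.
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        for (int i = 0; i < n; i++)
        {
            Pt p = points[order[i]];
            // Scan forward only while the X gap alone is still below tol.
            for (int j = i + 1; j < n && points[order[j]].X - p.X < tol; j++)
            {
                Pt q = points[order[j]];
                double dx = q.X - p.X, dy = q.Y - p.Y;
                if (Math.Sqrt(dx * dx + dy * dy) < tol)
                    parent[Find(parent, j)] = Find(parent, i); // merge the two sets
            }
        }

        // Keep the point sitting at the root of each set.
        var result = new List<Pt>();
        for (int i = 0; i < n; i++)
            if (Find(parent, i) == i) result.Add(points[order[i]]);
        return result;
    }

    private static int Find(int[] parent, int x)
    {
        while (parent[x] != x)
        {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```

Note that merging is transitive: with a tolerance of 0.001, points at X = 0, 0.0008 and 0.0016 end up in one cluster even though the first and last are 0.0016 apart. Whether that is acceptable depends on the drawing; which point represents each cluster is also an arbitrary choice in this sketch.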
