Asked • 04/19/19

Finding strings that differ by at most one letter from a given string in SAS with PROC SQL?

First, some context. I am using PROC SQL in SAS and need to fetch all the entries in a data set (with a couple of million entries) whose variable "Name" equals, let's say, "Massachusetts". Since the data was once entered manually, nearly every conceivable spelling error occurs ("Amssachusetts", "Kassachusetts", etc.).

I have found that few entries get more than two characters wrong, so the condition

Name like "__ssachusetts" OR Name like "_a_sachusetts" OR ... OR Name like "Massachuset__"

would select the entries I am looking for. However, I am hoping there is a more convenient way to express "Name differs by at most 2 characters from 'Massachusetts'". Is there one? Or is there some other strategy for fetching these entries?

I tried searching both Stack Overflow and the web but was unsuccessful. I am also a relative beginner with both SQL and SAS.

Some additional information: the database is not in English (and the actual string is not "Massachusetts"), so using SOUNDEX is not really feasible (if it ever were).

Thanks in advance.

(Edit: Improved the title)
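One possible way to avoid spelling out every LIKE pattern is the SAS function COMPLEV, which returns the Levenshtein edit distance between two strings and can be called inside PROC SQL. The following is only a minimal sketch, not a solution tested against this data: the data set name work.entries and the output table work.matches are hypothetical, and the threshold of 2 is taken from the question. Edit distance also counts inserted and deleted characters, so this condition matches somewhat more than the fixed-length LIKE patterns do.

```sas
/* Sketch only: WORK.ENTRIES and WORK.MATCHES are hypothetical names. */
/* Name is assumed to be a character variable.                        */
proc sql;
    create table work.matches as
    select *
    from work.entries
    /* Keep rows whose Name is within 2 single-character edits        */
    /* (substitution, insertion, or deletion) of the target string.   */
    where complev(strip(Name), "Massachusetts") <= 2;
quit;
```

To stay closer to the LIKE patterns, which only match names of the same length as the target, a condition such as length(strip(Name)) = length("Massachusetts") could be added to the WHERE clause. The comparison as written is case-sensitive. COMPGED and SPEDIS are related SAS functions that weight different kinds of spelling errors differently and may be worth comparing on a sample of the data.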

1 Expert Answer

Rick A. answered • 04/21/19

Tutor
5 (33)

Experienced SAS Professional
