Search Help & Tips

Fuzzy Search Methods

Unlike the other searches, which try to find exact matches for the search terms you enter, the fuzzy search looks for approximate matches. This allows you to search for names when you are uncertain of the correct spelling. This type of search works best with names, so that is how it is used here.

The fuzzy search uses variations on a method commonly called "soundex", which looks for names that sound approximately like the name you entered. Since the computer can't actually hear how the names sound, this method is not exact, and you may get results that seem odd (searching for "Smith", for example, will return the expected matches, but will also return "Sandie" and "Santo"). Each variation on the basic soundex method (there are several) will give slightly different results, so if one is not satisfactory, try another.

The fuzzy search is available as part of the Census, Directory, Tax and Global Name searches.

How the Fuzzy Search Methods Work

The Fuzzy Search currently implements two varieties of soundex-type search: the original Soundex method, and a newer method called Metaphone, which has two variations.

Soundex

From Wikipedia, the free encyclopedia.

Soundex is a phonetic algorithm for indexing names by their sound when pronounced in English. The basic aim is for names with the same pronunciation to be encoded to the same string so that matching can occur despite minor differences in spelling. Soundex is the most widely known of all phonetic algorithms and is often used (incorrectly) as a synonym for "phonetic algorithm".

Soundex was developed by Robert Russell and Margaret Odell and patented in 1918 and 1922. A variation called American Soundex was used in the 1930s for a retrospective analysis of the US censuses from 1890 through 1920. The Soundex code came to prominence in the 1960s when it was the subject of several articles in the Communications and Journal of the Association for Computing Machinery (CACM and JACM), and especially when described in Donald Knuth's magnum opus, The Art of Computer Programming.

The Soundex code for a name consists of a letter followed by three numbers: the letter is the first letter of the name, and the numbers encode the remaining consonants. Similar sounding consonants share the same number so, for example, the labial B, F, P and V are all encoded as 1. Vowels can affect the coding, but are never coded directly unless they appear at the start of the name.

The exact algorithm is as follows:

  1. Retain the first letter of the string
  2. Remove all occurrences of the following letters, except where one is the first letter: a, e, h, i, o, u, w, y
  3. Assign numbers to the remaining letters (after the first) as follows:
    • b, f, p, v = 1
    • c, g, j, k, q, s, x, z = 2
    • d, t = 3
    • l = 4
    • m, n = 5
    • r = 6
  4. If two or more letters with the same number were adjacent in the original name (before any letters were removed), or were separated only by h or w, omit all but the first.
  5. Return the first four characters, padding on the right with 0 if there are fewer than four.

Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" yields "R150".
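
The steps above can be sketched in a few lines of Python. This is only an illustration of the algorithm as listed here, not the code used by the search itself; the names CODES and soundex are chosen for this example, and treating non-alphabetic characters as ignorable is an assumption.

```python
# Digit assigned to each consonant group (step 3 of the list above).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for name ('' if it has no letters)."""
    # Assumption: anything other than a-z (spaces, hyphens, digits) is ignored.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Start from the first letter's own code, so a second letter from the
    # same group is treated as a repeat and dropped (step 4).
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are dropped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts
    # Step 5: keep four characters, padding on the right with zeros.
    return (first.upper() + "".join(digits) + "000")[:4]


for name in ("Robert", "Rupert", "Rubin", "Smith", "Sandie", "Santo"):
    print(name, soundex(name))
```

Run as written, this prints R163 for "Robert" and "Rupert", R150 for "Rubin", and S530 for "Smith", "Sandie" and "Santo", which is why a search for "Smith" also returns the last two names.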

Metaphone

Metaphone was developed in 1990 by Lawrence Philips as a response to deficiencies in the Soundex algorithm. In 2000 Philips modified his original algorithm with additional heuristics, and called the result "double metaphone". Double metaphone gives better results than the original metaphone in some cases, but the two often give identical results.

The original metaphone system, as described by Lawrence Philips in Computer Language Vol. 7 No. 12, December 1990, pp. 39-43:

	The 16 consonant sounds:
	
	B X S K J T F H L M N P R 0 W Y
	
	0 represents the "th" sound.
	
	Exceptions:
	
	Initial  kn-, gn-, pn-, ae- or wr-    -> drop first letter
	Initial  x-                           -> change to "s"
	Initial  wh-                          -> change to "w"
	
	Transformations:
	
	Vowels are kept only when they are the first letter.
	
	B -> B   unless at the end of a word after "m" as in "dumb"
	C -> X   (sh) if -cia- or -ch-
	     S   if -ci-, -ce- or -cy-
	     K   otherwise, including -sch-
	D -> J   if in -dge-, -dgy- or -dgi-
	     T   otherwise
	F -> F
	G ->     silent if in -gh- and not at end or before a vowel
	         in -gn- or -gned- (also see dge etc. above)
	     J   if before i, e or y, unless part of a double gg
	     K   otherwise
	H ->     silent if after vowel and no vowel follows
	     H   otherwise
	J -> J
	K ->     silent if after "c"
	     K   otherwise
	L -> L
	M -> M
	N -> N
	P -> F   if before "h"
	     P   otherwise
	Q -> K
	R -> R
	S -> X   (sh) if before "h" or in -sio- or -sia-
	     S   otherwise
	T -> X   (sh) if -tia- or -tio-
	     0   (th) if before "h"
	         silent if in -tch-
	     T   otherwise
	V -> F
	W ->     silent if not followed by a vowel
	     W   if followed by a vowel
	X -> KS
	Y ->     silent if not followed by a vowel
	     Y   if followed by a vowel
	Z -> S
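
As a small illustration, the three initial-letter exceptions at the top of the table can be sketched in Python as shown below. This covers only the exceptions, not the letter-by-letter transformations, and it is not the code used by the search itself. The function name apply_initial_exceptions is hypothetical, and the prefix list uses "ae", the form given in published descriptions of Metaphone.

```python
def apply_initial_exceptions(word: str) -> str:
    """Apply only the initial-letter exceptions; returns a lowercase word."""
    w = word.lower()
    # Initial kn-, gn-, pn-, ae- or wr-: drop the first letter.
    if w.startswith(("kn", "gn", "pn", "ae", "wr")):
        return w[1:]
    # Initial wh-: change to "w".
    if w.startswith("wh"):
        return "w" + w[2:]
    # Initial x-: change to "s".
    if w.startswith("x"):
        return "s" + w[1:]
    return w


for name in ("Knight", "Wright", "White", "Xavier", "Smith"):
    print(name, "->", apply_initial_exceptions(name))
```

With these inputs, "Knight" becomes "night", "Wright" becomes "right", "White" becomes "wite", "Xavier" becomes "savier", and "Smith" is left unchanged apart from lowercasing. The transformations in the table would then be applied to the result.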

Further Information

There are several web sites that describe fuzzy search techniques. Some are very technical.

Soundex article from Wikipedia
A general introduction to the soundex method.
Understanding Classic SoundEx Algorithms
An introduction to a slightly improved version of the soundex algorithm, with implementations in various programming languages. The site also has a form where you can experiment with the soundex methods.
Implement Phonetic ("Sounds-like") Name Searches with Double Metaphone
A very technical description of the double metaphone algorithm. This is the basis of the double metaphone method used on the viHistory site.