How NameSearch® Works - Intelligent Scoring
NameSearch® comparison algorithms determine numeric
values that represent the likelihood of a match. Two entities are passed
to the comparison routine, and a score or an array of scores is returned.
There are many ways of calculating a score. William Jefferson Clinton
versus Bill Clinton would yield four scores: 100, 50, 66 and 33. The basic
approach is to divide the number of matches by the number of tokens and
multiply that figure by 100. In this example the number of tokens is
either two or three, depending on which name is used as the base: Bill
Clinton has two tokens and William Jefferson Clinton has three. The only
word that matches exactly is Clinton, so with one match we get
(1/2 * 100 = 50) and (1/3 * 100 = 33). Alternatively, Bill and William
are used interchangeably and can be considered to match. In this
instance there are two matches, and the scores would be
(2/2 * 100 = 100) and (2/3 * 100 = 66, with the fraction dropped).
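The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the matches-over-tokens calculation, not the NameSearch® implementation: the function names and the one-entry nickname table are hypothetical stand-ins for whatever the product's rulebase actually provides.

```python
# Hypothetical nickname table; a real rulebase would be far larger.
NICKNAMES = {"bill": "william"}


def canonical(token, use_nicknames):
    """Lower-case a token and, if requested, map a nickname to its formal form."""
    token = token.lower()
    return NICKNAMES.get(token, token) if use_nicknames else token


def token_scores(name_a, name_b, use_nicknames=False):
    """Return (score relative to name_a, score relative to name_b)."""
    tokens_a = [canonical(t, use_nicknames) for t in name_a.split()]
    tokens_b = [canonical(t, use_nicknames) for t in name_b.split()]
    matches = len(set(tokens_a) & set(tokens_b))
    # Integer division drops the fraction, so 2/3 * 100 becomes 66.
    return (matches * 100 // len(tokens_a), matches * 100 // len(tokens_b))


exact = token_scores("William Jefferson Clinton", "Bill Clinton")
with_nicknames = token_scores("William Jefferson Clinton", "Bill Clinton",
                              use_nicknames=True)
print(exact, with_nicknames)
```

Together the two calls reproduce the four scores from the example: 33 and 50 when only exact matches count, 66 and 100 when Bill is treated as William.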
Which score or scores are used to determine the likelihood of a match
depends on the degree of accuracy required by your system. Scoring is
essential in on-line applications, where the number of records returned
is too great for a person to scan, and in batch utilities, where decisions
based on the likelihood of a match invoke automated processes.
ALFACOMP Comparison Algorithm
ALFACOMP is used to compare fields containing multi-word strings. The
routine is based solely on a heuristic algorithm and does not depend
on rulebase expertise.
DATESCR Comparison Algorithm
The date comparison routine is used to intelligently
determine the similarity of two dates. The routine uses the strength
of NameSearch® to arrive at its results. It interrogates two dates
by parsing them into their components. The month and day together
form one component, which accounts for half the score; the year
accounts for the other half.
The first part of the scoring routine is the parsing of information. The majority
of this work is accomplished through the use of NameSearch®’s sanitization
and word recognition routines. These routines use the date service, supplied
with the NameSearch® software, to arrive at their results.
After the dates have been parsed and standardized, the date comparison back-end
calculates a score between 0 and 100.
The DATESCR comparison routine uses rulebase expertise in order to
arrive at its results.
For example, July 28, 1965 compared to 7/28/66 would yield a
score of 100. In this manner the DATESCR routine overcomes problems due
to inconsistency in date format. The routine also accepts several
parameters that dictate the penalty for mismatches based on the
year. By increasing these settings, the score can be made more tolerant.
For example, if you want all dates that correspond to July 28, 1965 plus
or minus two years, you would set the year range to 2. If you want plus
or minus five years, you would set the year range to 5. In this manner
NameSearch® gives you the ability to widen or narrow the range of dates
being returned, given that the month and day agree.
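The half-and-half scoring and the year range setting can be illustrated with a short Python sketch. It assumes the dates have already been parsed and standardized, and it makes a simplifying assumption that a year within the range earns the full year half while anything outside earns none; the actual DATESCR penalty parameters are not described here, so the function name and the year_range parameter are hypothetical.

```python
from datetime import date


def date_score(a: date, b: date, year_range: int = 0) -> int:
    """Score two already-parsed dates between 0 and 100.

    Month and day together are worth 50 points; the year is worth 50.
    Assumption: the year half is all-or-nothing within year_range.
    """
    month_day_half = 50 if (a.month, a.day) == (b.month, b.day) else 0
    year_half = 50 if abs(a.year - b.year) <= year_range else 0
    return month_day_half + year_half


base = date(1965, 7, 28)
print(date_score(base, date(1967, 7, 28)))                # year outside range
print(date_score(base, date(1967, 7, 28), year_range=2))  # year within range
```

With the default range, a two-year difference costs the whole year half; widening the range to 2 restores it, which is the "plus or minus two years" behavior described above.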
NUMCOMP Comparison Algorithm
The number comparison routine compares two alpha-numeric fields.
The routine was originally designed for numeric character comparisons
but works well on alpha-numeric character strings. For example, this
routine is well suited for Social Security number comparisons.
The numcmp function compares two strings, character by character. It
returns four scores. Each score represents a different way of relating
the number of characters that compared correctly to the number
of characters available for comparison.
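A character-by-character comparison of this kind might look like the following Python sketch. The text does not say which four denominators numcmp uses, so the choice below (length of the first string, the second string, the shorter and the longer) is purely an assumption for illustration, as is the function name.

```python
def position_scores(a: str, b: str):
    """Compare two strings position by position and return four scores.

    Assumption: the four scores divide the match count by the length of
    the first string, the second string, the shorter one and the longer one.
    """
    matches = sum(1 for x, y in zip(a, b) if x == y)
    denominators = (len(a), len(b), min(len(a), len(b)), max(len(a), len(b)))
    # Guard against empty input; integer division drops the fraction.
    return tuple(matches * 100 // d if d else 0 for d in denominators)


# Two nine-digit numbers that differ only in the last digit.
print(position_scores("123456789", "123456780"))
```

When the two strings have the same length, all four scores coincide; they diverge only when one field is shorter than the other, for example a truncated number.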
COMP, COMP1, COMP2 Comparison Algorithm
These are NameSearch®’s comparison routines used for scoring names
and addresses. These routines utilize NameSearch®’s rulebase expertise
and phonetic tokenization to determine scores. COMP was the original
comparison routine released with Version 1 of the NameSearch® product.
COMP1 was introduced in Version 2.0 of the NameSearch® product in
order to provide a more representative score. In Version 2.5 of
the NameSearch® product, COMP2 was added. This routine uses ALFACOMP,
an advanced heuristic algorithm, to arrive at its results.
MultiComp Comparison Algorithm
The MultiComp routine evaluates multiple fields and delivers
a combined, weighted score. For example, users may want to compare
personal names, corporate names and addresses, and receive a single
score. This score is created based on a field definition string that
contains the methodology used for deriving and weighting scores.
MultiComp provides the capability of comparing fields with different
comparison routines, setting up scoring scheme masks, customizing
threshold scores, and weighting attributes and score types.
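One simple way to picture a combined, weighted score is a weighted average of per-field scores, sketched below in Python. This is not the MultiComp field definition string or its scoring masks, whose syntax is not given here; the function name, the field names and the weights are all hypothetical.

```python
def combined_score(field_scores, weights):
    """Weighted average of per-field scores (each 0 to 100).

    Assumption: every scored field has a non-negative weight in `weights`.
    """
    total_weight = sum(weights[name] for name in field_scores)
    if total_weight == 0:
        return 0
    weighted_sum = sum(score * weights[name]
                       for name, score in field_scores.items())
    return round(weighted_sum / total_weight)


# Hypothetical per-field results: the name matches fully, the address partly.
scores = {"name": 100, "address": 60}
weights = {"name": 3, "address": 1}
print(combined_score(scores, weights))
```

Here the name counts three times as much as the address, so a perfect name match pulls the combined score to 90 despite the weaker address score. A threshold could then be applied to that single figure.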