Exact Match Problems
Matching two separate data files that contain full name and full
postal information, with the intent of finding matching records
and appending data such as email addresses, is a very complex
problem. Addressing it correctly requires a combination of
high-powered software, high ethical standards, and respect for
and understanding of industry best practices.
Looking at the software portion of the equation alone, the
software required for this endeavor is expensive, complex and
multi-faceted, being made up of several components. These
include software for databases, file analysis, file manipulation
and merge-purge, to name only a few. Most clients we talk with
believe that all that is required to correctly compare and
append information such as email to their house file is some
merge-purge software. Indeed, some companies that purport to
do email appending use only the elementary merge-purge
capability associated with SQL or Oracle software. These
operational approaches to email appending do exist, and they
hurt the industry. This is like putting a lawnmower engine on
a steel beam and calling it a racecar. Without investment in a
body, wheels, transmission and steering, the engine will not
produce quality outcomes and will not win any races.
Furthermore, if enough people enter lawnmower engines on steel
beams into car races, it will damage the credibility of the
auto racing industry.
That being said, let's look at the basic engines for merge-purge.
Some folks use the elementary database engines described above.
Some folks try to build their own engines. Other companies
use “off-the-shelf” engines developed by software
companies like Group1, First Logic and others. Still other
companies use a hybrid of “off-the-shelf” and
internally developed approaches. AcquireWeb falls into this
latter category. Most quality engines allow for a variety of
match stringency settings. These settings (exact, tight, medium
and loose) define how closely two separate pieces of data must
agree before they are called a match. The highly ethical
companies use a tight match stringency in their match
algorithm. The majority of companies that do email appending
use a medium stringency algorithm to get more matches. What
surprises most clients is that the highly ethical companies
don’t use an exact match algorithm for their email appends.
The fact is that comparing two separate data files containing
full name and full postal data is extremely complex, and the
exact name and exact address rarely match.
A simple example is first name alone: Robert, _Robert,
robert, Bob, bobby, Bobby, _Bobby, _bobby, _bob, BOB, _BOB,
bob, _Bob, _robert, ROBERT, _ROBERT, rob, _rob, Rob, _Rob,
ROB, _ROB, Robby, _Robby, robby, _robby, ROBBY, _ROBBY are
28 examples of one name, including logical extensions and
one common data entry variation: starting the name at the
beginning of a data field or after a leading space (shown
here as an underscore). Many other data entry variations
exist, as you can imagine.
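The idea of collapsing such variants into one name can be sketched in a few lines of Python. This is only an illustration, not the logic of any commercial merge-purge engine: the NICKNAMES table and the normalize_first_name function are hypothetical names introduced here, and a real nickname table would be far larger.

```python
# Hypothetical nickname table (assumption): maps informal forms to a formal name.
NICKNAMES = {
    "bob": "robert",
    "bobby": "robert",
    "rob": "robert",
    "robby": "robert",
}


def normalize_first_name(raw: str) -> str:
    """Strip padding, ignore case, and map known nicknames to a formal name."""
    cleaned = raw.strip().casefold()
    return NICKNAMES.get(cleaned, cleaned)


# Build the kinds of variants listed above: each spelling in three letter
# cases, with and without a leading space.
variants = [
    pad + styled
    for name in ("robert", "bob", "bobby", "rob", "robby")
    for styled in (name.capitalize(), name.lower(), name.upper())
    for pad in ("", " ")
]

# Every variant collapses to the same normalized value.
print({normalize_first_name(v) for v in variants})
```

Names that are not in the table pass through unchanged apart from case and padding, so the sketch never invents a relationship between two names that the table does not state.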
It must be understood that most people with common first
names tend to use more than one extension of their name
depending on the circumstance. For example, they may use their
full formal name “Robert” when registering to vote or filing
their taxes. They may use a less formal extension of their
name, “Bob”, when signing up to join a sports league.
Hypothetically, if you are looking for Robert Smith by
comparing a voter registration file with a sports league file
using exact match stringency in your merge-purge algorithm,
the result will be NO MATCH. This outcome occurs even if all
the other data elements, including last name, address 1,
address 2, city, state, zip, zip+4 and phone, are exact matches
down to their data entry variations. That is because exact
means exact. Thus comparing a 1 million record customer file
against a 95 million record opt-in email file using
“Exact Match Stringency” could return 1,000 +/- 1,000 matches.
This is a sub-optimal outcome considering that, on average,
150,000 good and true matches may exist in the file. Likewise,
using medium or loose match stringency could return a match
for Rebecca Smith, Ronald Smith, Reginald Smith or Randy Smith,
none of whom may live in the same town or zip code as the
Robert Smith you are looking for.
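The Robert Smith example can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the Record fields, the sample addresses, the NICKNAMES table, and the three rules themselves. In particular, the "loose" rule shown here (first initial plus last name) is only one possible loose setting, and a medium setting is not shown.

```python
from dataclasses import dataclass

# Hypothetical nickname table (assumption), kept tiny for the example.
NICKNAMES = {"bob": "robert", "bobby": "robert", "rob": "robert", "robby": "robert"}


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    address: str
    zip_code: str


def clean(value: str) -> str:
    """Collapse whitespace and ignore case."""
    return " ".join(value.split()).casefold()


def formal_first(value: str) -> str:
    """Map a cleaned first name to its formal form when the table knows it."""
    name = clean(value)
    return NICKNAMES.get(name, name)


def exact_match(a: Record, b: Record) -> bool:
    # Dataclass equality compares every field character for character.
    return a == b


def tight_match(a: Record, b: Record) -> bool:
    # Allows nicknames, case and spacing differences; everything else must agree.
    return (formal_first(a.first) == formal_first(b.first)
            and clean(a.last) == clean(b.last)
            and clean(a.address) == clean(b.address)
            and clean(a.zip_code) == clean(b.zip_code))


def loose_match(a: Record, b: Record) -> bool:
    # Illustrative loose rule: first initial and last name only.
    return (formal_first(a.first)[:1] == formal_first(b.first)[:1]
            and clean(a.last) == clean(b.last))


# Made-up sample records.
voter = Record("Robert", "Smith", "12 Oak St", "94404")
league = Record("Bob", "Smith", "12 Oak St", "94404")
stranger = Record("Rebecca", "Smith", "9 Elm Ave", "10001")

print(exact_match(voter, league))    # same person, but exact matching misses it
print(tight_match(voter, league))    # tight matching accepts the nickname
print(loose_match(voter, stranger))  # loose matching accepts a different person
print(tight_match(voter, stranger))  # tight matching rejects her
```

Under these assumptions the exact rule misses the true match, the loose rule accepts Rebecca Smith at a different address, and only the tight rule gets both cases right.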
No single match stringency will return all 150,000 good and
true matches from the example above and only those 150,000
matches. The trick is to return as many of the 150,000 good
matches as possible while limiting the number of inaccurate
matches. Inaccurate matches should be kept below 0.5%. The
most conservative approach is to carry out matching using
tight match logic. In the better systems this allows for
logical extensions in first names, a variety of data entry
variations and slight data entry errors. Of course, putting a
finely tuned 600 horsepower engine on a steel beam and entering
it into the Indy 500 won’t win any races. You still need a
great body, wheels, transmission and so on.
Likewise, having a great merge-purge engine will not result
in optimal outcomes for the client unless there is great
supporting software, along with high ethical standards and an
understanding of industry best practices. AcquireWeb brings
all the components together in one package to deliver the
most high-quality matches in the industry. For a test drive
visit www.acquireweb.com or call us at 650-212-2233.
Albert Gadbut
President
AcquireWeb, Inc.
2003