Jonathan's latest books. > > >
Data Design Pros, LLC
~ Custom Software for Growing Businesses ~

Are you using FileMaker Pro to help run your business? You should be. All other databases are second-class citizens and money pits. (Yikes! That sounds a little harsh.) Read why here.

We make your databases work the way you want them to. That's as easy as it gets. Contact us here.

We are dedicated to serving businesses by designing, constructing, training and servicing tools built with the cross-platform database program FileMaker Pro.

* Save Time
* Increase Profits
* Get Exactly the Software You Need

Click here for information about what can be done with FileMaker

About Us
Learn about our company and how we started.
Products & Services
Find out about our product line.
In the News
Read press releases, newsletters, articles, and reviews of our products. Example below.
Ask the Experts
Get answers to commonly asked questions or send email to our experts.
Sign our guest book to receive product notices and press releases.
FileMaker Advisor Magazine
Jonathan's column
FileMaker Pro
Find out more about the program we use in our application development.
Job Opportunities
Data Design Pros, LLC

What our clients say about our work.
FileMaker Links
Go here for a list of links to other web sites that talk about or use FileMaker Pro etc.
FileMaker 8.5 Book Updates
Additions and corrections.
FileMaker Training

Eliminating Duplicate Records

Getting rid of the excess fat from your database

By Jonathan Stars

Something that has been an important part of my work for many of my clients is removing duplicate records from their database. They don't want their sales team making multiple calls to the same person or mailing more than once to each person on their list.

I want to tell you about a process I developed for a client over the past couple months to give you some idea of the scope this can include. If you just want to know how to "de-dup" your files, skip ahead to the section titled "The Main Course."

In this recent big project, my client gets lists of possible contacts from many sources at least once a week. He imports them into his FileMaker Pro (FMP) database in a multiple step process. I have it compare in multiple fields as described below and combine duplicate records, assuming that most of the contact information in the newest record is more recent and therefore more accurate. It retires the older duplicates so they can be retrieved if need be. Then the process shows him a list of any records that are considered near duplicates so he can manually check the few that remain. After those are retired, he assigns them to his employees for followup phone calls.

For a little different point of view, employees at one association I work with were adding new member or prospect records without checking to see if the caller was already in the file. We instituted a method of controlled data entry. It's important to get your people to ask callers how to spell their names. It could be "Thom" instead of "Tom" or "Jon" instead of "John." After initial data entry, we would present the user with a list of possible name matches. The next screen would check for possible company matches.

I use multiple calculation fields to compare records in order to root out possible duplicates that may have crept into the file. First I look for duplicate full names. But if you have a lot of records in your database (with many John Smiths for instance), you may also want to add in the zip code. If you use zip plus four in the zip code field, you may only want to use the left five characters in the zip field. Take a look at this image.

Keep reading for instructions for how to perform a find for duplicates.

For a second comparison I use a calculation field that combines the first word in the last name field and zip code. The reason I only use the first word is that a person may have an appendix like Jr. or II or PhD, but sometimes you'll get the same person in a different record without those little extras. Full names aren't good enough by themselves because of nicknames and the previously mentioned alternate spellings of first names.

Then I use a calculation field that combines company name and last name. But that won't catch everything because you'll run into abbreviations and misspelling of company names. So you'll periodically want to sort your database by company name and provide the exact same company name in all records.

If at all possible, use some unique identifier like Social Security Number or an ID assigned by a national organization. But even then don't trust it completely. We found multiple contacts that had been given the same "unique" number for one reason or another. Even though the data comes from a computer, somebody had to enter that data, and they could have had a bad night's sleep or lazy fingers or a long liquid lunch.

Look again at the image showing four example calculations that may be useful for de-duping. After you create the fields, put them on a layout and perform a Find for duplicates. Simply start a Find, put the exclamation mark "!" in one of the fields and finish the find. Now sort the records by the field you're using and look at them in a list view. You'll get a pretty good idea of how much work you have ahead of you.

Although you can find duplicates using this method, choosing the "keeper" and combining the data from the one or more that you'll discard may be a little more difficult. Which data do you keep - the newer one, the older one, or parts of each? And if you have related records such as invoices or notes in a portal, how will you make sure those records end up with the surviving record? You will probably need some professional help in order to accomplish that. If it turns out that's what you need, get hold of me.

J **


© 2009 Jonathan Stars


Data Design Pros, LLC is an independent entity and this web site has not been
authorized, sponsored, or otherwise approved by FileMaker, Inc.

Email us here.

web counter

web counter