Search:

Java Forum

Ask Question UnAnswered

Home » Forum » Java

RSS Feeds

RDB design, Avoiding Duplicate Records

Asked By: Arland Date: Mar 14 Category: Java Views: 1639

As you may know one of the best practices in relational database design is the use of numeric primary keys which causes to boost the performance in data manipulation and retrieval.

Unfortunately there are few ways to achieve this goal and of course one of them would be use of the embedded database “AUTO NUMBER” feature which is basically very profitable for smaller applications, but it’s absolutely not reliable because of allowance of duplicate records. For instance, inserting the same person twice into the DB will cause having 2 different records.

In this situation, what comes to mind at first is to make up a PK according to the every record specifications. For example, to insert a university student data into a table, the numeric PK would consist of year of admission, score, … which makes the student No.
But, unfortunately you can not always make up such a combination for every record.

So, the other option, as I think, is the use of HASH ALGORITHMS which produce unique numeric outputs based on an input String. For instance you give the algorithm a combination of some specific characteristics as a String input and then you receive a unique number as output.

Now, what I wanted to know is that can anybody propose a better solution?
Does anyone have such a Hash Algorithm?

7 Answers Found

Answer #1 Answered By: Ora Hanson Answered On: Mar 14

First, my source of confusion is why would you insert duplicate data
into your database. You may need to check your object in the UI, if
applicable, and then insert it into database.

Anyway, I wouldn't go for Hashing. The point is that you will have to
check your database quite a few times to find the right id. The world
would be a better place if we could define a hash function that maps
our objects to unique ids.

You could alternatively rely on your database auto numbering, but you
will need to put it in a stored procedure to check whether not the
record is already in database. Your database is indexed and a String
search is fast anyway. On top of that, putting this logic in the
database will give you scalability at a minor cost. You always have
the option to add another machine to the cluster if the performance
sucks.

Answer #2 Answered By: Angel Watkins Answered On: Mar 14

Many Thanks for your reply, but there is no source of confusion.
when you are using auto number, insertion of duplicate record happens alot.
besides, please consider that such preventions should not occur in UI at all, because of the severe performance degradation in searching DB for every inserting record. also, as benchmarks say, PK-String-search in a db will never be as fast as numeric search.
of course as i said, in smaller application you can choose this strategy and even using Hash Algorithms with String output (such as SHA-1and MD5) would be good. but in larger ones, only Numeric PK is recommended.

Answer #3 Answered By: Burkett Bernard Answered On: Mar 14

tell me about your application. It helps a lot if I know how you have duplicate records.

Answer #4 Answered By: Perdita Lopez Answered On: Mar 14

let me give you a complete example:
suppose a table of personnel information including the following fields.
id, firstname, lastname, father_name and age.
when you insert data:
Jack, Smith, adam, 27
it will insert with an integer PK (like 987)
now if you accidentally insert those data again, it will insert another record with another PK (like 988) which is not desireable
now what I want is that I want to control generating primary keys based on the given data. I need a hash algorithm which gives numeric outputs as follows.
JackSmithadam27----->1025458218
JackSmithadam23----->1025498548
JackSmithadam27----->1025458218------->PK exist

Answer #5 Answered By: Faith Hughes Answered On: Mar 14

this is where you define rule engines to avoid the load to be pushed on the db server or unnecessary code on UI.

your solution (hashing) might not be a better alternative to the first one (according to your email) and I don't think if you end up with duplicate PKs. but you might having different records with different PKs but representing the same object. this is where you can use sequence in oracle or identity() in sql server to identify the object

most of the time you can let DB server to do the work by defining PK on columns. i.e. in sql server when you define a pk on multiple columns therefore the server will enforce uniqueness by generating unique index for the pk columns. creation of the index involves some kind of hashing and it is a reliable and efficient technique. this is irrespective of the types of the pks (string or integer) but if columns contain null the ok generation function can ignore the content and still construct a unique key

still even for small scale applications, you might consider some light-weigth rule engine approach,

Answer #6 Answered By: Dinh Tran Answered On: Mar 14

this defining PK on different columns seems quite an interesting way, but here 2 questions arise:
1- are these generated PKs integer?
2- can I apply this feature in other databases? cause I use MySQL and I haven't yet encountered such a feature in it.

Answer #7 Answered By: Ann Evans Answered On: Mar 14

this defining PK on different columns seems quite an interesting way, but here 2 questions arise:
1- are these generated PKs integer?

Reply : This does not matter, but the only restriction is that the column that you are defining a PK for should not be NULL.

2- can I apply this feature in other databases? cause I use MySQL and I haven't yet encountered such a feature in it.
Reply : As far as I know this is definitely supported by Oracle and SQL Server, I'm not too sure about MySQL.

Didn't find what you were looking for? Find more on RDB design, Avoiding Duplicate Records Or get search suggestion and latest updates.

Your Answer

Please login to post answer

Tagged:rdb design avoiding duplicate records

Previous Post:

Web Application Load Testing

Next Post:

SMS web services required

Recording a macro in EXCEL that will record mouse clicks

record set access and record locking

avoiding from same frame

Duplicates in different colors from 2 tables

Finding duplicates from an excel range and coloring them automatically

Finding duplicates in a range

avoid duplication

Help with duplicates

Newbie VBA question about duplicates

Help on Removing Duplicates in Excel

Avoid alerts while sending an automated Email

delete duplicate PERSONAL workbook?

Need Help In Deleting Duplicate Items in Excel a excel file

Signing code to avoid Enable macros dialog

Highlight duplicate values of certain columns only

How to delete duplicate thru vba

Duplicates in a collection

Avoid multiple select constructs

Need help in finding duplicates

Avoid alerts while sending an automated Email

Duplicated partial path in Package Information

10 Common mistakes to Avoid When You're installing Linux software

Duplicating Live USB Distro

App design

Design of application

RSS Feeds:	Articles \| Forum \| New Users \| Activities \| Interview FAQ \| Poll \| Hotlinks
Social Networking:	Hall of Fame \| Facebook \| Twitter \| LinkedIn
Terms:	Terms of Use \| Privacy Policy \| Contact us

Java Forum

RDB design, Avoiding Duplicate Records

7 Answers Found

Your Answer

Related Post