
There are many good reasons to always have a primary key, even if it is just an automatic serial number, but the one that hit me personally is that it is surprisingly difficult to deduplicate a relational database.

When I was first learning SQL, I was pretty firmly in the "use natural keys" camp. And when the natural key was every single column, I would go "what's the point?", shrug, and have no primary key. That lasted until I started getting duplicated rows:

    insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
    insert into customer_email (name, address) values ('bob', 'bob@bobco.com');

Duplicate rows (a) tend to mess up your query results and (b) are surprisingly difficult to remove. If I remember correctly, after spending far too long trying to find a pure SQL solution, I ended up writing a program that would find the duplicates, delete them (all of them, as there is no way to delete all but one), re-insert a single copy of each, and add that missing primary key.
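A rough sketch of that kind of clean-up, assuming SQLite driven from Python's built-in sqlite3 module so it runs as-is. The customer_email table comes from the example above; the dedupe_and_add_key function name and the choice of (name, address) as the new composite key are my assumptions. SQLite cannot add a primary key to an existing table, so the last step rebuilds it.

```python
import sqlite3


def dedupe_and_add_key(conn: sqlite3.Connection) -> None:
    """Find duplicates, delete every copy, re-insert one, then add a primary key.

    Assumes the customer_email(name, address) table from the example and no NULLs
    in those columns (NULL = NULL is not true in SQL, so such rows would be missed).
    """
    cur = conn.cursor()
    # 1. Every (name, address) pair that appears more than once.
    dupes = cur.execute(
        "SELECT name, address FROM customer_email "
        "GROUP BY name, address HAVING COUNT(*) > 1"
    ).fetchall()
    for name, address in dupes:
        # 2. The copies have identical column values, so this deletes all of them...
        cur.execute(
            "DELETE FROM customer_email WHERE name = ? AND address = ?",
            (name, address),
        )
        # 3. ...and this puts exactly one copy back.
        cur.execute(
            "INSERT INTO customer_email (name, address) VALUES (?, ?)",
            (name, address),
        )
    # 4. SQLite has no ALTER TABLE ... ADD PRIMARY KEY, so rebuild the table.
    #    (executescript() commits the pending DELETE/INSERT work first.)
    cur.executescript(
        """
        CREATE TABLE customer_email_new (
            name    TEXT NOT NULL,
            address TEXT NOT NULL,
            PRIMARY KEY (name, address)
        );
        INSERT INTO customer_email_new (name, address)
            SELECT name, address FROM customer_email;
        DROP TABLE customer_email;
        ALTER TABLE customer_email_new RENAME TO customer_email;
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_email (name TEXT, address TEXT)")
for _ in range(2):
    conn.execute(
        "INSERT INTO customer_email (name, address) VALUES ('bob', 'bob@bobco.com')"
    )
dedupe_and_add_key(conn)
print(conn.execute("SELECT * FROM customer_email").fetchall())
# [('bob', 'bob@bobco.com')]
```

Once the key exists, a third insert of the same row fails with sqlite3.IntegrityError instead of quietly creating another duplicate.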

I still like natural keys more than I probably should (you still need a unique key to prevent functional duplicates even when using a surrogate key, so why not cut out the middle man?), but I am no longer so militant about it, mainly because natural keys make having dependent tables a pain.
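To make that trade-off concrete, a minimal SQLite sketch (via Python's sqlite3): a surrogate id that dependent tables can reference, plus a UNIQUE constraint on the natural key so functional duplicates are still rejected. The email_send table and the add_email helper are hypothetical, made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES with this on
conn.executescript(
    """
    CREATE TABLE customer_email (
        id      INTEGER PRIMARY KEY,  -- surrogate key (alias for SQLite's rowid)
        name    TEXT NOT NULL,
        address TEXT NOT NULL,
        UNIQUE (name, address)        -- the natural key you still need
    );
    -- Hypothetical dependent table: it carries only the one-column surrogate key
    -- instead of repeating (name, address).
    CREATE TABLE email_send (
        customer_email_id INTEGER NOT NULL REFERENCES customer_email (id),
        sent_at           TEXT NOT NULL
    );
    """
)


def add_email(name: str, address: str) -> bool:
    """Insert a row; return False if it is a functional duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_email (name, address) VALUES (?, ?)",
                (name, address),
            )
        return True
    except sqlite3.IntegrityError:  # raised by the UNIQUE (name, address) constraint
        return False


print(add_email("bob", "bob@bobco.com"))  # True
print(add_email("bob", "bob@bobco.com"))  # False
```

With a natural primary key instead, email_send would need both name and address columns to point at a customer_email row, which is the dependent-table pain mentioned above.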



I'm a fan of always including unique indexes in the DB, even if they have to exclude soft-deleted rows. At a minimum they keep functional duplicates out, and those seem especially insidious when there are races.


Using natural keys is what actually prevents duplicate rows. In your example above, if the email address were the PK, there would be no duplicates. Adding an id as the PK, on the other hand, would still leave your database with duplicates:

(1, 'bob', 'bob@bobco.com')

(2, 'bob', 'bob@bobco.com')
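A quick way to see the difference, again assuming SQLite via Python's sqlite3 (the table names are hypothetical): inserting the same row twice into a table keyed on the address keeps one row, while a table with only a surrogate id keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Natural key: the email address itself is the primary key.
    CREATE TABLE email_natural (address TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- Surrogate key only: nothing stops the same (name, address) going in twice.
    CREATE TABLE email_surrogate (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    """
)


def insert_twice(table: str) -> list:
    # table is one of the two fixed names above, never user input.
    for _ in range(2):
        try:
            with conn:
                conn.execute(
                    f"INSERT INTO {table} (name, address) VALUES ('bob', 'bob@bobco.com')"
                )
        except sqlite3.IntegrityError:
            pass  # second insert rejected by the natural primary key
    return conn.execute(f"SELECT * FROM {table}").fetchall()


print(insert_twice("email_natural"))    # [('bob@bobco.com', 'bob')]
print(insert_twice("email_surrogate"))  # [(1, 'bob', 'bob@bobco.com'), (2, 'bob', 'bob@bobco.com')]
```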



