Recently, with an aging application created by an employee who no longer works here, I ran into a problem with SSN’s being stored in the database as integers. Now the person who designed this, from what I gathered from another employee that worked with him, was very adamant about doing it this way. To store it as a string or varchar in the database was a no-no due to size and allocation issues. Honestly, this isn’t a problem, storing it as an integer. It does however become a problem when you don’t solve for zeros.
Yes, zeros, they exist in social security numbers and they can even show up at the beginning of one. Take mine for instance; the first 3 digits are 093 (I was born in N.Y.). So the problem we had recently was just that. The number went in minus the leading zero but when the number was taken out of the database, it wouldn’t validate correctly. Duh, I wonder why? At no time should you add to the number (i.e., prefix zeros if the number is smaller) because it takes away from the integrity of the number. Perhaps somebody keyed the number in leaving out a digit, prefixing a zero will do more harm than good.
So what is the harm storing it in the database as a varchar? I truly believe, with the size of hard-drives and RAM becoming ever larger, that size is not nearly the issue it was for data storage 5 years ago. Personally, I believe in encrypting before storing it in a database but as a string not an integer. Again, size should no longer be an issue. Now a number that is 093-XX-XXXX will be encrypted as “093-XX-XXXX” instead of 93-XX-XXXX and stored as either a varchar or varbinary in the database (personally, I do varchar). The number’s integrity is intact and decrypting it will keep the leading zero, as it should.
Is this really a problem worth mentioning? Do developers really do this? Apparently so. For a while now, I would try to log in to one of my online banks (which will remain nameless) and on top of asking for a pin they ask rotating questions for extra security. A nice touch if you ask me. The problem is that every 3rd or 4th time I try and log in, it throws an error stating that the input does not match. You want to guess what question it asks me each and every time that error gets thrown? If you guessed “enter the first 3 digits of your social security number”, you are correct. My speculation is that it is trying to compare my “093” with 93 from the database. But in order for this to happen, they need to be converting the database value to a string, not the other way around. You see, if they converted “093” to an int and compared, 93 would equal 93. But, if you converted the database value to a string, “93” would not equal “093”. That’s my guess at least.
In short, you should always encrypt SSN’s as strings in the database to be sure that the integrity of the number is never compromised for decrypting and comparing later.
UPDATE:
This comment, left by whatever43 from digg, hits the nail on the head.
Most of the commentators, while
perhaps excellent programmers, DBAs and the like, obviously don't understand
the data about which they're commenting, which is most likely the source of all
of the derision.
An SSN is not a number. It's a string made up of three distinct numbers, and
only the third one (comprised of the last four digits) is serial. It can be
treated like a number, but in most if not all cases it shouldn't be.
It's not a question of mathematical operations, storage space or anything else
that's been mentioned here; it's a matter data integrity.
Thought not perfect by a long shot, SSNs are much more sophisticated than most
people think. If you do any type of database programming and deal with SSNs,
please go to www.ssa.gov and learn a little about them.
(No, I don't work for the Social Security Administration, but I have a lot of
experience in data storage, validation and integrity, and I've encountered
almost every question one can think of about SSNs.)