Xerratus
Happily stressed out, since 1974


 
Wednesday, December 21, 2005
<< Teaching Intelligent Design in Public Schools Ruled Unconstitutional
What is on FIRE? >>

Recently, with an aging application created by an employee who no longer works here, I ran into a problem with SSN’s being stored in the database as integers.  Now the person who designed this, from what I gathered from another employee that worked with him, was very adamant about doing it this way.  To store it as a string or varchar in the database was a no-no due to size and allocation issues.  Honestly, this isn’t a problem, storing it as an integer.  It does however become a problem when you don’t solve for zeros.  

Yes, zeros, they exist in social security numbers and they can even show up at the beginning of one.  Take mine for instance; the first 3 digits are 093 (I was born in N.Y.).  So the problem we had recently was just that.  The number went in minus the leading zero but when the number was taken out of the database, it wouldn’t validate correctly.  Duh, I wonder why?  At no time should you add to the number (i.e., prefix zeros if the number is smaller) because it takes away from the integrity of the number.  Perhaps somebody keyed the number in leaving out a digit, prefixing a zero will do more harm than good.

So what is the harm storing it in the database as a varchar?  I truly believe, with the size of hard-drives and RAM becoming ever larger, that size is not nearly the issue it was for data storage 5 years ago.  Personally, I believe in encrypting before storing it in a database but as a string not an integer.  Again, size should no longer be an issue.  Now a number that is 093-XX-XXXX will be encrypted as “093-XX-XXXX” instead of 93-XX-XXXX and stored as either a varchar or varbinary in the database (personally, I do varchar).  The number’s integrity is intact and decrypting it will keep the leading zero, as it should.

Is this really a problem worth mentioning?  Do developers really do this?  Apparently so.  For a while now, I would try to log in to one of my online banks (which will remain nameless) and on top of asking for a pin they ask rotating questions for extra security.  A nice touch if you ask me.  The problem is that every 3rd or 4th time I try and log in, it throws an error stating that the input does not match.  You want to guess what question it asks me each and every time that error gets thrown?  If you guessed “enter the first 3 digits of your social security number”, you are correct.  My speculation is that it is trying to compare my “093” with 93 from the database.  But in order for this to happen, they need to be converting the database value to a string, not the other way around.  You see, if they converted “093” to an int and compared, 93 would equal 93.  But, if you converted the database value to a string, “93” would not equal “093”.  That’s my guess at least.

In short, you should always encrypt SSN’s as strings in the database to be sure that the integrity of the number is never compromised for decrypting and comparing later.

UPDATE:
This comment, left by whatever43 from digg, hits the nail on the head. 

Most of the commentators, while perhaps excellent programmers, DBAs and the like, obviously don't understand the data about which they're commenting, which is most likely the source of all of the derision.
An SSN is not a number. It's a string made up of three distinct numbers, and only the third one (comprised of the last four digits) is serial. It can be treated like a number, but in most if not all cases it shouldn't be.
It's not a question of mathematical operations, storage space or anything else that's been mentioned here; it's a matter data integrity.
Thought not perfect by a long shot, SSNs are much more sophisticated than most people think. If you do any type of database programming and deal with SSNs, please go to www.ssa.gov and learn a little about them.
(No, I don't work for the Social Security Administration, but I have a lot of experience in data storage, validation and integrity, and I've encountered almost every question one can think of about SSNs.
)

Wednesday, December 21, 2005 1:18:35 PM (Pacific Standard Time, UTC-08:00)
AMEN!!
chicanito33
Thursday, December 22, 2005 10:20:17 AM (Pacific Standard Time, UTC-08:00)
I thought you'd enjoy that one chicanito33 :)
Tuesday, December 27, 2005 12:00:33 PM (Pacific Standard Time, UTC-08:00)
The SSN is a digit. That would be a good reason. Formatting for display is another issue. Your "validator" is not checking the validitiy of the SS Number. It's checking the formatting of the number.

Format the Number to whatever the "validator" is expecting. Store it as an int. (i.e. NumberFormat(digit,"000") )
ShiftLock
Tuesday, December 27, 2005 12:16:08 PM (Pacific Standard Time, UTC-08:00)
I understand but again, this is something I've seen all too often. There is good intention from one programmer but if the follow thru isn't there (ie, the validator) the integrity of the number is lost.

For example, I have a SSN stored as 1234567, 2 digits are missing. Since the UI took it in as xxx xx xxxx (3 seperate textboxes) I have no idea where to pad the zeros.

Again, if done correctly, I'm not against storing it as an integer... as long as it's done correctly the first time.
Wednesday, December 28, 2005 2:47:59 AM (Pacific Standard Time, UTC-08:00)
What purpose is served not naming the bank?
Person
Wednesday, December 28, 2005 7:06:35 AM (Pacific Standard Time, UTC-08:00)
Weird that the bank didn't ask for the last four digits like everyone else. Asking for the first three isn't very secure at all. You can make a good guess without much trouble.

Then again, if it has trouble validating eight-digit SSNs, maybe that's more secure. What's more secure than an unanswerable question? ;-)
spstanley
Wednesday, December 28, 2005 9:09:02 AM (Pacific Standard Time, UTC-08:00)
He's probably referring to ING Direct. I love their services, but they do the PIN and rotating challenge questions. I haven't had a problem, but I don't have a 0 at the beginning (in the middle though).
-md
Anon
Wednesday, December 28, 2005 11:01:30 AM (Pacific Standard Time, UTC-08:00)
Person: I'm not really sure why I wanted to leave the bank anonymous... not protecting anybody.

Anon: Yes, it is ING. One of their challenge questions is "what is the first 3 digits of your ssn" and that is the one that fails for me. Actually I should say failed... I just logged in this morning, got that question and logged in just fine. Wonder if they read this.
Wednesday, December 28, 2005 1:16:57 PM (Pacific Standard Time, UTC-08:00)
I totally understand what you mean! I face that problem when entering ISBN numbers (which usually start with a 0) in an excel spreadsheet
Wednesday, December 28, 2005 1:38:18 PM (Pacific Standard Time, UTC-08:00)
Yes, I had this problem in college back in '93 (born in CT, 044-xx-xxxx). My student ID card didn't work at the cafeteria. Shortly after (it was a new system) all the ID cards were reissued. Then I saw how the ID came up on everybody's - they prefixed the SSN with a 9 to create a 10 digit integer anyway! Still an integer tho.

As for ING, I haven't had this issue yet. I would assume it's not the backend DB but instead some Javascript on the client end.

For Excel, you can lead the number with a single tickmark to force it as a string...

- RR
Thursday, December 29, 2005 9:10:48 AM (Pacific Standard Time, UTC-08:00)
I don't think it matters much how you store the SSN, so long as you are able to use it effectively. Data integrity is not such a big issue. If data integrity was the ultimate goal, then there would never be a reason to store the SSN as, say, an MD5 or SHA-256 hash. And sure it's three groups of digits, but that doesn't mean that it's better or worse to store it as one integer, three integers, a string of characters which are all digits, or whatever. What determines the usefulness of the storage method is how you intend to use the data.
spstanley
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

Enter the code shown (prevents robots):

Live Comment Preview