Tuesday, November 19, 2002
      ( 9:59 AM ) Matt  

Unique Identifiers

One of the critical components of the password/login/authentication scheme which I plan to implement for Okonominet is a unique identifier. Each user is identified by both her user name and unique identifier. The unique identifier is computed as a one way hash of both the user's public key and her user name. That way it is very difficult to impersonate a user by using the user name because the unique ID can always be recomputed from the users public key and user name. This requires that people are able to remember other's unique IDs. This could be assisted by the client. Each time a user name is seen by the client which does not have the same unique ID as one seen before, the client could alert the user. But then, what happens if you log in from a different machine, and don't have your history of users? You'd have to remember the unique ID.

Unique IDs need to be pretty big before they are effective. It would be nice to have less than a 1 in a million chance that a million users using the system wouldn't have the same unique ID. For the purposes of Okonominet, maybe it would suffice that no two users (out of a million users) would have both the same unique ID and user name. This might be closer to having a 1 in a million chance that a thousand users (i.e. all with the same user name) would have the same unique ID. So, we need about 1,000,000,000 (one billion) unique IDs.

One quick and easy way to get a billion unique IDs is to just use the numbers 0 to 999,999,999. Unfortunately, remembering 9 digit numbers is not most people's strong point. These numbers represented in hexadecimal are about 8 nibbles (hex digits) long. 0xDEADBEEF is an easy to remember hexadecimal number, but how about 0x1DCD6500? Not quite so easy. Well, it turns out that 0x1DCD6500 is exactly 500,000,000. That is easy to remember...

People can typically remember words a lot better than numbers. Could we use words as unique IDs? Well, according to wordorigins, Webster's Dictionary has 450,000 words. That's more than 3 orders of magnitude short of what we need. However, if we use random pairs of common words, there would be enough variations to use as a unique ID, and we would have something relativly memorable.

That short changes any users which don't speak English. It also requires that the client be distributed with a large dictionary of words. This is looking ugly.

The alternative I'd like to pursue is to generate nonsense words by stringing letters together. They couldn't be totally random sequences of letters because that would be just as bad as using numbers. Instead we need to abide by the rules which lex our language. I am certain there is a way of algorithmically generating English sounding words. With an un-lexer for other languages, we could also generate nonsense words in other languages. Perhaps this could be accomplished by using a Markov chain. Furthermore, it would be important to be able to translate anyone's unique ID into any of the other forms. An English UID needs to be translatable into a Japanese, hex, or decimal UID.

A similar, simpler alternative is to chose a set of about 100 unique syllables and chain them together randomly. A unique ID would then be a 4 or 5 syllable nonsense word. Japanese is a nice language for this because there are about 100 syllables in Japanese. This each correspond to a character in hiragana. (Or two characters for a few of the syllables.)

# -

Comments: Post a Comment

Dreams I have...

Powered by Blogger
Feel free to e-mail me.

free hit counter