This thought occurred to me while I was preparing for the end-semester exam of my Information Security Systems course. The course was all about computer networks, security, and cryptology. Before I get to the topic of this post, let me give a brief introduction to monoalphabetic ciphers.
A monoalphabetic cipher is a way to encrypt data (convert it into a secret form) by substituting each letter of the message with some other letter, such that the substitute chosen for each letter remains constant throughout the message.
According to simonsingh.net:
The ciphers in this substitution section replace each letter with another letter according to the cipher alphabet. Ciphers in which the cipher alphabet remains unchanged throughout the message are called Monoalphabetic Substitution Ciphers.
Suppose we want to encrypt the following message:
Meet me today at twelve
If we choose to substitute the letter “e” with, say, “u”, the letter “m” with “a”, the letter “a” with “c”, the letter “t” with “n”, and so on, we get encrypted text like this:
auun au nyxcz cn npukbu
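Here is a minimal Python sketch of that substitution. The post only spells out four of the mappings (e, m, a, t); the remaining entries in `KEY` are inferred from the ciphertext above, and the function names `encrypt` and `decrypt` are just my own choices for illustration. A real key would cover all 26 letters.

```python
# Plaintext letter -> ciphertext letter.
# e, m, a, t come from the text; the rest are inferred from the example ciphertext.
KEY = {
    "m": "a", "e": "u", "t": "n", "o": "y", "d": "x",
    "a": "c", "y": "z", "w": "p", "l": "k", "v": "b",
}

def encrypt(message, key):
    """Substitute each letter using the key; anything not in the key is left as-is."""
    return "".join(key.get(ch, ch) for ch in message.lower())

def decrypt(ciphertext, key):
    """Invert the key and apply it to recover the original message."""
    reverse = {cipher: plain for plain, cipher in key.items()}
    return "".join(reverse.get(ch, ch) for ch in ciphertext)

ciphertext = encrypt("Meet me today at twelve", KEY)
print(ciphertext)
print(decrypt(ciphertext, KEY))
```

Because the same key is used for every occurrence of a letter, repeated letters in the message stay repeated in the ciphertext, which is exactly what makes this kind of cipher easy to attack.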
When you pass this message to your friend so that he can decrypt it (convert it back into the original message), the assumption is that only you and your friend know the letter mappings, that is, which letter was substituted for which. It seems like a pretty secure way of sharing secrets, doesn’t it? But no, it isn’t that secure, as it can be easily broken.
The most common and simplest way to break a monoalphabetic cipher is to guess each letter in the encrypted text with the help of a table or graph of relative letter frequencies in the English language. Consider the following frequency chart:
As you can see from the figure above, the most used letter in the English language is “e”, followed by “t”, then “a”, then “o”, then “i”, and so on.
Now, looking at our encrypted text “auun au nyxcz cn npukbu”, we can notice that:
the letter “u” occurs 5 times (most times)
the letter “n” occurs 4 times (second-most times)
the letter “c” occurs 2 times (third-most times, tied with “a”)
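The counting can be done by hand for a message this short, but here is a small sketch of how it might be automated with `collections.Counter` from the Python standard library. The variable names are mine, not part of any standard tool.

```python
from collections import Counter

ciphertext = "auun au nyxcz cn npukbu"

# Count letters only; spaces are not part of the cipher alphabet.
counts = Counter(ch for ch in ciphertext if ch.isalpha())

# List every letter from most to least frequent.
for letter, n in counts.most_common():
    print(letter, n)
```

Note that `most_common()` lists letters with equal counts in the order they first appear, so ties have to be resolved by trying each candidate.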
So, the first guess would be:
“u” was substituted for “e”
“n” was substituted for “t”
“c” was substituted for “a”
Using this much analysis, we try to decrypt the text as:
_eet _e t__a_ at t_e__e
So you see, it wouldn’t be hard from here on to guess the original message as “meet me today at twelve”.
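The partial decryption step can be sketched in Python as well. The helper `apply_guesses` is a hypothetical name of my own; it replaces every guessed cipher letter with its plaintext guess and shows every letter that has not been guessed yet as an underscore.

```python
def apply_guesses(ciphertext, guesses):
    """Apply cipher->plain guesses; unguessed letters become '_', spaces are kept."""
    return "".join(
        guesses.get(ch, "_") if ch.isalpha() else ch
        for ch in ciphertext
    )

# The three guesses made from the frequency counts.
guesses = {"u": "e", "n": "t", "c": "a"}
print(apply_guesses("auun au nyxcz cn npukbu", guesses))
```

From here the attacker keeps adding entries to `guesses` (for example, trying “a” as “m” to complete the first word) until the whole message reads as English.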
But this task of breaking the cipher (cryptanalysis) could have been made more difficult for the “hacker” by using SMS English. Suppose we wanted to encrypt this message:
come for tea at club see you there
Now before encrypting this message, you first convert it into SMS English:
cm 4 t at klub c u dere
With this converted text, proceed with the normal monoalphabetic substitution, and make sure the recipient (probably your friend) already has the letter mapping. The encrypted text would be very difficult to break, as merely guessing letters from a letter frequency table would produce utter gibberish. And there are no letter frequency charts (or at least none I could find) for SMS language.
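To see why the frequency table stops helping, here is a rough sketch that converts the example message to SMS English and compares letter counts before and after. The `SMS_WORDS` table is hypothetical and contains only the abbreviations from this one example; it is not a real SMS dictionary, and a single short message is only an illustration, not proof.

```python
from collections import Counter

# Hypothetical word-level abbreviations, taken from the example in this post.
SMS_WORDS = {
    "come": "cm", "for": "4", "tea": "t", "club": "klub",
    "see": "c", "you": "u", "there": "dere",
}

def to_sms(message):
    """Replace each word that has a known abbreviation; keep other words unchanged."""
    return " ".join(SMS_WORDS.get(word, word) for word in message.lower().split())

def letter_counts(text):
    """Count letters only, ignoring spaces and digits."""
    return Counter(ch for ch in text if ch.isalpha())

plain = "come for tea at club see you there"
sms = to_sms(plain)

print(sms)
print(letter_counts(plain).most_common(3))
print(letter_counts(sms).most_common(3))
```

In the original message “e” clearly stands out, while in the SMS version it appears only twice and no letter dominates, so a guess based on the standard English chart has much less to go on.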
So all you SMS addicts, worry no more. You can finally utilize your skills for some good cause.