The first round takes your normal text and renders it as cypher text using a "key". An example of this might be to take the phrase
"I have solved the San Jose Semaphore"
and encrypt it by moving all the letters forward (+) 8 values on the ASC II text table using the digital reference as a guide. The resulting text would be now encrypted as:
"Q(pi~m(|wt~ml(}pm([i%20(Rw|m([muixpwzm"
You could now employ a second "round" whereby each cypher text character is substituted for another using some lookup table or algorithm. In fact, in many second rounds, the key for the first round is actually encrypted as part of the cypher. An example could be to take the string above and transpose the case from lower case to upper case or substitute characters on a querty keyboard two spaces to the left of the keys needed to re-create the text.
The advantage of using two rounds is an exponential gain in complexity for those trying to solve the equation. Adobe uses AES in most of its' Livecycle products which uses 4 separate rounds of encryption resulting in so many combinations that it is physically impossible to crack the code using a brute force method. In fact, assuming you could build a machine that could crack one DES key every second by brute force, it would take a billion trillion years to crack AES at the 128 bit strength. That is why I scoffed in my earlier blog entry on some people who were under a mistaken impression they could circumvent Adobe's PDF encryption techniques using Gmail.
The linear substituion demonstrated above is of course is a very poor encryption algorithm. For starters, it is linear so any patterns that are used in the original text are present in the cyphertext. For example, all spaces in the original text are displayed as "(" characters. This leads to easy recognition of patterns for things like double characters in words (example:" two "t"'s in "pattern").
A much better approach is called "non-linear substitution" whereby each character is substituted with its’ cipher text character using a dynamic map rather than a static linear map (such as Shift+8 characters on the ASCII chart). There are many ways to build this that avoid patterns being recognized from the original text and even introduce confusing patterns to those who want to crack the cipher. In the Semaphore broadcast, the tone of the woman's voice could be a key that acts as a "shift" and the tone could act as an offset on a chromatic scale. A great example of this might be to create a 16 by 16 grid and have multiple allowances of the assessors for each cipher substitute. The chart could look like this:
Now imagine you wanted to substitute the same phrase "I have solved the San Jose Semaphore". You could take it into the first cypher round and transmit it as a tied hash (ignoring space characters for now):
Alpha 9
Delta 13
Lima 7
Lima 2
Delta 9
Oscar 15
Kilo 15
Foxtrot 10
Juliet 8
Kilo 5
...etc...
Notice that patterns that may have existed before can be dissolved. For example, the "e" in have and the "e" in solved can be keyed to two completed different coordinates (both Kilo 5 and Oscar 15 correspond with "e"). I have additionally introduced what appears to be a pattern by using two Lima coordinates side by side.
The text above is very similar to what comes out of Semaphore if you listen to the live broadcast. I mapped the above grid to the broadcast from Semaphore earlier today and found there are now some astounding patterns present. By "astounding", I really mean statistically abnormal. Here is what I decrypted:
Lhsvezh vbdktppbtk vbdktppbtk lbpelsxhsllbpelsxhilhbl tnfddnnhbl tnfddln
Note that the repetition of the characters "vbdktppbtk" is statistically abnormal and probably deserves more attention. I am not 100% sure this map is correct as I was on a conference call during the time and may have misheard a character or two but this is substantially accurate.
So what should the table contain? If Semaphore is in fact a 16 by 16 grid for one round of encryption, it probably doesn't just contain A-Z repeated. Extended ASCII itself has 255 characters which is one shy of 16 times 16 (256). Could that be the answer? If someone cares to take the ASCII table and overlay it on the grid and map out some of the broadcast, I would love to see the results although I suspect the results are going to be in cyhper still. One thing that *could* be gained by such an exercise is to find out the finite length of the broadcast. If you think I should do it, please leave a comment on this blog.
So what are the second rounds? I picked up my guitar last night and played the notes as they are broadcast and found a lot of them are middle "C". Perhaps the offset on the chromatic scale from C is a second round key? What part does the airplane detection play? Why are the patterns broadcast at exactly 7.2 seconds apart? Why am I writing on this?
More to ponder.... Back to the drawing board.
I think I'm getting hooked on this project.