|Math OER Zoom Room Jamboard Lectures Textbook|
The salient and undeniable truth about cryptography is that no measure of violence or proscriptive legislation will ever solve a math problem.
- Jacob Riggs
Codes are how computers keep secrets.
Codes are also how people keep secrets from computers.
What does it really mean when a website is "secure" and your browser shows a tiny lock symbol by the website name?
Why do modern hackers spend most of their time trying to trick people into revealing passwords, instead of using code-breaking tricks? Why are governments able to sometimes break each other's codes, but not do so regularly?
Cryptography can be fun. Did you ever play with secret codes as a kid?
Let's explore the fundamentals of cryptography. The math starts out easy but at the end gets tricky. Most of this playground will be interesting and fun. If the very end is too hard, it is okay to skip it or get help. We all sometimes play a video game and get stuck on the final boss fight.
"Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop."
- Lewis Carroll, Alice in Wonderland
Where do we start when talking about codes?
We start with is a sentence that will not make sense right away, but will soon: Modern codes are based on clocks so division will be broken.
To understand that strange but true sentence, we should start at the beginning. We start with math on a 12 hour clock.
Math on a clock "wraps around". After 12 o'clock we go back to the beginning. If we count five hours past 11 o'clock we do not say 16 o'clock but instead say 4 o'clock.
In a sense, 16 o'clock and 4 o'clock are two names for the same thing. But we do not want to say they are equal or write 16 = 4. That looks wrong and hurts my brain.
The official word we use instead of equal is congruent. Instead of the equals sign we use a slightly different ≡ symbol. The equation we write is called a congruence, and looks like this:
16 ≡ 4 (on a 12 hour clock)
Similarly, if a seven-hour plane flight from New York to London starts at 6 o'clock, we say it arrives at 1 o'clock instead of 13 o'clock.
13 ≡ 1 (on a 12 hour clock)
We can add the same amount to both sides of the congruence. This is arithmetic that represents moving the clock hand clockwise. Let's move time ahead two hours.
13 + 2 ≡ 1 + 2 (on a 12 hour clock)
15 ≡ 3 (on a 12 hour clock)
We can subtract the same amount from both sides of the congruence. This is arithmetic that represents moving the clock hand counterclockwise. Let's undo the earlier shift by moving time back two hours.
15 − 2 ≡ 3 − 2 (on a 12 hour clock)
13 ≡ 1 (on a 12 hour clock)
Notice that clock arithmetic does have a zero. We just normally call it 12 o'clock.
12 ≡ 0 (on a 12 hour clock)
We can multiply the same amount to both sides of the congruence. What does multiplying mean? "3 o'clock × 4" really means "move 3 hours clockwise, four times, starting from zero". Multiplying is repetitive adding, just the way we normally think about it. It represents starting at zero and moving the clock hand forward repetitively.
15 × 2 ≡ 3 × 2 (on a 12 hour clock)
30 ≡ 6 (on a 12 hour clock)
So far so good.
Division does not work reliably on a clock. Let's try it.
15 ÷ 3 ≡ 3 ÷ 3 (on a 12 hour clock) ← does this work?
5 ≡ 1 (on a 12 hour clock) ← does this work?
This is not true! 5 o'clock is not congruent to 1 o'clock.
Soon we will use arithmetic to make a code. We want a processess that is easy to do but hard to undo so our code is not easily broken. With normal arithmetic that is impossible: if we do something with addition we can undo it with subtraction, or if we do something with multiplication we can undo it with division. But when using clock math division does not work. With clock math, doing something with multiplication can be hard to undo.
If this was a video game, we are still in the tutorial but we have picked up our first weapon.
It's then we'll hear St. Peter / tell us loudly with a yell, / "Take a front seat you soldier men, / you've done your hitch in Hell."
- Frank Bernard Camp, "Our Hitch in Hell"
What if our clock used military time?
Now the clock has 24 hours instead of 12.
It is no longer true that 16 and 4 are congruent because in military time 16:00 and 04:00 are two different things.
If a seven-hour plane flight from New York to London starts at 18:00 it goes past midnight to 01:00.
25 ≡ 1 (on a 24 hour clock)
Unfortunately, looking at military time was only a tangent. We will build off this idea. If this was a video game, we are still in the tutorial and were taught a melee attack we will never use again.
Your turn to flex your cryptographic muscles! Check for yourself that addition, subtraction, and multiplication still work on a 24 hour clock. What value is equivalent to zero on a 24 hour clock?
How long do I want these messages to remain secret? I want them to remain secret for as long as men are capable of evil.
- Neal Stephenson, Cryptonomicon
To start making codes we introduce the traditional way to make numbers and letters correspond.
This way to move back and forth from letters to numbers is called a cipher.
In theory, we could use a cipher that was not so obvious with all the numbers in order. But we us this most natural cipher and let the math we do to the cipher make our codes secure.
In this cipher we write first the numbers as 01 through 09. Eventually it will be convenient for every number in the cipher to be two digits.
Notice that the letter Z could be represented by either 0 or 26, just like on a twelve hour clock 12 o'clock behaves like zero.
0 ≡ 26 (on our 26-letter cipher)
In a moment we will plug these numbers into formulas. What happens if the answer is bigger than 26? We wrap around, just like clock math clock.
The directions about how to wrap around must be included in each congruence. Those phrases are getting a bit long. We do not want to write "on our 26-letter cipher" again and again. So mathematicians use the jargon mod to describe when to wrap around. It does not really matter whether the real-life application is a clock or a cipher.
16 ≡ 4 (mod 12) ← on a 12 hour clock both represent 4 o'clock
25 ≡ 1 (mod 24) ← on a 24 hour clock both represent 01:00
31 ≡ 5 (mod 26) ← on a 26 letter cipher both represent E
The word "mod" is an abbreviation for "modulo". The more official name for clock math is modulo arithmetic.
Now we are making progress! We want to make codes out of words, not times. If this was a video game, we are almost done with the tutorial and have just picked up the ammunition for our weapon.
The die is cast.
- Julius Caesar
The most natural thing to do to a clock is shift forward or back a few hours. So our first type of code will merely involve shifting between letters back and forth in exactly the same way.
This works great early in the alphabet. If we shift eight spots to the right then A becomes I, B becomes J, and so on. But what do we do with V? If we try to shift V eight spots to the right, we could fall off the end of the alphabet! So we wrap around, of course. We are used to that by now.
Historians know that Julius Caesar used this type of code. Let's use his famous sentence "The die is cast" in our example. I like that sentance because I'm lazy, and all his other famous sentences are much longer.
The die is cast
We use our cipher to change our message.
20 08 05 • 04 09 05 • 09 19 • 03 01 19 20
Then we disguise each number by shifting it to the right eight spots and wrapping around as needed. Just to establish good habits, we will take a moment write this carefully as an encryption formula.
disguise ≡ number + 8 (mod 26)
Here are the shifted numbers:
02 16 13 • 12 17 13 • 17 01 • 11 09 01 02
For most codes, we stop here and send those disguised numbers as the code. But to follow historical tradition, when doing the Caesar Shift we use the cipher a second time to translate back into letters and send those as our code.
Bpm ltm qa kiab
Ta da! That was easy.
Obviously, the recipient should use a decryption formula that shifts each number by −8 and wrapping around as needed.
number ≡ disguise − 8 (mod 26)
Perhaps you played with this type of code as a child. You probably saw that it is easy to break the code.
Some letters appear much more frequently than average. Overall, E appears as 13% of the letters in a sentence. T is 9%. I, N, and R are 8%. A and O are 7%. Furthermore, certain words are much more common. The top fifteen appearing words are: the, be, to, of, and, a, in, that, have, I, it, for, not, on, with.
Because the encryption formula is applied to one letter at a time, those hints about letter and word frequency make breaking the code much easier. Even if our encryption formula was trickier than "add 8" this style of code is still easy to break.
If this was a video game, we have finished the tutorial. The game begins. We have no power-ups yet and are really wimpy.
Your turn to flex your cryptographic muscles! Use the Caesar Shift page at cryptii.com to make a Caesar Shift coded message. Give it to someone else to solve. If you are nice, also share the facts about the most common letters and words. (Hint: Don't be nice.)
Spending time with math people is a lot of fun. As a result of the play, I've had semi-drunken dinners with mathematicians all over the country. I recommend the experience.
- David Auburn, playwright of Proof
To make a better code, we smoosh together the numbers made with our cipher into blocks. We will be wimpy and use blocks made with only two letters. Even those short blocks make a much more secure code.
The die is cast
20 08 • 05 04 • 09 05 • 09 019 • 03 01 • 19 20
02 16 • 13 12 • 17 13 • 17 01 • 11 09 • 01 02
Bp ml tm qa ki ab
This makes slightly more work for the recipient. He or she now receives a message without the spaces between words. The decoded message is now "Th ed ie is ca st." Fortunately, it is usually easy to see where the spaces should be to make real words and a sensible sentence.
(In theory we could make our cipher bigger so it includes punctuation. But we're too lazy.)
But we're not done yet.
As long as we are smooshing together pairs of numbers, why don't we treat each pair as a single four digit number?
The die is cast
2008 0504 0905 0919 0301 1920
That allows us to use a more complicated encryption formula.
disguise ≡ number × 6 (mod 26)
The resulting code becomes:
12048 3024 5430 5514 1806 11520
Now we're getting somewhere! The blocks prevent a code breaker from using hints about letter and word frequency. This is finally a code that the average third grader cannot break.
The recipient will need to use the opposite formula to decrypt the message.
number ≡ disguise ÷ 6 (mod 26)
As before, the decoded message will be "Th ed ie is ca st" and the recipient will need to fix the spaces to make a sensible sentence.
The previous code required the recipient to know the "key" for the code, which was "divide by six". But how does the recipient get the key in the first place?
Our codes provide no secrecy if the sender's first message must be uncoded and include the key used in later coded messages. An eavesdropper could intercept the first message and steal the key!
Our codes are pitifully inefficienct if the sender and recipient must meet face-to-face to exchange the key.
If this was a video game, we have found the first quest.
Your turn to flex your cryptographic muscles! Visit the Practical Cryptography website to read about a classical encoding plan other than the Caesar Shift.
A court order is as ineffective at accessing encrypted data as a nuclear weapon. Only the keyholder can access that data, and that power resides exclusively with them.
- Jacob Riggs
Our quest again involves modulo arithmetic.
It will help to watch this video about end-to-end encryption that makes the issue clear.
Then this video introduces how secretly sharing a key involves mixing three things.
Those videos avoid the nitty gritty. But we are up for the challenge.
Imagine that Alice and Bob want to create a key without sharing the key with an eavesdropper named Eve.
Alice and Bob begin by sharing publicly that they will use a congruence with an exponent.
y ≡ 11Secret (mod 26)
Alice picks 5 for her secret exponent. She plugs it and and sees that 7 ≡ 115 (mod 26). So she sends to Bob the number 7.
Bob picks 14 for his secret exponent. He plugs it and and sees that 17 ≡ 1114 (mod 26). So he sends to Alice the number 17.
So far Eve has seen which style of congruence they are using and the numbers 7 and 17. But division does not work reliably for modulo arithmetic. So Eve cannot work backwords to find the secret exponents.
Next, Alice changes her congruence to have Bob's Number as the base of the exponent, with her own secret number as the top of the exponent. She sees that 23 ≡ 175 (mod 26).
Similarly, Bob changes his congruence to have Alice's Number as the base of the exponent, with his own secret number as the top of the exponent. He sees that 23 ≡ 714 (mod 26).
Alice and Bob now both have the same number of 23 to use as a key. Eve cannot combine her information to get 23.
A modern key exchange would use a more complicated formula and bigger numbers. But the essenence remains the same.
Being able to create a secret key is amazing. It also does not take very long with a computer. In a spreadsheet you can type =mod(7, 14) to get these kinds of answers instantly.
When your smartphone sends a text message, it does this type of thing with every single text message. The messages are more secure when the software is constantly creating new keys.
If this was a video game, we have finished the first quest. It's reward was a power-up that made us grown stronger. We are ready for the boss fight! (Yeah, it's a really short video game.)
Your turn to flex your cryptographic muscles! Use a spreadsheet or the modulo arithmetic calculator at Planet Calc to make the modulo arithmetic instant. Try having Alice and Bob use different secret numbers. Does the process still conclude with a shared key?
On two occasions, I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able to rightly apprehend the kind of confusion of ideas that could provoke such a question.
- Charles Babbage
Our ability to secretly create a shared key is amazing. But our encryption and decryption formulas were too wimpy. Merely dividing by a number is not a strong enough decryption formula.
The history of better encryption formulas is tied to the history of computers. The hunt for the best encryption formula is like Goldilocks in the house of the three bears. We want it to be "just right" instead of too simple or too complicated. If the formula is too simple then a code breaker can break the code. If the formula is too complicated then it takes too long to send and receive a code.
The most famous mathematicians studying codes during World War II were Elizebeth and William Friedman. They knew that the "supercomputers" of their time would eventually be able to break any code. But they discovered encryption formulas that would make Goldilocks happy: a message could be sent in seconds, but breaking the code would take years.
Since the 1970s, the common encryption formula has the same appearance as the key exchange formula: a congruence with an exponent. The sender plugs one block at a time into the bottom of the exponent. Making the top of the exponent bigger as computers get faster keeps Goldilocks happy
In our example of a key exchange we used a congruence that wrapped around at 26. Our new encryption formula needs to avoid dissecting our biggest possible block, which is YY or 2525. So we will work with a number bigger than that. Let's use 2,633 to keep the 26-ish vibe.
encryption ≡ blockkey (mod 2,633)
Perl is the only programming language that looks the same before and after encryption.
- Keith Bostic
Here we go!
As before, we use our cipher to make pairs of letters into four-digit blocks.
The die is cast
2008 • 0504 • 0905 • 0919 • 0301 • 1920
For this example let's start with a small number as the top of the exponent. Let's use 3.
encryption ≡ block3
When we cube each of our four-digit blocks we get big numbers!
8,096,384,512 • 128,024,064 • 741,217,625 • 776,151,559 • 27,270,901 • 7,077,888,000
Next we need to wrap around on an imaginary clock with 2,633 hours. This is trivial with computer help.
• In a spreadsheet you can type =mod(8096384512, 2633) to instantly get the first block's answer of 1,667.
• The modulo arithmetic calculator at Planet Calc can also find 8,096,384,512 modulo 2,633.
The code we send ends up being these six blocks:
1667 • 2338 • 1795 • 1085 • 0920 • 2215
So far so good.
Your turn to flex your cryptographic muscles! Encode a short sentence. Use the cipher to create four digit blocks, and use a spreadsheet or the modulo arithmetic calculator at Planet Calc to make the encoding formula's modulo arithmetic instant.
Unbelievable! You, Subject Name Here, must be the pride of Subject Hometown Here.
How does the recipient decode the message? The corresponding decryption formula is:
disguise ≡ number1,755
(You do not know where the 1,755 came from. Trust me that it is advanced math, but quick math.)
Our code's recipient must plug in the six blocks one at a time. The first is:
block ≡1,6671,755 (mod 2,633)
Uh oh! Boss fight!
Can we even figure out 1,6671,755? That is a disgustingly big number. Our calculators cannot handle it. Our spreadsheets cannot handle it.
The fact that we are doing modulo arithmetic saves the day. Not only does modulo arithmetic break division (so Eve cannot eavesdrop on the key exchange) but it gives us a back door into math problems with disgustingly big numbers.
If this was a video game, we ready our weapon and ammunition.
As we approach 1,6671,755 it towers above us. We cannot see it clearly. Fortunately, we do not need to know this number!
This ginormous number is 1,667 × 1,667 × 1,667 × 1,667 × ... with the multiplication happening for 1,755 repetitions.
If this was a video game, in the cave with the boss fight is a skeleton of an adventurer who tried to fight the boss and failed. What did he do wrong?
He started the long chain of multiplying. He did 1,667 × 1,667 = 2,778,889 which wraps around to become 1,074 (mod 2,633). And then he could multiply by the third 1,667 and again use the (mod 2,633) property to shrink the answer down to a reasonable number. And so on. But before he finished all 1,754 steps he died of boredom.
Perhaps you have seen the product rule for exponents: 1,667a+b = 1,667a × 1,667b
= 1,6671 + 2 + 8 + 16 + 64 + 128 + 512 + 1024
= 1,6671 × 1,6672 × 1,6678 × 1,66716 × 1,66764 × 1,667128 × 1,667512 × 1,6671,024
We can find the congruances for these eight smaller exponents without too much spreadsheet work because of a pattern.
1,6671 = 1,667 and because this is less than 2,633 it does not wrap around.
1,6672 = 2,778,889 ≡ 1,074 (mod 2,633).
1,6674 = 1,6672 × 1,6672, but from what we just did...
This is equivalent to 1,074 × 1,074 = 1,153,476
Then we find 1,153,476 ≡ 222 (mod 2,633)
1,6678 = 1,6674 × 1,6674, but from what we just did...
This is equivalent to 222 × 222 = 49,284
Then we find 49,284 ≡ 1,890 (mod 2,633)
1,66716 = 1,6678 × 1,6678, but from what we just did...
This is equivalent to 1,890 × 1,890 = 3,572,100
Then we find 3,572,100 ≡ 1,752 (mod 2,633)
...and so on because letting the exponents double keeps the pattern going.
To get to 1,6671,024 we follow that pattern eleven times.
If this was a video game, our ammunition has been used eleven times. Our weapon is warm.
We still end up with a really big product of eight numbers:
1,6671,755 = 1,667 × 1,074 × 1,890 × 1,752 × 351 × 2,083 × 136 × 65
But modulo arithmetic again makes this manageable. With a product of only eight parts (instead of 1,755) we can behave like the skeletonized adventurer, and approach this long bunch of multiplication one product at a time without dying of boredom.
We start with the first two numbers in our product.
We multiply 1,667 × 1,074 = 1,790,358.
Then we find 1,790,358 ≡ 2,551 (mod 2,633).
To keep going we need to also multiply by 1,890.
We multiply 2,551 × 1,890 = 4,821,390.
Then we find 4,821,390 ≡ 367 (mod 2,633).
To keep going we need to also multiply by 1,752.
We multiply 367 × 1,752 = 642,9840.
Then we find 642,9840 ≡ 532 (mod 2,633).
...and so on.
Eventually we get to the last step where we include the × 65. The result of the previous step was 598. So 598 × 65 = 38,870 which is congruent to 2008 which was indeed our first four-digit block representing the letters TH.
If this was a video game, our ammunition has been used seven more times. Our weapon is hot. But one block is decoded! The battle contiues until all six blocks are decoded. The recipient reads "Th ed ie is ca st" and the boss if defeated!
Whew! That is a fair amount of math for just one block, even using a spreadsheet to do the modulo arithmetic instantly. But we need enough complication to make the code unbreakable.
In our example above, the encryption and decryption formulas had exponents whose top numbers were 3 and 1,755. The second was the "key" to our code that the recipient needed to know to decode the message. It was a pretty big number: in the thousands!
Back in the days of Elizebeth and William Friedman, during World War II, a "key" in the thousands would have been sufficient. The biggest computers of that era, like the Colossus, could not break that kind code before the relevant battle was be over and the information was obsolete.
(Remember, the exponent style of encryption and decryption formulas were not actually used until the 1970s. We are just making a comment about supercomputer speed.)
Today much bigger exponent keys are needed for security.
Making the exponent have an additional three digits (from thousands to millions) about doubles the time needed to use the code, and the time needed to break the code. We can keep repeating this. Changing the key from a number in the millions to a number in the billions again doubles the times. And Changing the key from a number in the billions to a number in the trillions again doubles the times.
Since the 1980s supercomputers have become so fast that they can break the code with an 80-digit exponent "key" in less than a day. (Almost a day for the CDC CYBER 205, a few hours for a CRAY X-MP, as documented here.)
That 1980s CRAY X-MP has a speed of 0.8 gigaflops. The current top supercomputers have speeds in the millions of gigaflops. I have no idea what that really means, but it is fun to say.
Modern encryption and decryption formulas use exponent keys with lots of digits. They generate new keys with every text message, and every time a secure website sends or receives information. But the math is still based on the same procedures.
Your turn to relax and watch your friends sweat. Give them your coded message and a link to this webpage. Can they decode your message?