What follows is my method (and mistakes) when attempting the Cyber Security Challenge cipher puzzle.
-- EDIT: The original cipher introduction has been removed but the cipher text is still there --
WARNING! HERE BE SPOILERS
This cipher text was the start of a fun couple of hours' puzzling for me. Oddly enough, 3 different people from 3 different groups of friends sent me links to this within about 10 minutes. I've done a few of these types of challenges, so I had a good idea what was in store. I'm not putting the full answers to every stage, just enough to show how each section moved on. I did most of the meat of the work using Linux command line tools, but towards the end I used a spreadsheet for easy sorting and a text editor with a decent find/replace.
First thing I noticed was the end of the cipher
The equals sign at the end is a dead giveaway that this is a base64 encoded string (the equals sign is padding). This is not so much a cipher as a standard way of encoding files to be sent over things like email, where some characters may cause problems. Base64 encoding a file results in it ending up as just "safe" characters.
I always fall back on a Perl one-liner for base64
$ perl -MMIME::Base64 -ne 'print decode_base64($_)' <input >output
So now we have a decoded file, we need to work out what it is. Most structured files have some kind of identifying marks, a footprint if you will. Running the file through the program "strings" is a good first step.
$ strings output | head
The string "JFIF" means this file is a JPEG. Yay! A picture for us! (Sidenote: I only had a hunch that this was a JPEG, as I know there's more than one string that would mean it was a JPEG; a quick google proved me right. The Exif string is also a giveaway that this is a picture or video file. I never got round to checking the Exif data. There may be other hints hidden in there, but they were unneeded to complete the puzzle.)
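If you'd rather not eyeball the output of strings, the same check can be sketched in a few lines of Python. This is only a rough heuristic, not what I actually ran: the function names are made up for illustration, and the sample bytes below are a hand-built JFIF-style header rather than the real puzzle file.

```python
import base64


def looks_like_jpeg(data: bytes) -> bool:
    """True if the data starts with the JPEG start-of-image marker FF D8 FF."""
    return data[:3] == b"\xff\xd8\xff"


def header_tags(data: bytes, window: int = 64) -> list:
    """Return which of the JFIF/Exif tags show up near the start of the data.

    Searching only the first `window` bytes is an assumption made to keep
    the check cheap; it is a heuristic, not a full JPEG parser.
    """
    head = data[:window]
    return [tag.decode() for tag in (b"JFIF", b"Exif") if tag in head]


# Hand-built sample: SOI marker, APP0 marker, length, then "JFIF\0".
sample_header = bytes.fromhex("ffd8ffe000104a46494600")

# Round-trip through base64, mimicking the decode step from earlier.
decoded = base64.b64decode(base64.b64encode(sample_header))

print(looks_like_jpeg(decoded), header_tags(decoded))
```

To use it on the real thing you would read the decoded file with open("output", "rb").read() and pass that in instead of the sample.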
Having identified it as a JPEG, let's open that puppy up and have a looksee.
Time to take a closer look.
Bring out the gimp! (or the photoshop if that's your bag)
Hmmmn, a series of white and black squares... I wonder what that could be representative of? Binary perhaps? ;)
It's at this point in every one of these puzzles that I think "I really should learn to use some image libraries so I don't have to do this bit by hand", and then proceed to do it by hand anyway, writing a 0 for every white square and a 1 for every black one. If that's the wrong way round it's trivial to swap over. I have, however, learnt to put a set of grid lines over the image just to keep track of where I am, usually 8 squares wide, as 8 bits just makes sense to me.
01000011 01111001 01110010 01101110 01100110
01110010 00100000 01110011 01100010 01111001
01111001 01100010 01101010 00100000 01100111
01110101 01110110 01100110 00100000 ...SNIP
Every group of 8 bits starts with a 0, that's a good sign! Looks like we're dealing with the standard ASCII character set. Time to pull out your ASCII table, convert those series of 1s and 0s to hex or decimal, compare to the table and convert to text!
Or do the easy thing and go to snarkles.net and choose binary to ASCII. (This site has saved me so much time during puzzling; if I meet the writer, he/she is getting free beer all night.)
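If the site is down, or you just fancy doing it yourself, the conversion is tiny. Here's a minimal sketch in Python using only the bits shown above (the function name is mine, and it assumes the groups are separated by whitespace):

```python
def bits_to_text(bits: str) -> str:
    """Convert whitespace-separated 8-bit groups into ASCII text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


# Only the bits quoted above, not the full puzzle text.
sample = (
    "01000011 01111001 01110010 01101110 01100110 "
    "01110010 00100000 01110011 01100010 01111001 "
    "01111001 01100010 01101010 00100000 01100111 "
    "01110101 01110110 01100110 00100000"
)

print(repr(bits_to_text(sample)))  # 'Cyrnfr sbyybj guvf '
```

If you had written the 0s and 1s the wrong way round, swapping them first with something like bits.translate(str.maketrans("01", "10")) would fix it.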
Now we've got a string of gibberish, but it looks like there's punctuation for a URL, so we need to do something else to it. The most common puzzle cipher which leaves the punctuation untouched is a Caesar shift, which involves moving each letter a set number of places through the alphabet (ROT-13 is the special case where that number is 13). Back to our online tool: Caesar brute force and see if anything stands out. We can see that it was ROT-13.
Well that converts to a sensible string and there's a URL so we must be on the right track.
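For the curious, a brute force over every shift is easy to sketch in Python. This is an illustration of the idea rather than what the online tool does internally; the function names are made up, and the sample is just the fragment decoded from the bits shown earlier.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places, leaving everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # spaces, digits and punctuation pass through
    return "".join(out)


def brute_force(text: str) -> dict:
    """Return every non-trivial shift of the text, keyed by shift amount."""
    return {shift: caesar_shift(text, shift) for shift in range(1, 26)}


for shift, candidate in brute_force("Cyrnfr sbyybj guvf").items():
    print(f"{shift:2d}  {candidate}")
# The line for shift 13 reads: Please follow this
```

With only 25 candidates it's quicker to scan the list by eye than to teach the computer what English looks like.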
Going to the URL gives us... another seemingly random string of alphanumeric characters.
No equals this time, and the chances of the same cipher being used twice are slim, so we can rule out base64. Looking at the string we can see it's all hex: there are no letters above F.
This is the point when spidey senses tingle and previous puzzle experience makes me think these are pairs of characters. We're probably looking for text and you can't represent many letters with a single hex character. Time for a bit of command line fu (I've got the new cipher-text in a file called second).
$ sed 's/../&\n/g' second | wc -l
So, what I did here is insert a newline after every pair of characters and then count them. I'm fully aware that there's multiple ways of doing this, and probably many better ways. I like this particular cat skinning method, so there.
So 500 pairs of characters, that's a sizeable chunk of cipher-text to work with.
Now depending on how devious the Cyber Challenge folks have been, and how evil they were feeling when writing the puzzle, there's a couple of things they could have done (one-time pads etc.). A good guesstimate of the difficulty of the next step is the number of different pairs in this ciphertext: the more distinct pairs there are, the harder it's going to be to crack.
So we repeat the previous step but sort them and count the unique pairs:
$ sed 's/../&\n/g' second | sort -u| wc -l
56! Jackpot! That gives us enough for upper/lower case letters and some change for other characters, so it looks like this is a substitution cipher, where every letter is replaced by something else.
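The same hex check, pair splitting and counting can be sketched in Python for anyone who doesn't get on with sed. The function name is mine and the sample string is invented for illustration; it is not the real ciphertext.

```python
import string


def split_pairs(hex_text: str) -> list:
    """Validate that the text is hex and split it into two-character pairs."""
    cleaned = "".join(hex_text.split())  # drop newlines and stray spaces
    if not cleaned or any(c not in string.hexdigits for c in cleaned):
        raise ValueError("not a hex string")
    if len(cleaned) % 2:
        raise ValueError("odd number of hex digits, so not whole pairs")
    return [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]


# Invented sample, standing in for the contents of the file "second".
pairs = split_pairs("0428a9042805\n")

print(len(pairs), "pairs,", len(set(pairs)), "distinct")
```

Stripping whitespace before splitting also avoids a trailing newline in the file throwing the count off by one.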
I repeat the previous commands but redirect the output into two files, "second.split" (every pair) and "uniqpair" (the unique pairs), to play with. Looking at uniqpair there seems to be some sort of pattern, but not one that jumps out, so there may be more than a simple substitution cipher going on here.
Let's look at the pairs in a bit more detail: how often does each pair occur in the cipher text?
$ for i in $(cat uniqpair); do echo -ne $i","; cat second.split | grep $i | wc -l; done
So we can see that some pairs occur very regularly, others rarely (77 instances of the pair "04", but only 1 instance of "05"). Now, letters in most languages don't all occur equally often, but how often each letter turns up is fairly consistent from one piece of text to the next. Looking at this is called frequency analysis and is a very handy tool in cryptanalysis.
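The counting loop above can also be sketched with Python's collections.Counter, which sorts by frequency for free. The pair list here is invented for illustration and is not the real puzzle data.

```python
from collections import Counter


def pair_frequencies(pairs: list) -> list:
    """Return (pair, count) tuples, most frequent first."""
    return Counter(pairs).most_common()


# Invented sample standing in for the lines of "second.split".
sample_pairs = ["04", "28", "a9", "04", "28", "04", "05"]

for pair, count in pair_frequencies(sample_pairs):
    print(f"{pair},{count}")
```

The output is in the same pair,count shape as the shell loop, so it drops straight into a spreadsheet.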
If this is a substitution cipher we can look at the cipher-text and have a bloody good guess at what letters the pairs are likely representing. Now we're only working with 500 characters, and quite a few characters that only have one instance, so we can't simply say $a is $X. What we can do is start with the common letters and say $a is quite likely to be $X, $Y or $Z; let's see what happens if we replace $a with $X. This is where your brain is so much better than a computer, because your brain can assign context to anything that pops out.
I put the pair counts in a spreadsheet so I could easily sort by occurrence or pair as and when my brain needed it.
Now this is where I made my big mistake. I'll admit it, I'm not proud. I'm occasionally stupid, but not proud. Remember when we looked at the unique pairs in the cipher text and we found 56 pairs, which "gives us enough for upper/lower case letters and some change for other characters"?
I totally forgot this, so when I started to look at replacing $a with $X, $Y or $Z I didn't think about this being a chunk of writing, and in chunks of writing the most common character isn't a letter, it's a space. So I drove myself insane for 90 minutes, going down the wrong path and finding dead ends at every turn.
Once we've got the spaces we've got the word lengths, once we've got the word lengths finding the letters becomes much easier. I usually start by trying to find the T's as they're the second most common letter and very commonly found next to H's and then E's. This gives us a good framework to build on.
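I did this part with find/replace in a text editor, but the same trial and error can be sketched in Python: keep a dictionary of guesses and leave anything not yet guessed visible in brackets. Everything here is hypothetical: the function name, the pairs "aa", "bb" and "cc", and the guesses themselves are invented and are not the real mapping.

```python
def apply_guesses(pairs: list, guesses: dict) -> str:
    """Replace pairs we have a guess for; show the rest as [pair]."""
    return "".join(guesses.get(pair, f"[{pair}]") for pair in pairs)


# Invented cipher pairs and working guesses, for illustration only.
cipher_pairs = ["aa", "bb", "04", "aa", "cc"]
guesses = {"04": " ", "aa": "t"}

print(apply_guesses(cipher_pairs, guesses))  # t[bb] t[cc]
```

Adding or changing one entry in the dictionary and re-running makes it cheap to back out of a bad guess.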
Once we've put all the letters that we can in, we're still left with a bunch of yet to be deciphered pairs. Annoyingly they're obviously the pairs we actually need.
Let's look at what we have so far. We have pairs of hex characters and we have some of them deciphered into plain-text. There must be some pattern to find. Let's look at the relationship of upper to lowercase letters.
So we can see there's some pattern between upper and lower case, which might help with some parts of the cipher-text, but it looks like we're looking for non-alphabetic characters so there must be more.
If we match up the plain-text with our handy ASCII table from stage two of the puzzle we get
I still feel like we're on the right track, but how do we get 41 from 28?
Break it down some more.
Wait! what's that?
By Jove, I think we've got it! The plain-text is having its last three bits shunted to the front.
All that remains is to apply this in reverse to the remaining encrypted characters to get the complete plain-text, follow the instructions and wait to hear if we're right!
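Written out in bits, 41 is 01000001 and 28 is 00101000: the trailing 001 has moved to the front. In other words each byte has been rotated right by three bits, so rotating left by three undoes it. Here's a minimal sketch in Python, with function names of my own choosing and assuming each pair is one 8-bit ASCII character:

```python
def rotate_right3(byte: int) -> int:
    """Move the last three bits of an 8-bit value to the front."""
    return ((byte >> 3) | (byte << 5)) & 0xFF


def rotate_left3(byte: int) -> int:
    """Undo rotate_right3: move the first three bits to the end."""
    return ((byte << 3) | (byte >> 5)) & 0xFF


def decode_pairs(hex_text: str) -> str:
    """Decode a string of hex pairs by rotating each byte left three bits."""
    cleaned = "".join(hex_text.split())
    return "".join(
        chr(rotate_left3(int(cleaned[i:i + 2], 16)))
        for i in range(0, len(cleaned), 2)
    )


print(f"{rotate_right3(0x41):02x}")  # 28
print(repr(decode_pairs("2804")))    # 'A '
```

Reassuringly, the very common pair "04" comes out as a space under this rule.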