Cryptanalyzing Polyalphabetic Substitution using Correlation and Coincidence ============================================================================ Solution by Michael Lee First I found the incidence of coincidence (kappa) for various key widths. The upper bound is arbitrary, but I chose to calculate values for widths from 1 to 36. ================= Width Kappa ------ --------- 1 0.037931 2 0.0413793 3 0.05 4 0.0706897 5 0.0413793 6 0.0293103 7 0.0448276 8 0.0689655 9 0.0327586 10 0.0275862 11 0.0413793 12 0.0362069 13 0.0396552 14 0.0448276 15 0.012069 16 0.0603448 17 0.0258621 18 0.0344828 19 0.0413793 20 0.0465517 21 0.0448276 22 0.0310345 23 0.0396552 24 0.0517241 25 0.037931 26 0.037931 27 0.0413793 28 0.0482759 29 0.0362069 30 0.0310345 31 0.0189655 32 0.0672414 33 0.0362069 34 0.0431034 35 0.0344828 36 0.0482759 ================= The values at 8,16,24, and 32 are local maxima, so the size of the key (and the period of the running key) is most likely 8. Now I calculate the letter frequencies for each of the 8 subfiles: =============================================================================== Letter frequencies for subfile #1 of 8 0.0273973 0.0547945 0 0 0.0684932 0.0136986 0.0273973 0.0958904 0.0273973 0 0.0821918 0.0684932 0.0958904 0.0273973 0 0.0136986 0 0.0410959 0 0.0958904 0 0.0273973 0.0273973 0.136986 0.0410959 0.0273973 Letter frequencies for subfile #2 of 8 0.0684932 0.0273973 0.0410959 0.0273973 0.0273973 0 0.0136986 0.0273973 0.0821918 0 0.0136986 0.0136986 0.0136986 0.178082 0.0684932 0.0136986 0 0.0684932 0.123288 0.123288 0.0273973 0 0 0.0136986 0.0273973 0 Letter frequencies for subfile #3 of 8 0 0.109589 0.0136986 0.0136986 0.0410959 0.0821918 0.0136986 0.0684932 0.0821918 0.0410959 0 0.0136986 0 0.0136986 0.109589 0.0821918 0.0136986 0 0.0547945 0.0684932 0.0958904 0.0410959 0 0 0 0.0410959 Letter frequencies for subfile #4 of 8 0 0 0.0410959 0.0547945 0.0821918 0.0273973 0 0.0136986 0 0.0136986 0 0.0958904 0.0136986 0.0410959 0.0547945 0.109589 0.0410959 0.0958904 0.0410959 0.109589 0 0 0.0410959 0.0273973 0.0136986 0.0821918 Letter frequencies for subfile #5 of 8 0.0138889 0 0.0416667 0 0.125 0.0277778 0.0555556 0 0.0833333 0.0277778 0.0277778 0.0416667 0.0694444 0 0 0.0694444 0.0277778 0.0416667 0 0.0138889 0 0.0972222 0.0972222 0.0694444 0.0555556 0.0138889 Letter frequencies for subfile #6 of 8 0.0416667 0.0277778 0.0277778 0.0416667 0.166667 0.0277778 0.0555556 0.0277778 0.0972222 0 0.0138889 0.0555556 0.0138889 0.0694444 0.0972222 0 0 0.0416667 0.0833333 0.0416667 0.0277778 0.0277778 0.0138889 0 0 0 Letter frequencies for subfile #7 of 8 0.0555556 0.0138889 0.0416667 0 0 0.0694444 0.0416667 0.0833333 0.0694444 0 0 0.0694444 0.0416667 0.0833333 0.0277778 0 0.0555556 0 0.0277778 0 0.125 0.0138889 0.0138889 0.0138889 0.111111 0.0416667 Letter frequencies for subfile #8 of 8 0.0555556 0.0555556 0.0416667 0.0416667 0.0555556 0.0833333 0 0.0138889 0.0694444 0.0555556 0.0555556 0.0555556 0.0138889 0 0.0416667 0.0277778 0.125 0.0694444 0 0 0 0.0416667 0 0.0833333 0.0138889 0 =============================================================================== I now apply the correlation test to each subfile to find the most likely character used to encipher that subfile. The highest values in each subfile should correspond to the letters of the key (only the highest scores are shown below). ================================= Letter scores for subFile #1 of 8 e 1.11541 t 1.52898 Letter scores for subFile #2 of 8 a 1.42856 Letter scores for subFile #3 of 8 b 1.46715 o 1.13757 Letter scores for subFile #4 of 8 a 1.11542 l 1.41246 Letter scores for subFile #5 of 8 e 1.33297 r 1.14717 Letter scores for subFile #6 of 8 a 1.4799 Letter scores for subFile #7 of 8 h 1.13128 u 1.43888 Letter scores for subFile #8 of 8 x 1.32679 ================================= In fact, the letters with the highest scores indeed form a word: ============= Key: tableaux ============= Knowing the key, we can recover the plaintext: ================================================================================ assemblylanguageprogrammingisnotforeveryoneofcourseitistoofardemandingfortheless exactingprogrammersamongusifyoudontwanttoconcernyourselfwithsuchfinickybutimport antdetailsasavoidingregisterconflictsselectingaddressingmodesorallocatingsafedat aareasthenisuggestyousticktoalanguagelikebasicwhichhandlesthesedetailsforyouifon theotherhandyoureintriguedbythepossibilityofwritingblindinglyfastprogramsandlear ningabouttheinternalstructureofthemacintoshandthemacintoshplusalongthewaybyallme ansgiveassemblylanguageatryitmaytakeawhiletofullyunderstandthelanguagebuttherewa rdsareworththeeffort ================================================================================