I am working with the R programming language.
Consider the following puzzle:
I always wondered if the following was possible - suppose I have this data:
id = c(1,2,3)
names = c("john", "tim", "alex")
country = c("Canada", "England", "Mexico")
my_data = data.frame(id, names, country)
id names country
1 1 john Canada
2 2 tim England
3 3 alex Mexico
I always wondered if it is possible to make a "puzzle" out of this data. For example, is it possible to replace the names and countries with random letters/numbers, and then record what the "conversion key" for this replacement? For example:
# output 1: sample output
id names country
1 1 h37ubL 998ggg6ppggg6h8ggg6
# output 2: sample conversion key
original coded
1 j h3
2 o 7
3 h ub
4 n L
5 C 998
6 a ggg6
7 n pp
8 a ggg6
9 d h8
10 a ggg6
Is something like this possible in R - to create a "key" for all possible characters (e.g. letters (capital, small), numbers, accents, dashes, commas, special symbols, etc.) with the exception of spaces? (i.e. a space in the original should be preserved)
Thanks!
CodePudding user response:
It's not difficult to program a substitution cypher in R. Shuffle the alphabet, in the example below the printable ASCII characters, and keep this as the key. Don't forget to not shuffle the space character, ASCII character nr 32.
Then the same function can encrypt and decrypt by matching the characters in the message with the alphabet and use this match to index the cypher key. Decryption just swaps key
and alphabet
.
The code below uses two functions posted to R-bloggers to get the ASCII characters.
1. The alphabet
asc <- function(x) { strtoi(charToRaw(x), 16L) }
chr <- function(n) { rawToChar(as.raw(n)) }
ascii_printable <- 32:126
ascii_chars <- sapply(ascii_printable, chr)
data.frame(number = 32:126, char = ascii_chars)
#> number char
#> 1 32
#> 2 33 !
#> 3 34 "
#> 4 35 #
#> 5 36 $
#> 6 37 %
#> 7 38 &
#> 8 39 '
#> 9 40 (
#> 10 41 )
#> 11 42 *
#> 12 43
#> 13 44 ,
#> 14 45 -
#> 15 46 .
#> 16 47 /
#> 17 48 0
#> 18 49 1
#> 19 50 2
#> 20 51 3
#> 21 52 4
#> 22 53 5
#> 23 54 6
#> 24 55 7
#> 25 56 8
#> 26 57 9
#> 27 58 :
#> 28 59 ;
#> 29 60 <
#> 30 61 =
#> 31 62 >
#> 32 63 ?
#> 33 64 @
#> 34 65 A
#> 35 66 B
#> 36 67 C
#> 37 68 D
#> 38 69 E
#> 39 70 F
#> 40 71 G
#> 41 72 H
#> 42 73 I
#> 43 74 J
#> 44 75 K
#> 45 76 L
#> 46 77 M
#> 47 78 N
#> 48 79 O
#> 49 80 P
#> 50 81 Q
#> 51 82 R
#> 52 83 S
#> 53 84 T
#> 54 85 U
#> 55 86 V
#> 56 87 W
#> 57 88 X
#> 58 89 Y
#> 59 90 Z
#> 60 91 [
#> 61 92 \\
#> 62 93 ]
#> 63 94 ^
#> 64 95 _
#> 65 96 `
#> 66 97 a
#> 67 98 b
#> 68 99 c
#> 69 100 d
#> 70 101 e
#> 71 102 f
#> 72 103 g
#> 73 104 h
#> 74 105 i
#> 75 106 j
#> 76 107 k
#> 77 108 l
#> 78 109 m
#> 79 110 n
#> 80 111 o
#> 81 112 p
#> 82 113 q
#> 83 114 r
#> 84 115 s
#> 85 116 t
#> 86 117 u
#> 87 118 v
#> 88 119 w
#> 89 120 x
#> 90 121 y
#> 91 122 z
#> 92 123 {
#> 93 124 |
#> 94 125 }
#> 95 126 ~
Created on 2022-11-13 with reprex v2.0.2
2. The cypher function and table
Note that the space is kept as the first character in the table below, it is not shuffled with sample
.
encrypt <- function(x, key, alphabet = ascii_chars) {
s <- strsplit(x, "")
sapply(s, \(y) {
i <- match(y, alphabet)
paste(key[i], collapse = "")
})
}
set.seed(2022)
coded <- c(" ", sample(ascii_chars[-1]))
data.frame(number = 32:126, char = ascii_chars, coded)
#> number char coded
#> 1 32
#> 2 33 ! S
#> 3 34 " n
#> 4 35 # W
#> 5 36 $ k
#> 6 37 % d
#> 7 38 & &
#> 8 39 ' _
#> 9 40 ( }
#> 10 41 ) .
#> 11 42 * '
#> 12 43 |
#> 13 44 , e
#> 14 45 - !
#> 15 46 . ~
#> 16 47 / @
#> 17 48 0 a
#> 18 49 1 =
#> 19 50 2 ,
#> 20 51 3 \\
#> 21 52 4 L
#> 22 53 5 f
#> 23 54 6 x
#> 24 55 7 6
#> 25 56 8 s
#> 26 57 9
#> 27 58 : 4
#> 28 59 ; ^
#> 29 60 < #
#> 30 61 = w
#> 31 62 > B
#> 32 63 ? (
#> 33 64 @ M
#> 34 65 A J
#> 35 66 B G
#> 36 67 C A
#> 37 68 D ?
#> 38 69 E 3
#> 39 70 F X
#> 40 71 G r
#> 41 72 H Z
#> 42 73 I U
#> 43 74 J R
#> 44 75 K I
#> 45 76 L T
#> 46 77 M c
#> 47 78 N p
#> 48 79 O *
#> 49 80 P l
#> 50 81 Q C
#> 51 82 R :
#> 52 83 S g
#> 53 84 T i
#> 54 85 U 0
#> 55 86 V 7
#> 56 87 W [
#> 57 88 X %
#> 58 89 Y z
#> 59 90 Z "
#> 60 91 [ 8
#> 61 92 \\ D
#> 62 93 ] 1
#> 63 94 ^ m
#> 64 95 _ /
#> 65 96 ` 2
#> 66 97 a K
#> 67 98 b ]
#> 68 99 c -
#> 69 100 d v
#> 70 101 e y
#> 71 102 f Y
#> 72 103 g P
#> 73 104 h q
#> 74 105 i `
#> 75 106 j h
#> 76 107 k Q
#> 77 108 l 9
#> 78 109 m O
#> 79 110 n ;
#> 80 111 o u
#> 81 112 p )
#> 82 113 q j
#> 83 114 r {
#> 84 115 s V
#> 85 116 t H
#> 86 117 u $
#> 87 118 v o
#> 88 119 w 5
#> 89 120 x F
#> 90 121 y >
#> 91 122 z N
#> 92 123 { b
#> 93 124 | E
#> 94 125 } t
#> 95 126 ~ <
Created on 2022-11-13 with reprex v2.0.2
3. Coding the data
test_with_spaces <- "hello world!"
(e <- encrypt(test_with_spaces, coded, ascii_chars))
#> [1] "qy99u 5u{9vS"
encrypt(e, ascii_chars, coded)
#> [1] "hello world!"
my_data[-1] <- lapply(my_data[-1], encrypt, key = coded, alphabet = ascii_chars)
my_data
#> id names country
#> 1 1 huq; AK;KvK
#> 2 2 H`O 3;P9K;v
#> 3 3 K9yF cyF`-u
my_data[-1] <- lapply(my_data[-1], encrypt, key = ascii_chars, alphabet = coded)
my_data
#> id names country
#> 1 1 john Canada
#> 2 2 tim England
#> 3 3 alex Mexico
Created on 2022-11-13 with reprex v2.0.2
4. An alternative key
An alternative to enumerating the entire key table is to have a number (a key) set the seed of the pseudo-RNG. After set.seed(key)
the function would then generate the vector coded
from alphabet
. To adapt encrypt
to use this variable is straightforward.