I have either a NumPy array or a pandas DataFrame whose cells are mostly numerical, but a few scattered cells hold character values (they are not column-based, so I can't use a label encoder). I am looking for a way to convert these sparse character values, which could be anywhere, into their ASCII codes so that I can feed the array into deep learning models. Afterwards I need to know which cells were converted so that I can convert them back to characters. Any ideas would be highly appreciated!
Example values could be (1,2,f,5,3) on row 1 and (7,k,1,j,9) on some row k, in either a NumPy array or a pandas DataFrame. The question is: how can I encode the letters as ASCII codes so that every cell is a number, and how do I then decode them back?
CodePudding user response:
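A possible solution could be to use ord() and chr() to encode and decode your characters. ord() takes a single character and returns "an integer representing the Unicode code point of that character" (for plain ASCII letters this is the same as the ASCII code), and chr() is its inverse.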
>>> df
  characters
0          f
1          k
>>> df["encoded"] = df["characters"].apply(ord)
>>> df["encoded"]
0    102
1    107
>>> df["decoded"] = df["encoded"].apply(chr)
>>> df["decoded"]
0    f
1    k
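The example above assumes a column that contains only characters. For the case in the question, where characters are scattered among numbers, one possible approach is to keep a boolean mask of the cells that held characters and use it for decoding later. The mask is needed because an encoded 'f' (102) is otherwise indistinguishable from a genuine numeric 102. The following is only a sketch under some assumptions: the data is held in an object-dtype NumPy array (for a DataFrame, something like df.to_numpy(dtype=object) could be used), every character cell is a single-character str, and the names arr, is_char, encoded and decoded are illustrative.

```python
import numpy as np

# Hypothetical mixed array modeled on the example rows in the question.
arr = np.array([[1, 2, "f", 5, 3],
                [7, "k", 1, "j", 9]], dtype=object)

# Boolean mask: True wherever the cell holds a string.
# This mask must be kept around, since it is the only record of which cells were converted.
is_char = np.vectorize(lambda v: isinstance(v, str), otypes=[bool])(arr)

# Encode: replace each character with its code point, then cast to a numeric dtype.
encoded = arr.copy()
encoded[is_char] = [ord(c) for c in arr[is_char]]
encoded = encoded.astype(np.int64)

# Decode: go back to an object array and restore characters only where the mask is True.
decoded = encoded.astype(object)
decoded[is_char] = [chr(int(n)) for n in encoded[is_char]]

print(encoded)
print(decoded)
```

Note that ord() raises a TypeError for strings longer than one character, so multi-character values would need a different encoding scheme.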