I have a single column of strings that contain alphanumeric characters, as follows:
AA128A
AA128B
AA128C
AA128D
AA128E
AA129A
AA129B
AA129C
CP100-10
CP100-11
CP100-12
CP100-13
CORSTG11A
CORSTG11B
CORSTG11C
I want to explode each string into one character per column, convert every alphabetic character to its ASCII decimal value, and keep the numeric characters as they are. Where a position is null after exploding, I want to replace it with -1.
I have been able to explode the values and replace the nulls. However, when I iterate over the values with ord() to convert the alphabetic characters, I get the following error, even if I add a conditional check on the datatype inside the for loop:
ord() expected string of length 1, but int found
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
loc_df = pd.read_csv('C:\\path\\to\\file.csv',index_col=False)
# new data frame with split value columns
explode_df = loc_df["stoloc"].apply(lambda x: pd.Series(list(x)))
explode_df = explode_df.fillna(-1)
# Convert alpha characters to numeric
for char in explode_df:
    if is_string_dtype(explode_df[char]):
        explode_df_numeric[char] = ord(char)
    else:
        explode_df_numeric[char] = char
CodePudding user response:
You got that error because the variable char is a column label, not a cell value. The exploded DataFrame has integer column labels (0, 1, 2, ...), so ord receives an int instead of a one-character string. Pass the values in that column instead; you can use apply or map for that.
for char in explode_df:
    if is_string_dtype(explode_df[char]):
        # Convert every value in the column, not the column label
        explode_df[char] = explode_df[char].apply(ord)
    else:
        explode_df[char] = explode_df[char]
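To see what the loop variable actually holds, here is a minimal sketch. The two-row sample Series is an assumption standing in for the stoloc column, and the names sample, exploded, labels and message are only for illustration.

```python
import pandas as pd

# Hypothetical two-row sample standing in for loc_df["stoloc"].
sample = pd.Series(["AA128A", "CP100-10"])
exploded = sample.apply(lambda s: pd.Series(list(s)))

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = list(exploded)

# ord() needs a one-character string, so passing a label raises TypeError.
try:
    ord(labels[0])
except TypeError as exc:
    message = str(exc)
```

Here labels holds the integer positions of the exploded characters, which is why ord complains about an int even though the cells themselves contain strings.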
But there are other issues in your code. Looping over the columns and checking each column's dtype does not solve the problem, because some columns contain both strings and integers (the -1 fill values). A simple solution is an applymap with an is_int check on each cell:
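def is_int(s):
    # True if s can be parsed as an integer (digit characters and the -1 fill value)
    try:
        int(s)
        return True
    except (TypeError, ValueError):
        return False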
# new data frame with split value columns
explode_df = loc_df["stoloc"].apply(list).apply(pd.Series)
explode_df = explode_df.fillna(-1)
# Convert cell by cell (pandas 2.1+ renames applymap to DataFrame.map)
explode_df_numeric = explode_df.applymap(lambda x: x if is_int(x) else ord(x))
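Note that the lambda above leaves digit characters as strings such as '1'. If you want real integers throughout, one possible variant is sketched below. The three sample values are taken from the question, encode_char is a hypothetical helper, and the column-wise Series.map is only there so the sketch does not depend on which pandas version you have.

```python
import pandas as pd

# Hypothetical sample: three of the values listed in the question.
loc_df = pd.DataFrame({"stoloc": ["AA128A", "CP100-10", "CORSTG11A"]})


def encode_char(value):
    """Map one exploded cell to an integer (hypothetical helper)."""
    if not isinstance(value, str):
        return -1           # missing position (NaN) left by the explode
    if value.isdecimal():
        return int(value)   # keep digits as their numeric value
    return ord(value)       # letters and punctuation become code points


# One character per column; shorter strings are padded with NaN.
explode_df = loc_df["stoloc"].apply(lambda s: pd.Series(list(s)))

# Map every cell column by column, then cast the whole frame to int.
explode_df_numeric = explode_df.apply(lambda col: col.map(encode_char)).astype(int)
```

With this variant the fillna(-1) step is folded into the helper, and the hyphen in values like CP100-10 is converted with ord like any other non-digit character.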