Convert all alpha characters of string to integers in separate columns within a pandas dataframe

Time:12-23

I have a single column of strings that contain alphanumeric characters, as follows:

AA128A
AA128B
AA128C
AA128D
AA128E
AA129A
AA129B
AA129C
CP100-10
CP100-11
CP100-12
CP100-13
CORSTG11A
CORSTG11B
CORSTG11C

I want to explode each string into one character per column, convert every alphabetic character to its ASCII decimal value, and keep the numeric characters as they are. Where a position is null after exploding, I want to replace it with -1.

I have been able to explode the values and replace the nulls. However, when I iterate over the values and call ord() to convert the alphabetic characters, I get this error:

ord() expected string of length 1, but int found

This happens even when I check each column's data type inside the for loop:

import numpy as np 
import pandas as pd 
from sklearn.preprocessing import OrdinalEncoder
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
loc_df = pd.read_csv('C:\\path\\to\\file.csv',index_col=False)
# new data frame with split value columns 
explode_df = loc_df["stoloc"].apply(lambda x: pd.Series(list(x)))
explode_df = explode_df.fillna(-1)
#Convert alpha characters to numeric
for char in explode_df:
    if is_string_dtype(explode_df[char]):
        explode_df_numeric[char] = ord(char)
    else:
        explode_df_numeric[char] = char

expected outcome

CodePudding user response:

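You got that error because the loop variable char is a column label, not a character from your data. Iterating over a DataFrame yields its column labels, which here are the integers 0, 1, 2, and so on, so ord receives an int. Pass the values in that column instead; you can use apply or map for that.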

for char in explode_df:
    if is_string_dtype(explode_df[char]):
        # Pass the column's values to ord, not the column label
        explode_df[char] = explode_df[char].apply(ord)
    # Columns that are not string-typed are left unchanged
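To make the cause of the error concrete, here is a minimal sketch. The two-row sample series is a made-up stand-in for the real stoloc column, and the names sample, exploded, and labels are illustrative only. It shows that a for loop over a DataFrame visits the column labels, and that passing such a label to ord raises a TypeError.

```python
import pandas as pd

# Hypothetical two-row sample, exploded the same way as in the question
sample = pd.Series(["AA1", "CP"])
exploded = sample.apply(lambda x: pd.Series(list(x)))

# Iterating over a DataFrame yields its column labels, not its values
labels = [label for label in exploded]
print(labels)

# The labels are integer positions, so ord() rejects them
try:
    ord(labels[0])
except TypeError as exc:
    print("TypeError:", exc)
```

The characters themselves live in exploded[label], which is why the fix has to apply ord to the column's values.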

But there are other issues in your code. Looping over the columns and checking each column's dtype does not solve the problem, because a single column can hold letters, digit characters, and the integer -1 filler at the same time. A simpler solution is to test each value individually, using applymap with an is_int check:

def is_int(s):
    # True for digit characters and for the -1 filler, False for letters
    try:
        int(s)
        return True
    except (ValueError, TypeError):
        return False

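# New data frame with one character per column
explode_df = loc_df["stoloc"].apply(list).apply(pd.Series)
# Positions past the end of a shorter string are NaN; mark them with -1
explode_df = explode_df.fillna(-1)
# Keep digits and -1 as they are; convert every other character with ord
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
explode_df_numeric = explode_df.applymap(lambda x: x if is_int(x) else ord(x))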
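As a self-contained alternative, the sketch below does the character conversion in plain Python before building the DataFrame, so no column ever mixes strings and integers. This is not the applymap approach above but a swapped-in variant. It assumes digits should end up as real integers rather than digit strings, and the helper name encode_char is hypothetical. The three input strings are taken from the sample values in the question.

```python
import pandas as pd

def encode_char(ch):
    """Map one character to an integer: ASCII digits keep their value,
    any other character becomes its code point via ord()."""
    return int(ch) if ch.isascii() and ch.isdigit() else ord(ch)

# A few of the sample values from the question
stoloc = pd.Series(["AA128A", "CP100-10", "CORSTG11A"], name="stoloc")

# Encode each string character by character; rows have different lengths
rows = [[encode_char(ch) for ch in value] for value in stoloc]

# DataFrame pads the shorter rows with NaN, which is then replaced by -1
encoded = pd.DataFrame(rows).fillna(-1).astype(int)
print(encoded)
```

Note that non-alphanumeric characters such as the hyphen in CP100-10 also go through ord (giving 45), and that a kept digit like 1 cannot be told apart from a small code point afterwards. If that matters, a different encoding for digits may be preferable.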