Home > Software engineering >  Compile pandas column multiple str values as output list and exclude NA
Compile pandas column multiple str values as output list and exclude NA

Time:05-06

I have a dataframe like as shown below

df = [{"condition": "a", "runtimes": "1,15,2.0,nan"}, {"condition": "b", "runtimes": "51,75,1.0,NaN"}]
df = pd.DataFrame(df)
print(df)

My objective is to

a) Create a output_list - which will concat/append all the runtimes column values and store it as a list

b) output_list should not contain NA, nan and also should not contain duplicates values

c) All the values in the list should be of int datatype

I was trying something like below

for b in df.runtimes.tolist():
    print(type(b))
    for a in b.split(','):
        print(int(a,base=10))  # it threw error here

ValueError: invalid literal for int() with base 10: '2.0'

I want all the runtimes value to be in int format (it can only be int data type)

I expect my output to be a python list like as below

output_list = [1,2,15,51,75]

CodePudding user response:

First split values and use Series.explode, convert to numeric with misisng values if not match, then rmove misisng values, sorting, convert to integers, remove duplicates and last convert to list:

L = (pd.to_numeric(df.runtimes.str.split(',').explode(), errors='coerce')
       .dropna()
       .sort_values()
       .astype(int)
       .unique()
       .tolist())
print (L)
[1, 2, 15, 51, 75]
        

Or if possible convert to floats instead pd.to_numeric:

L = (df.runtimes.str.split(',')
                .explode()
                .astype(float)
                .dropna()
                .sort_values()
                .astype(int)
                .unique()
                .tolist())
  • Related