Compile pandas column multiple str values as output list and exclude NA-CodePudding

I have a dataframe like as shown below

df = [{"condition": "a", "runtimes": "1,15,2.0,nan"}, {"condition": "b", "runtimes": "51,75,1.0,NaN"}]
df = pd.DataFrame(df)
print(df)

My objective is to

a) Create a output_list - which will concat/append all the runtimes column values and store it as a list

b) output_list should not contain NA, nan and also should not contain duplicates values

c) All the values in the list should be of int datatype

I was trying something like below

for b in df.runtimes.tolist():
    print(type(b))
    for a in b.split(','):
        print(int(a,base=10))  # it threw error here

ValueError: invalid literal for int() with base 10: '2.0'

I want all the runtimes value to be in int format (it can only be int data type)

I expect my output to be a python list like as below

output_list = [1,2,15,51,75]

CodePudding user response：

First split values and use Series.explode, convert to numeric with misisng values if not match, then rmove misisng values, sorting, convert to integers, remove duplicates and last convert to list:

L = (pd.to_numeric(df.runtimes.str.split(',').explode(), errors='coerce')
       .dropna()
       .sort_values()
       .astype(int)
       .unique()
       .tolist())
print (L)
[1, 2, 15, 51, 75]

Or if possible convert to floats instead pd.to_numeric:

L = (df.runtimes.str.split(',')
                .explode()
                .astype(float)
                .dropna()
                .sort_values()
                .astype(int)
                .unique()
                .tolist())