I have a dataframe like as shown below
df = [{"condition": "a", "runtimes": "1,15,2.0,nan"}, {"condition": "b", "runtimes": "51,75,1.0,NaN"}]
df = pd.DataFrame(df)
print(df)
My objective is to
a) Create a output_list
- which will concat/append all the runtimes
column values and store it as a list
b) output_list
should not contain NA
, nan
and also should not contain duplicates values
c) All the values in the list should be of int
datatype
I was trying something like below
for b in df.runtimes.tolist():
print(type(b))
for a in b.split(','):
print(int(a,base=10)) # it threw error here
ValueError: invalid literal for int() with base 10: '2.0'
I want all the runtimes
value to be in int format (it can only be int data type)
I expect my output to be a python list like as below
output_list = [1,2,15,51,75]
CodePudding user response:
First split values and use Series.explode
, convert to numeric with misisng values if not match, then rmove misisng values, sorting, convert to integers, remove duplicates and last convert to list:
L = (pd.to_numeric(df.runtimes.str.split(',').explode(), errors='coerce')
.dropna()
.sort_values()
.astype(int)
.unique()
.tolist())
print (L)
[1, 2, 15, 51, 75]
Or if possible convert to floats instead pd.to_numeric
:
L = (df.runtimes.str.split(',')
.explode()
.astype(float)
.dropna()
.sort_values()
.astype(int)
.unique()
.tolist())