Let's say I have a dataframe like so:
   a  b    c    d
0  S  t    f  nan
1  S  t    t  nan
2  S  f  nan  nan
3  Q  t  nan  nan
I want to combine the last 3 columns into a single column, as an array, but exclude the nan values, so I end up getting something like the following:
   a  b    c    d       e
0  S  t    f  nan  [t, f]
1  S  t    t  nan  [t, t]
2  S  f  nan  nan     [f]
3  Q  t  nan  nan     [t]
The closest I was able to get was using iloc, but I'm unable to apply a conditional to it properly:
df['e'] = df.iloc[:, 1:].values.tolist()
The above results in the arrays having all the column values, including nans.
CodePudding user response:
You could use a nested list comprehension where you use the fact that NaN is not equal to itself to filter out NaNs:
df['e'] = [[x for x in ary if x==x] for ary in df.iloc[:,-3:].to_records(index=False)]
Output:
   a  b    c    d       e
0  S  t    f  NaN  [t, f]
1  S  t    t  NaN  [t, t]
2  S  f  NaN  NaN     [f]
3  Q  t  NaN  NaN     [t]
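To see why the x==x filter works, here is a minimal sketch in plain Python, without pandas. The rows list and the drop_nan helper are made-up names for illustration only; the rows just mirror the last three columns of the example frame.

```python
# NaN is the only common value that compares unequal to itself,
# so "x == x" is False for NaN and True for ordinary strings.
nan = float("nan")

# Hypothetical stand-in for the b, c, d columns of the example frame.
rows = [
    ["t", "f", nan],
    ["t", "t", nan],
    ["f", nan, nan],
    ["t", nan, nan],
]

def drop_nan(values):
    """Keep only the values that are equal to themselves (i.e. not NaN)."""
    return [x for x in values if x == x]

result = [drop_nan(row) for row in rows]
print(result)  # [['t', 'f'], ['t', 't'], ['f'], ['t']]
```

Note that this trick only filters float NaN; a None in the data would pass through, since None == None is True.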
CodePudding user response:
IMHO a bit more readable version:
df['new_col_name'] = df.iloc[:,-3:].apply(lambda ser: ser.dropna().to_list(), axis=1)
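For anyone who wants to try this end to end, here is a self-contained sketch that rebuilds the frame from the question (the literal values are my reconstruction of it, with np.nan for the missing cells) and selects the columns by name instead of by position:

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame from the question.
df = pd.DataFrame({
    "a": ["S", "S", "S", "Q"],
    "b": ["t", "t", "f", "t"],
    "c": ["f", "t", np.nan, np.nan],
    "d": [np.nan, np.nan, np.nan, np.nan],
})

# With axis=1 each row is passed as a Series: drop its NaNs, then
# convert what is left to a plain Python list.
df["e"] = df[["b", "c", "d"]].apply(lambda row: row.dropna().tolist(), axis=1)

print(df)
```

Selecting by name means the result doesn't change if more columns get appended to the frame later, whereas iloc[:, -3:] would start picking up the new ones. On large frames the row-wise apply is likely to be slower than the list comprehension above, so it's a readability trade-off.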