Combine a span of columns in Pandas based on a condition (exclude nans)

Time:05-10

Let's say I have a dataframe like so:

   a   b   c   d 
0  S   t   f   nan
1  S   t   t   nan
2  S   f   nan nan
3  Q   t   nan nan

I want to combine the last 3 columns into a single column, as an array, but exclude the nan values, so I end up getting something like the following:

   a   b   c   d   e
0  S   t   f   nan [t, f]
1  S   t   t   nan [t, t]
2  S   f   nan nan [f]
3  Q   t   nan nan [t]

The closest I was able to get was using iloc, but I can't work out how to apply a condition to it:

df['e'] = df.iloc[:, 1:].values.tolist()

The above results in the arrays having all the column values, including nans.
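For anyone who wants to reproduce this, here is a minimal sketch of the setup. The exact dtypes and the use of np.nan for the missing cells are assumptions, since the question only shows the printed frame, and the variable name attempt is just for illustration:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame in the question; missing cells are np.nan.
df = pd.DataFrame({
    "a": ["S", "S", "S", "Q"],
    "b": ["t", "t", "f", "t"],
    "c": ["f", "t", np.nan, np.nan],
    "d": [np.nan, np.nan, np.nan, np.nan],
})

# The attempt from the question: every row keeps all three values, NaN included.
attempt = df.iloc[:, 1:].values.tolist()
print(attempt[0])
```

Each inner list has three entries, so the NaNs still have to be filtered out per row.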

CodePudding user response:

You could use a nested list comprehension where you use the fact that NaN is not equal to itself to filter out NaNs:

df['e'] = [[x for x in ary if x==x] for ary in df.iloc[:,-3:].to_records(index=False)]

Output:

   a  b    c   d       e
0  S  t    f NaN  [t, f]
1  S  t    t NaN  [t, t]
2  S  f  NaN NaN     [f]
3  Q  t  NaN NaN     [t]
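A self-contained sketch of the same idea, in case it helps. It assumes the missing cells are float NaN (np.nan), and it swaps to_records for itertuples and wraps the result in a pd.Series; both are my substitutions, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame in the question; missing cells are np.nan.
df = pd.DataFrame({
    "a": ["S", "S", "S", "Q"],
    "b": ["t", "t", "f", "t"],
    "c": ["f", "t", np.nan, np.nan],
    "d": [np.nan, np.nan, np.nan, np.nan],
})

# NaN is the only value here that is not equal to itself,
# so "x == x" keeps everything except NaN.
rows = df.iloc[:, -3:].itertuples(index=False, name=None)
filtered = [[x for x in row if x == x] for row in rows]

# Wrapping in a Series keeps each list as a single cell, aligned on the index.
df["e"] = pd.Series(filtered, index=df.index)
print(df)
```

Note that the x == x trick relies on float NaN. If the columns hold pd.NA instead, the comparison does not give a plain boolean, so dropna or pd.isna would be the safer filter.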

CodePudding user response:

IMHO a bit more readable version:

df['new_col_name'] = df.iloc[:,-3:].apply(lambda ser: ser.dropna().to_list(), axis=1)
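Here is a runnable sketch of the dropna approach against the frame from the question. The frame construction is an assumption (missing cells as np.nan), and the column is named e to match the desired output:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame in the question; missing cells are np.nan.
df = pd.DataFrame({
    "a": ["S", "S", "S", "Q"],
    "b": ["t", "t", "f", "t"],
    "c": ["f", "t", np.nan, np.nan],
    "d": [np.nan, np.nan, np.nan, np.nan],
})

# axis=1 passes each row as a Series; dropna removes the missing entries
# and to_list turns what is left into a plain Python list.
df["e"] = df.iloc[:, -3:].apply(lambda ser: ser.dropna().to_list(), axis=1)
print(df)
```

Row-wise apply is usually slower than a list comprehension on large frames, so this trades some speed for readability.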