I have
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")
giving
      x     y        z
0   red   NaN      NaN
1  blue   NaN      NaN
2   NaN   NaN      NaN
3   NaN  cold      NaN
4   NaN  warm      NaN
5   NaN   NaN    charm
6   NaN   NaN  strange
I would like to add a new categorical column with unordered values red, blue, hot, cold, warm, charm, strange, filled in appropriately. I have many such columns, not just three.
Some possibilities:
- astype(str), concatenating, and then re-creating a categorical
- creating a new categorical type using union_categoricals, casting each column to that type, and then serially calling fillna() on them?
I can't make those or anything else work.
Notes:
Using .astype(pd.CategoricalDtype(ordered=True)) in place of .astype("category") when defining df also works with the answer below.
Answer:
Edited answer
As the OP pointed out, some rows may contain only np.nan. In that case we could try the following solution:
df['new_col'] = df.dropna(how='all').apply(lambda x: x.loc[x.first_valid_index()], axis=1)
df['new_col'] = pd.Categorical(df.new_col)
df
      x     y        z  new_col
0   red   NaN      NaN      red
1  blue   NaN      NaN     blue
2   NaN   NaN      NaN      NaN
3   NaN  cold      NaN     cold
4   NaN  warm      NaN     warm
5   NaN   NaN    charm    charm
6   NaN   NaN  strange  strange
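The second idea from the question (one shared categorical type built with union_categoricals, then filling column by column) can probably be made to work as well. Below is a minimal sketch, assuming every column's categories are strings and reusing the question's df. The names combined, shared, and result are only illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals

df = pd.DataFrame({
    "x": ["red", "blue", np.nan, np.nan, np.nan, np.nan, np.nan],
    "y": [np.nan, np.nan, np.nan, "cold", "warm", np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan, np.nan, np.nan, "charm", "strange"],
}).astype("category")

# Collect every category that occurs in any column into one unordered dtype.
combined = union_categoricals([df[c] for c in df.columns])
shared = pd.CategoricalDtype(combined.categories, ordered=False)

# Recast each column to the shared dtype, then fill the gaps column by column.
result = df[df.columns[0]].astype(shared)
for col in df.columns[1:]:
    result = result.fillna(df[col].astype(shared))

df["new_col"] = result
```

This keeps the data categorical throughout and leaves rows that are NaN in every column as NaN. If a row has values in more than one column, the leftmost column wins, because fillna only touches positions that are still missing.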