I have a pandas dataframe like so:
df = pd.DataFrame({'column': [[np.nan, np.nan, np.nan], [1, np.nan, np.nan], [2, 3, np.nan], [3, 2, 1]]})
column
0 [nan, nan, nan]
1 [1, nan, nan]
2 [2, 3, nan]
3 [3, 2, 1]
Note that there is never the same value twice in a row.
I wish to transform this single column into multiple columns named with the corresponding values. So I want to order the values and put them in the right column. The ones under column_1
, twos under column_2
etc.
column_1 column_2 column_3
0 NaN NaN NaN
1 1.0 NaN NaN
2 NaN 2.0 3.0
3 1.0 2.0 3.0
How to do this? I don't really know where to start to be honest.
CodePudding user response:
Using a pivot_table
:
(df['column'].explode().reset_index()
.dropna()
.assign(col=lambda d: 'column_' d['column'].astype(str))
.pivot_table(index='index', columns='col', values='column',
aggfunc='first', dropna=False)
.reindex(df.index)
)
Output:
col column_1 column_2 column_3
0 NaN NaN NaN
1 1.0 NaN NaN
2 NaN 2.0 3.0
3 1.0 2.0 3.0
CodePudding user response:
Use dict comprehension to compute pandas Series of columns:
import math
df = df.apply(lambda row: pd.Series(data={f"column_{v}": v for v in row["column"] if not math.isnan(v)}, dtype="float64"), axis=1)
[Out]:
column_1 column_2 column_3
0 NaN NaN NaN
1 1.0 NaN NaN
2 NaN 2.0 3.0
3 1.0 2.0 3.0