Home > database >  Convert list into columns by matching values
Convert list into columns by matching values

Time:11-17

I have a pandas dataframe like so:

df = pd.DataFrame({'column': [[np.nan, np.nan, np.nan], [1, np.nan, np.nan], [2, 3, np.nan], [3, 2, 1]]})

column
0   [nan, nan, nan]
1   [1, nan, nan]
2   [2, 3, nan]
3   [3, 2, 1]

Note that there is never the same value twice in a row.

I wish to transform this single column into multiple columns named with the corresponding values. So I want to order the values and put them in the right column. The ones under column_1, twos under column_2 etc.

    column_1    column_2    column_3
0   NaN NaN NaN
1   1.0 NaN NaN
2   NaN 2.0 3.0
3   1.0 2.0 3.0

How to do this? I don't really know where to start to be honest.

CodePudding user response:

Using a pivot_table:

(df['column'].explode().reset_index()
 .dropna()
 .assign(col=lambda d: 'column_' d['column'].astype(str))
 .pivot_table(index='index', columns='col', values='column',
              aggfunc='first', dropna=False)
 .reindex(df.index)
)

Output:

col  column_1  column_2  column_3
0         NaN       NaN       NaN
1         1.0       NaN       NaN
2         NaN       2.0       3.0
3         1.0       2.0       3.0

CodePudding user response:

Use dict comprehension to compute pandas Series of columns:

import math
df = df.apply(lambda row: pd.Series(data={f"column_{v}": v for v in row["column"] if not math.isnan(v)}, dtype="float64"), axis=1)

[Out]:
   column_1  column_2  column_3
0       NaN       NaN       NaN
1       1.0       NaN       NaN
2       NaN       2.0       3.0
3       1.0       2.0       3.0
  • Related