I have a DataFrame like this:
id | month |
---|---|
1 | 1 |
1 | 3 |
1 | 4 |
1 | 6 |
I want to transform it into this:
id | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | 1 | 0 | 1 |
I've tried this code:
ndf = df[['id']].join(pd.get_dummies(
df['month'])).groupby('id').max()
But the result looks like this:
id | 1 | 3 | 4 | 6 |
---|---|---|---|---|
1 | 1 | 1 | 1 | 1 |
How can I insert the missing columns (2 and 5) even though they are not in the data?
CodePudding user response:
You can use pd.crosstab instead, then build the full set of column labels with pd.RangeIndex, running from the minimum to the maximum month, and finally use DataFrame.reindex to add the missing columns (optionally followed by DataFrame.reset_index):
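import pandas as pd

# RangeIndex excludes its stop value, so add 1 to include the max month
new_cols = pd.RangeIndex(df['month'].min(), df['month'].max() + 1)
res = (
    pd.crosstab(df['id'], df['month'])
    .reindex(columns=new_cols, fill_value=0)  # months not in the data become 0
    .reset_index()
)
Output:
>>> res
   id  1  2  3  4  5  6
0   1  1  0  1  1  0  1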
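As an alternative, the get_dummies approach from the question can probably be kept and completed with a reindex step. This is only a minimal sketch: it rebuilds the sample data from the question, and the name all_months is made up for illustration. It also assumes the months are integers, so that a plain range covers every label between the minimum and the maximum.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [1, 1, 1, 1], "month": [1, 3, 4, 6]})

# Every month label from min to max; range() excludes its stop, hence the + 1
all_months = range(df["month"].min(), df["month"].max() + 1)

ndf = (
    pd.get_dummies(df["month"])                 # one indicator column per month present
    .groupby(df["id"]).max()                    # collapse to one row per id
    .reindex(columns=all_months, fill_value=0)  # add the months missing from the data
    .astype(int)                                # normalise bool/uint8 indicators to 0/1 ints
    .reset_index()
)
print(ndf)
```

One difference to keep in mind: pd.crosstab counts occurrences, so a month that appears twice for the same id would show 2, while the get_dummies plus max version always yields 0 or 1.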