I have a CSV file that has various repeating elements. Below is an example of the table.
id code date
314asre B1 20201010
343asr B2 20200405
150ase B1 20210615
314asre B1 20210506
343asr B3 20220408
I want to create something like this:
id | B1 | B2 | B3 |
---|---|---|---|
314asre | 20201010 | NULL | NULL |
343asr | NULL | 20200405 | NULL |
150ase | 20210615 | NULL | NULL |
314asre | 20210506 | NULL | NULL |
343asr | NULL | NULL | 20220408 |
I tried using pivot and pivot table, but I got an error saying duplicate values in the index for the pivot, and pivot_table removes the duplicate values completely. I want to keep the duplicate values as shown in the above table. Any help will be appreciated.
Code I have tried:
df.pivot_table(index='id' , columns = 'code' , values= 'date')
df.reset_index(inplace=True)
CodePudding user response:
Make pivot table with random index and change index to id column as follows.
df['temp'] = range(len(df))
result = df.pivot_table(index= 'temp', columns = 'code' , values= 'date')
result.index = df['id']
result
the result is as follows.
code B1 B2 B3
id
314asre 20201010.0 NaN NaN
343asr NaN 20200405.0 NaN
150ase 20210615.0 NaN NaN
314asre 20210506.0 NaN NaN
343asr NaN NaN 20220408.0