I have a data frame (df1) as shown below. The elements in the 1st column start repeating after a certain number but the corresponding 2nd column elements are unique
0 1
0 A Ok
1 B 1234
2 C True
3 D Yes
4 A Ok
5 B 6789
6 C False
7 D No
I have created a new DataFrame(df2) with column names as the elements in the 1st column of df1 without repetition
column_names = df1[0].drop_duplicates()
df2 = pd.DataFrame(columns=column_names)
I want to copy the elements in the 2nd column of df1 under the column names in df2 as shown
A B C D
0 Ok 1234 True Yes
1 Ok 6789 False Yes
2 Ok 1567 False N/A
Can anyone help me with this?
CodePudding user response:
group the repeating values in the first column with cumcount and then use that in the pivot.
# create a temp idx, which is cumcount of unique values in rows, and then use that with pivot
# finally reset index and drop the temporary created column
(df.assign(idx=df.groupby([0]).cumcount())
.pivot(index='idx', columns=0, values=1)
.reset_index()
.drop(columns='idx')
.sort_index(axis=1)
)
0 A B C D
0 Ok 1234 True Yes
1 Ok 6789 False No
Alternately, i modified the provided data and here the order of the column is preserved
data = {0: {0: 'Z', 1: 'B', 2: 'X', 3: 'D', 4: 'Z', 5: 'B', 6: 'X', 7: 'D'},
1: {0: 'Ok', 1: '1234', 2: 'True', 3: 'Yes', 4: 'Ok', 5: '6789', 6: 'False', 7: 'No'}}
df=pd.DataFrame(data)
(df.assign(idx=df.groupby([0]).cumcount())
.groupby(['idx',0],sort=False )[1].sum().unstack()
.reset_index()
.drop(columns='idx')
)
Z B X D
0 Ok 1234 True Yes
1 Ok 6789 False No