Copy corresponding dataframe column values to new dataframe with fixed column names python-CodePudding

I have a data frame (df1) as shown below. The elements in the 1st column start repeating after a certain number but the corresponding 2nd column elements are unique

    0        1
0   A        Ok
1   B        1234
2   C        True
3   D        Yes
4   A        Ok
5   B        6789
6   C        False
7   D        No

I have created a new DataFrame(df2) with column names as the elements in the 1st column of df1 without repetition

column_names = df1[0].drop_duplicates()
df2 = pd.DataFrame(columns=column_names)

I want to copy the elements in the 2nd column of df1 under the column names in df2 as shown

    A     B       C      D
0   Ok   1234    True    Yes
1   Ok   6789    False   Yes
2   Ok   1567    False   N/A

Can anyone help me with this?

CodePudding user response：

group the repeating values in the first column with cumcount and then use that in the pivot.

# create a temp idx, which is cumcount of unique values in rows, and then use that with pivot
# finally reset index and drop the temporary created column

(df.assign(idx=df.groupby([0]).cumcount())
 .pivot(index='idx', columns=0, values=1)
 .reset_index()
 .drop(columns='idx')
 .sort_index(axis=1)
)

0    A     B       C     D
0   Ok  1234    True    Yes
1   Ok  6789    False   No

Alternately, i modified the provided data and here the order of the column is preserved

data = {0: {0: 'Z', 1: 'B', 2: 'X', 3: 'D', 4: 'Z', 5: 'B', 6: 'X', 7: 'D'},
        1: {0: 'Ok',  1: '1234',  2: 'True',  3: 'Yes',  4: 'Ok',  5: '6789',  6: 'False',  7: 'No'}}
df=pd.DataFrame(data)

(df.assign(idx=df.groupby([0]).cumcount())
 .groupby(['idx',0],sort=False )[1].sum().unstack() 
 .reset_index()
 .drop(columns='idx')
)

     Z     B       X     D
0   Ok  1234    True    Yes
1   Ok  6789    False   No