I have this
id phone1 phone2
1 300 301
1 303 300
1 300 303
2 400 401
Want this
id phone1 phone2 phone3
1 300 303 301
2 400 401
I have tried group by id and column phone1, apply count function, iterate over it adding to a list verifying if is already there the id and phone and sum the third column, and do the same thing with phone2 in the same list
After it reorganize the dataframe iterating the list but this is so slow with the millions of data that i have to proccess
dataframe1 = dataframe.groupby(['id', 'phone1']).count().reset_index()
dataframe2 = dataframe.groupby(['id', 'phone2']).count().reset_index()
result to add in a list
id phone1 phone2
1 300 2
1 303 1
2 401 1
id phone1 phone2
1 300 1
1 301 1
1 303 1
2 400 1
CodePudding user response:
Iterate a dataframe is so slow and not recomended
You can group the phones and apply the list function for each id, after it organize by duplicates and split in new columns
CodePudding user response:
You can melt
to reshape the phone columns to rows, then remove the duplicates per group. Finally, pivot
to reshape back to wide format.
out = (df
.melt('id')
.drop_duplicates(['id', 'value'])
.assign(col=lambda d: d.groupby('id').cumcount().add(1))
.pivot_table(index='id', columns='col', values='value', fill_value=pd.NA)
.astype('Int64') # optional
.add_prefix('phone')
.rename_axis(columns=None).reset_index()
)
output:
id phone1 phone2 phone3
0 1 300 303 301
1 2 400 401 <NA>