I have a DataFrame df that contains purchase data. Each row has a different item in the item column, but the purchaseId value naturally stays the same across all rows of the same purchase. How can I insert a row for each purchaseId that contains the value from the age column in the item column, as shown in df2?
df
purchaseId item age
22 apples 35
22 coffee 35
22 wipes 35
53 tomatoes 23
53 sugar 23
53 tea 23
df2
purchaseId item age
22 apples 35
22 coffee 35
22 wipes 35
22 35 35
53 tomatoes 23
53 sugar 23
53 tea 23
53 23 23
CodePudding user response:
Use concat with a new DataFrame created by DataFrame.drop_duplicates, then sort the index with DataFrame.sort_index:
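# a default RangeIndex is necessary; if df does not have one, uncomment:
# df = df.reset_index(drop=True)
# mergesort is stable, so each new row stays after the original row with the same index
df2 = (pd.concat([df,
                  df.drop_duplicates('purchaseId', keep='last')
                    .assign(item=lambda x: x['age'])])
         .sort_index(kind='mergesort', ignore_index=True))
print(df2)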
purchaseId item age
0 22 apples 35
1 22 coffee 35
2 22 wipes 35
3 22 35 35
4 53 tomatoes 23
5 53 sugar 23
6 53 tea 23
7 53 23 23
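For reference, the same approach can be wrapped into a self-contained script. This is only a sketch: the helper name append_age_rows and its parameters are made up for illustration, and it assumes the rows of each purchase are contiguous, as in the sample data from the question.

```python
import pandas as pd


def append_age_rows(df, key="purchaseId", source="age", target="item"):
    """Hypothetical helper: add one row per `key` whose `target` holds `source`."""
    # Work on a fresh 0..n-1 index so index order matches row order.
    df = df.reset_index(drop=True)
    # Take the last row of each group and overwrite `target` with `source`.
    extra = (df.drop_duplicates(subset=key, keep="last")
               .assign(**{target: lambda x: x[source]}))
    # Each extra row shares its index label with the last row of its group,
    # so a stable sort places it directly after that row.
    return (pd.concat([df, extra])
              .sort_index(kind="mergesort", ignore_index=True))


# Sample data from the question.
df = pd.DataFrame({
    "purchaseId": [22, 22, 22, 53, 53, 53],
    "item": ["apples", "coffee", "wipes", "tomatoes", "sugar", "tea"],
    "age": [35, 35, 35, 23, 23, 23],
})
df2 = append_age_rows(df)
print(df2)
```

If the rows of a purchase are not contiguous, the extra row still lands after the last row of that purchase, which may or may not be what you want.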
CodePudding user response:
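import pandas as pd
df = pd.DataFrame({'purchaseId': [22, 22, 22, 53, 53, 53],
                   'item': ['apples', 'coffee', 'wipes',
                            'tomatoes', 'sugar', 'tea'],
                   'age': [35, 35, 35, 23, 23, 23]})
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
# one row per purchase with only purchaseId and age (item becomes NaN after concat)
extra = df.groupby('purchaseId', as_index=False)['age'].max()
# a stable sort keeps each extra row after the original rows of its purchase
df = (pd.concat([df, extra], ignore_index=True)
        .sort_values(by='purchaseId', kind='mergesort', ignore_index=True))
# fill the missing item values with the age, without chained inplace assignment
df['item'] = df['item'].fillna(df['age'])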