Home > OS >  How to insert a 1 row into a dataframe for every ID
How to insert a 1 row into a dataframe for every ID

Time:09-29

I have df that contains purchase data. Each row has a different item in the item column, but naturally the purchase id's in purchaseId stays the same across the same purchase. How can I insert a row for each purchaseId that contains the value in the age column as seen in df2 but in the item column?

df

purchaseId     item      age  
    22         apples    35
    22         coffee    35
    22         wipes     35
    53        tomatoes   23
    53         sugar     23
    53         tea       23


df2

purchaseId     item      age  
    22         apples    35
    22         coffee    35
    22         wipes     35
    22         35        35
    53        tomatoes   23
    53         sugar     23
    53         tea       23
    53         23        23

CodePudding user response:

Use concat with new DataFrame created by DataFrame.drop_duplicates and last sorting index by DataFrame.sort_index:

#necesary default index
#df = df.reset_index(drop=True)

df2 = (pd.concat([df, 
                  df.drop_duplicates('purchaseId', keep='last')
                    .assign(item = lambda x: x['age'])])
         .sort_index(kind='mergesort', ignore_index=True))
print (df2)
   purchaseId      item  age
0          22    apples   35
1          22    coffee   35
2          22     wipes   35
3          22        35   35
4          53  tomatoes   23
5          53     sugar   23
6          53       tea   23
7          53        23   23

CodePudding user response:

import pandas as pd
df = pd.DataFrame({'purchaseID':[22,22,22,53,53,53],
               'item':['apple','coffe','wipes',
                       'tomatoes','sugar','tea'],
               'age':[35,35,35,23,23,23]})

df = df.append(df.groupby(['purchaseID','age'], as_index=False)['age'].max(), ignore_index=True).sort_values(by='purchaseID')
df['item'].fillna(df['age'], inplace=True)
  • Related