Home > Net >  Panda DataFrame: How to Duplicate Rows while changing the value of a column
Panda DataFrame: How to Duplicate Rows while changing the value of a column

Time:03-02

I have the following dataframe which list certain transactions

df1

ProductId Code Cost
1         101  £1000
1         101  £2000
2         101  £1500
2         101  £1000

However I need to change the data to a lower level of data. And the information of this is found in df2

df2

ProductId Code2 Percent
1         201   0.25
1         301   0.75
2         201   0.60
2         301   0.40

Therefore, the resulting dataframe should look like this:

df1

ProductId Code Cost
1         201  £250
1         301  £750
1         201  £500
1         301  £1500
2         201  £900
2         301  £600
2         201  £600
2         301  £400

The resulting dataframe will need have the rows replicate to accomodate the lower level code. (I.e every ProductId = 1 will have code 101 broken into 201 and 301 with the costs being adjust per the percentage change).

I have started off by replicating the rows using pandas concat:

df_new = pd.concat([df1]*2, ignore_index=True)

However, I am struggling to figure a way to effectively change the data to:

  1. Change the code (to 201 and 301 for each ProductId)
  2. Change the cost (as per the percentage)

Any guidance would be greatly appreciated. Thanks

CodePudding user response:

You can do merge

out = df1.merge(df2,on = 'ProductId').assign(Cost = lambda x : x['Cost']*x['Percent'])
Out[509]: 
   ProductId  Code    Cost  Code2  Percent
0          1   101   250.0    201     0.25
1          1   101   750.0    301     0.75
2          1   101   500.0    201     0.25
3          1   101  1500.0    301     0.75
4          2   101   900.0    201     0.60
5          2   101   600.0    301     0.40
6          2   101   600.0    201     0.60
7          2   101   400.0    301     0.40
  • Related