Home > Mobile >  Tricky apply difference from two dataframes in a specific column using Python
Tricky apply difference from two dataframes in a specific column using Python

Time:10-13

I would like to compare the sum of an original df and rounded df. If there is a delta from its sum, apply this delta, whether by subtraction or addition to the last quarter.

The first sum difference between AA is 4. (12-8 = 4) The second sum difference with BB is 2. (14-12 = 2)

Data

original_df

id  q121 q221 q321 q421 sum
AA  1    0.5  0.5  6.1  8
BB  1    0.5  6.5  3.1  12
                    

rounded_df

id  q121 q221 q321 q421 sum
AA  2    2    2    6    12
BB  2    2    6    4    14
                    
                                            
            

Desired

We've subtracted 4 from 12 to obtain 8 for AA.
We've subtracted 2 from 14 to obtain 12 for BB
(when comparing original to rounded)    

Now the new final_df matches the sum of the original_df

final_df

id  q121 q221 q321 q421 sum delta
AA  2    2    2    2    8   4
BB  2    2    6    2    12  2

Doing

  1. Compare sum and create delta

    final_df['delta'] = np.where(original_df['sum'] == 
    rounded_df['sum'], 0, original_df['sum'] - rounded_df['sum'])
    
  2. Apply delta to last quarter of the year

I am still not sure how to complete thee second half of the problem. I am still researching, any suggestion is appreciated.

CodePudding user response:

using sub, filter, update, iloc

# create the delta with difference b/w sum of the two DF
df2['delta']=df2['sum'].sub(df['sum'])

# subtract the delta from the last quarter, obtained
# using filter
# create a placeholder df3
df3=df2.filter(like='q' ).iloc[:,-1:].sub(df2.iloc[:,-1:].values)

# filter(like='q' ) : Filter columns that has 'q' in their name
# .iloc[:,-1:] : using iloc, choose the last columm that has 'q' in their name, -1 gives us last column
# df2.[iloc][1][:,-1:].values : gives the values of the last column of the table
# the subtraction results in DF3, 



# update df2 based on df3
df2.update(df3)
df2

# updates will update the values of matching column from df3 into df2



    id  q121    q221    q321    q421    sum     delta
0   AA     2       2       2       2      12      4
1   BB     2       2       6       2      14      2
  • Related