Home > database >  Do an operation only if values from same column of two dataframes are the same
Do an operation only if values from same column of two dataframes are the same

Time:11-13

I have a func_df with 4 functions:

   x        y1    y2        y3        y4
0   -20.0 -0.839071  10.0  0.816164 -8795.000
1   -19.9 -0.865213   9.9  0.994372 -8667.619
2   -19.8 -0.889191   9.8  1.162644 -8541.472
3   -19.7 -0.910947   9.7  1.319299 -8416.553
4   -19.6 -0.930426   9.6  1.462772 -8292.856
..    ...       ...   ...       ...       ...
395  19.5 -0.947580   9.5  1.591630  6659.375
396  19.6 -0.930426   9.6  1.462772  6766.216
397  19.7 -0.910947   9.7  1.319299  6874.193
398  19.8 -0.889191   9.8  1.162644  6983.312
399  19.9 -0.865213   9.9  0.994372  7093.579

And a test_df with scatter points:

       x            y
0   -6.2     0.360801
1    6.4    -3.655422
2  -17.6 -6065.659700
3   -1.5    -3.247304
4  -17.7    -0.785430
..   ...          ...
95   1.6     3.722551
96  16.3    -1.067487
97 -13.3     1.857445
98  -3.8    -0.008831
99 -13.2     1.294064

I want to find the deviation(distance) between all the scatter points and the 4 functions when the x-value is the same on both data frames.

There are some scatter points with same x-value and different y-value.

Edit: A quick example:

Starting with column y1 from func_df:

1st value is x = -20.0 , y1 = -0.839071.

I want the program to search if there a row in which x = -20.0 in test_df and if so, then find the difference between the y-value of that row and the y-value of func_df, which is -0.839071.

Imagine that in test_df there is a row with x = -20, y = -1. Then what I want is abs(-1 - 0.839071). I used abs() because the distance has to be a positive value

This was for the row 0 of column y1. I need it for all rows and also for y2, y3 and y4 of func_df.


I tried something like this:

if test_df.x.equals(func.x):

    result_df = func.iloc[:, 1:5].apply(lambda cell: cell - test_df.y[cell.index])

But honestly was a shot in the dark, no idea what I'm doing.

CodePudding user response:

If the values of 'x' in test_df are unique you could merge the two dataframes on 'x'

merged_df = pandas.merge(test_df, func_df, on='x')
abs_delta_y1 = (merged_df['y'] - merged_df['y1']).abs()

etc...

CodePudding user response:

create sample data:

func_df=pd.DataFrame(data={'x':[-20.9,-20.8,-20.7,-20.6],'y1':[-0.12,-0.021,-0.04,-0.91],
                      'y2':[10.0,9.9,9.8,9.7],'y3':[0.99437,1.162644,1.319299,1.462772],
                      'y4':[-8667.619,-8541.472,-8416.553,-8292.856]})
print(func_df)
'''

      x     y1    y2        y3        y4
0 -20.9 -0.120  10.0  0.994370 -8667.619
1 -20.8 -0.021   9.9  1.162644 -8541.472
2 -20.7 -0.040   9.8  1.319299 -8416.553
3 -20.6 -0.910   9.7  1.462772 -8292.856
'''
test_df=pd.DataFrame(data={'x':[-20.9,-15.2],'y':[0.360801,-3.655422]})
print(test_df)
'''
      x         y
0 -20.9  0.360801
1 -15.2 -3.655422
'''

as you can see the first rows match in both dfs. Now let's combine these two df's. Use this to get only matching rows.

final=func_df.merge(test_df,on='x')
print(final)
'''
      x    y1    y2       y3        y4         y
0 -20.9 -0.12  10.0  0.99437 -8667.619  0.360801
'''

#if you want to see all values use final=func_df.merge(test_df,how='left',on='x')

Now that we have the matching rows and the y value in func_df we can do the calculation.

loop_cols=[*final.columns[1:5]] #['y1', 'y2', 'y3', 'y4']
for i in loop_cols:
    final['distance_{}'.format(i)]=abs(final['y'] - final[i])

print(final)
'''
    x        y1     y2       y3      y4         y           distance_y1 distance_y2 distance_y3 distance_y4
0   -20.9   -0.12   10.0    0.99437 -8667.619   0.360801    0.480801    9.639199    0.633569    8667.979801000001

'''

CodePudding user response:

I agree with Ray Pelletier - unfortunately I do not have enough credit to comment.

If you merge the two data frames on x, then you can create a new dataframe, where you, for each row in the merged dataframe, can calculate the difference between y and y1, y and y2, and so on.

If you read up on the merge function, there is a parameter you can set to "inner". Then the merged dataframe will only contain values of x which are present in both dataframes.

  • Related