I have a func_df
with 4 functions:
x y1 y2 y3 y4
0 -20.0 -0.839071 10.0 0.816164 -8795.000
1 -19.9 -0.865213 9.9 0.994372 -8667.619
2 -19.8 -0.889191 9.8 1.162644 -8541.472
3 -19.7 -0.910947 9.7 1.319299 -8416.553
4 -19.6 -0.930426 9.6 1.462772 -8292.856
.. ... ... ... ... ...
395 19.5 -0.947580 9.5 1.591630 6659.375
396 19.6 -0.930426 9.6 1.462772 6766.216
397 19.7 -0.910947 9.7 1.319299 6874.193
398 19.8 -0.889191 9.8 1.162644 6983.312
399 19.9 -0.865213 9.9 0.994372 7093.579
And a test_df
with scatter points:
x y
0 -6.2 0.360801
1 6.4 -3.655422
2 -17.6 -6065.659700
3 -1.5 -3.247304
4 -17.7 -0.785430
.. ... ...
95 1.6 3.722551
96 16.3 -1.067487
97 -13.3 1.857445
98 -3.8 -0.008831
99 -13.2 1.294064
I want to find the deviation(distance) between all the scatter points and the 4 functions when the x-value is the same on both data frames.
There are some scatter points with same x-value and different y-value.
Edit: A quick example:
Starting with column y1
from func_df
:
1st value is x = -20.0 , y1 = -0.839071.
I want the program to search if there a row in which x = -20.0 in test_df
and if so, then find the difference between the y-value of that row and the y-value of func_df
, which is -0.839071.
Imagine that in test_df
there is a row with x = -20, y = -1. Then what I want is abs(-1 - 0.839071)
.
I used abs()
because the distance has to be a positive value
This was for the row 0 of column y1. I need it for all rows and also for y2, y3 and y4 of func_df
.
I tried something like this:
if test_df.x.equals(func.x):
result_df = func.iloc[:, 1:5].apply(lambda cell: cell - test_df.y[cell.index])
But honestly was a shot in the dark, no idea what I'm doing.
CodePudding user response:
If the values of 'x' in test_df are unique you could merge the two dataframes on 'x'
merged_df = pandas.merge(test_df, func_df, on='x')
abs_delta_y1 = (merged_df['y'] - merged_df['y1']).abs()
etc...
CodePudding user response:
create sample data:
func_df=pd.DataFrame(data={'x':[-20.9,-20.8,-20.7,-20.6],'y1':[-0.12,-0.021,-0.04,-0.91],
'y2':[10.0,9.9,9.8,9.7],'y3':[0.99437,1.162644,1.319299,1.462772],
'y4':[-8667.619,-8541.472,-8416.553,-8292.856]})
print(func_df)
'''
x y1 y2 y3 y4
0 -20.9 -0.120 10.0 0.994370 -8667.619
1 -20.8 -0.021 9.9 1.162644 -8541.472
2 -20.7 -0.040 9.8 1.319299 -8416.553
3 -20.6 -0.910 9.7 1.462772 -8292.856
'''
test_df=pd.DataFrame(data={'x':[-20.9,-15.2],'y':[0.360801,-3.655422]})
print(test_df)
'''
x y
0 -20.9 0.360801
1 -15.2 -3.655422
'''
as you can see the first rows match in both dfs. Now let's combine these two df's. Use this to get only matching rows.
final=func_df.merge(test_df,on='x')
print(final)
'''
x y1 y2 y3 y4 y
0 -20.9 -0.12 10.0 0.99437 -8667.619 0.360801
'''
#if you want to see all values use final=func_df.merge(test_df,how='left',on='x')
Now that we have the matching rows and the y value in func_df we can do the calculation.
loop_cols=[*final.columns[1:5]] #['y1', 'y2', 'y3', 'y4']
for i in loop_cols:
final['distance_{}'.format(i)]=abs(final['y'] - final[i])
print(final)
'''
x y1 y2 y3 y4 y distance_y1 distance_y2 distance_y3 distance_y4
0 -20.9 -0.12 10.0 0.99437 -8667.619 0.360801 0.480801 9.639199 0.633569 8667.979801000001
'''
CodePudding user response:
I agree with Ray Pelletier - unfortunately I do not have enough credit to comment.
If you merge the two data frames on x, then you can create a new dataframe, where you, for each row in the merged dataframe, can calculate the difference between y and y1, y and y2, and so on.
If you read up on the merge function, there is a parameter you can set to "inner". Then the merged dataframe will only contain values of x which are present in both dataframes.