Creating a new dataframe by filtering matches from columns of two existing dataframes with error tol

Time:01-01

I am pretty new to Python and pandas. I want to compare two existing dataframes on certain columns and create a third dataframe that contains only the rows whose values match within a tolerance. In other words, I have df1 and df2, and I want df3 to contain the rows of df2 (with all of their columns) whose values are within the tolerance of values in df1:

Two dataframes:

df1=pd.DataFrame([[0.221,2.233,7.84554,10.222],[0.222,2.000,7.8666,10.000], 
 [0.220,2.230,7.8500,10.005]],columns=('rt','mz','mz2','abundance'))

[Dataframe 1]

df2=pd.DataFrame([[0.219,2.233,7.84500,10.221],[0.220,2.002,7.8669,10.003],[0.229,2.238,7.8508,10.009]],columns=('rt','mz','mz2','abundance'))

[Dataframe 2]

Expected Output:

df3=pd.DataFrame([[0.219,2.233,7.84500,10.221],[0.220,2.002,7.8669,10.003]],columns=('rt','mz','mz2','abundance'))

[Dataframe 3]

I have tried for loops and filters, but as I am a newbie, nothing is really working for me. Here is what I'm trying now:

import pandas as pd
import numpy as np
p=[]
d=np.array(p)
#print(d.dtype)

def count(df2, l, r):
    l=[(df1['rt']-0.001)]
    r=[(df1['rt']+0.001)]
    for x in df2['rt']:
        # condition check
        if x>= l and x<= r:
            print(x)
            d.append(x)

where p and d are the list and the array (is an array even necessary?) that will be populated. I suspect the problem is that the function shouldn't contain the for loop.

Ideally, this would filter roughly 13,000 rows of one dataframe using about 180 column values from another dataframe.

Thank you in advance!

CodePudding user response:

Is this what you're looking for?

# Names chosen to avoid shadowing the built-in min() and max()
rt_min = df1.rt.min() - 0.001
rt_max = df1.rt.max() + 0.001
df3 = df2[(df2.rt >= rt_min) & (df2.rt <= rt_max)]
>>> df3
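
Note that a single min/max window keeps every df2 row that falls anywhere between the smallest and largest df1 value, which is not quite the same as being within the tolerance of some individual df1 value. If a per-value match is what is wanted, one possible sketch uses NumPy broadcasting to compare every df2 value against every df1 value at once. The function name filter_within_tolerance, its parameters, and the tolerance of 0.0015 are assumptions made for illustration (the question's code uses 0.001, but a difference of exactly 0.001 can land on either side of the cutoff because of floating-point rounding). The 2.002 in the second row of df2 is taken from the expected output in the question.

```python
import numpy as np
import pandas as pd


def filter_within_tolerance(reference, candidates, column, tol):
    """Return the rows of `candidates` whose `column` value lies within
    `tol` of at least one value in `reference[column]`.

    Hypothetical helper, not part of pandas.
    """
    ref = reference[column].to_numpy()
    cand = candidates[column].to_numpy()
    # Pairwise absolute differences, shape (len(candidates), len(reference)).
    diffs = np.abs(cand[:, None] - ref[None, :])
    # Keep a candidate row if any reference value is close enough.
    mask = (diffs <= tol).any(axis=1)
    return candidates[mask]


df1 = pd.DataFrame(
    [[0.221, 2.233, 7.84554, 10.222],
     [0.222, 2.000, 7.8666, 10.000],
     [0.220, 2.230, 7.8500, 10.005]],
    columns=("rt", "mz", "mz2", "abundance"),
)
df2 = pd.DataFrame(
    [[0.219, 2.233, 7.84500, 10.221],
     [0.220, 2.002, 7.8669, 10.003],
     [0.229, 2.238, 7.8508, 10.009]],
    columns=("rt", "mz", "mz2", "abundance"),
)

# Assumed tolerance; adjust to whatever the data calls for.
df3 = filter_within_tolerance(df1, df2, column="rt", tol=0.0015)
print(df3)
```

With this sample data and tolerance, the first two rows of df2 are kept and the third (rt = 0.229) is dropped. For about 13,000 rows against 180 values, the intermediate difference matrix holds roughly 2.3 million floats, which should fit comfortably in memory.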
