I'm trying to compare two dataframes:
df1:
entry mass Precursor mass_pos
0 KGTLPK 128 642.780 770.780
1 KGTLPK 48 642.780 690.780
2 KGTLPK 112 642.780 754.780
3 KGTLPK 32 642.780 674.780
4 KGTLPK 156 642.780 798.780
df2:
Mass
0 586.672
1 798.780
2 690.780
3 400.000
My goal is to find any matches of df2 'mass' with df1 'mass_pos'.
I really liked this:
df1['masses match'] = np.where(df2['Mass'] == df1['mass_pos'], 'True', 'False')
But this throws a value error:
ValueError: Can only compare identically-labeled Series objects
I think that this is because these dataframes have a different number of rows. Is there a way to overcome this?
CodePudding user response:
Use np.isclose
:
tolerance = 1e-03
match_mass = lambda x: np.any(np.isclose(x, df2['Mass'], atol=tolerance))
df1['masses match'] = df1['mass_pos'].apply(match_mass)
print(df1)
# Output:
entry mass Precursor mass_pos masses match
0 KGTLPK 128 642.78 770.78 False
1 KGTLPK 48 642.78 690.78 True
2 KGTLPK 112 642.78 754.78 False
3 KGTLPK 32 642.78 674.78 False
4 KGTLPK 156 642.78 798.78 True
CodePudding user response:
Use merge to check exist item.
Example:
df=df1.merge(df2,left_on="mass_pos",right_on="Mass")
Code:
import pandas as pd
df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
'mass': [128 ,48, 112 , 32, 156],
'Precursor': [642.780,642.780,642.780,642.780,642.780],
'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})
df=df1.merge(df2,left_on="mass_pos",right_on="Mass")
print(df)
Output :
entry mass Precursor mass_pos Mass
0 KGTLPK 48 642.78 690.78 690.78
1 KGTLPK 156 642.78 798.78 798.78
To get the expected result, i did like this :
import pandas as pd
df1 = pd.DataFrame({'entry': ['KGTLPK','KGTLPK','KGTLPK','KGTLPK','KGTLPK' ],
'mass': [128 ,48, 112 , 32, 156],
'Precursor': [642.780,642.780,642.780,642.780,642.780],
'mass_pos': [ 770.780, 690.780, 754.780, 674.780, 798.780]
})
df2 = pd.DataFrame({'Mass': [586.672, 798.780, 690.780, 400.000 ]})
DF3 = df1[['mass_pos']].fillna('None')
DF4 = df2[['Mass']].fillna('None')
df1['Merge_Result'] = 'F'
for i in range(DF3.shape[0]):
for k in range(DF4.shape[0]):
if list(DF3.iloc[i]) == list(DF4.iloc[k]):
df1['Merge_Result'][i] = 'T'
print(df1)
Output :
entry mass Precursor mass_pos Merge_Result
0 KGTLPK 128 642.78 770.78 F
1 KGTLPK 48 642.78 690.78 T
2 KGTLPK 112 642.78 754.78 F
3 KGTLPK 32 642.78 674.78 F
4 KGTLPK 156 642.78 798.78 T