Home > Net >  Is there a way to compare and replace data between 2 dataframe in pandas?
Is there a way to compare and replace data between 2 dataframe in pandas?

Time:11-20

I would like to clean data based on the range from different data frame.

Range data looks like

labs_range= pd.DataFrame({"Lab":['Albumin','Sodium','Potassium'],'low': [2,2,3],
                         'high':[4,5,4]})

    Lab low high
0   Albumin 2   4
1   Sodium  2   5
2   Potassium   3   4

and data I would like to clean looks like :

labs = pd.DataFrame({"Albumin":[1,1,2,1,2,3,4,5],'Calcium':[1,1,2,1,2,3,4,5],'Sodium':[1,1,2,1,2,3,4,5]})



Albumin Calcium Sodium
0   1   1   1
1   1   1   1
2   2   2   2
3   1   1   1
4   2   2   2
5   3   3   3
6   4   4   4
7   5   5   5

I can do it by using a loop but it would take too long if we have data that is big.

for i in labs_range['Lab']:
    try:
        labs[i] = np.where((labs[i]<labs_range.loc[labs_range['Lab']==str(i),'low'].values[0]) | (labs[i]>labs_range.loc[labs_range['Lab']==str(i),'high'].values[0]),np.nan,labs[i])
    except:
        print(i)

The result that we get from this loops is correct but would like it to be more efficient.

    Albumin Calcium Sodium
0   nan 1   nan
1   nan 1   nan
2   2.00000 2   2.00000
3   nan 1   nan
4   2.00000 2   2.00000
5   3.00000 3   3.00000
6   4.00000 4   4.00000
7   nan 5   5.00000

CodePudding user response:

You can use pd.isin

labs = pd.DataFrame({"Albumin":[1,1,2,1,2,3,4,5],'Calcium':[1,1,2,1,2,3,4,5],'Sodium':[1,1,2,1,2,3,4,5]})
   Albumin  Calcium  Sodium
0        1        1       1
1        1        1       1
2        2        2       2
3        1        1       1
4        2        2       2
5        3        3       3
6        4        4       4
7        5        5       5

cond = labs.isin({'Albumin': range(2, 5),
                  'Calcium': range(2, 6),
                  'Sodimu': range(3, 5)})

labs.where(cond, np.nan)


   Albumin  Calcium  Sodium
0      NaN      NaN     NaN
1      NaN      NaN     NaN
2      2.0      2.0     NaN
3      NaN      NaN     NaN
4      2.0      2.0     NaN
5      3.0      3.0     3.0
6      4.0      4.0     4.0
7      NaN      5.0     NaN

CodePudding user response:

Since you have column dose not intersection we need find the overlap columns 1st

colmatch = labs.columns.intersection(labs_range.Lab)
labs_range = labs_range.set_index('Lab').loc[colmatch]
labs[colmatch] = labs.where(labs.sub(labs_range.low).ge(0) & labs.sub(labs_range.high).le(0))[colmatch]
labs
Out[246]: 
   Albumin  Calcium  Sodium
0      NaN        1     NaN
1      NaN        1     NaN
2      2.0        2     2.0
3      NaN        1     NaN
4      2.0        2     2.0
5      3.0        3     3.0
6      4.0        4     4.0
7      NaN        5     5.0
  • Related