How can I map a dictionary onto a dataframe using conditions for the keys?

Time:11-09

I have been able to successfully map a dictionary to a dataframe column using two categorical variables as keys, but I can't figure out how to do it if one of my target values should satisfy a condition rather than equal a value.

For example, consider the following dataframe:

import pandas as pd

df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                   'F2': ['HB', 'CP', '4D', 'CV'],
                   'F3': [10000, 5000, 15000, 2000]})

df['F12T'] = df[['F1','F2']].apply(tuple, axis=1)
df['F13T'] = df[['F1','F3']].apply(tuple, axis=1)

You get:

  F1  F2     F3     F12T        F13T
0  Y  HB  10000  (Y, HB)  (Y, 10000)
1  N  CP   5000  (N, CP)   (N, 5000)
2  N  4D  15000  (N, 4D)  (N, 15000)
3  N  CV   2000  (N, CV)   (N, 2000)

Now to map on two categorical variables, easy, using .map():

dict1 = {('Y', 'HB'): 1.1}
df["R1"] = df["F12T"].map(dict1)
print(df)

  F1  F2     F3     F12T        F13T   R1
0  Y  HB  10000  (Y, HB)  (Y, 10000)  1.1
1  N  CP   5000  (N, CP)   (N, 5000)  NaN
2  N  4D  15000  (N, 4D)  (N, 15000)  NaN
3  N  CV   2000  (N, CV)   (N, 2000)  NaN

But now what I'd like to do is make a new column that holds 1.1 where F1 == 'N' and 2000 < F3 < 15000, which would put 1.1 in the second row (index 1).

The dictionary I'd want to map I guess would look something like:

dict2 = {('N', '[2001, 15000)'): 1.1}

Which I'd like to result in:

  F1  F2     F3     F12T        F13T   R1   R2
0  Y  HB  10000  (Y, HB)  (Y, 10000)  1.1  NaN
1  N  CP   5000  (N, CP)   (N, 5000)  NaN  1.1
2  N  4D  15000  (N, 4D)  (N, 15000)  NaN  NaN
3  N  CV   2000  (N, CV)   (N, 2000)  NaN  NaN

Any ideas would be greatly appreciated, thanks

CodePudding user response:

You can use & (element-wise AND) to select rows where several conditions must all hold, but beware of its high operator precedence. I would do it the following way:

import pandas as pd
df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                'F2': ['HB', 'CP', '4D', 'CV'],
                'F3': [10000, 5000, 15000, 2000]})
df.loc[(df["F1"]=="N") & (df["F3"]>2000) & (df["F3"]<15000),"R"] = 1.1
print(df)

This gives the output:

  F1  F2     F3    R
0  Y  HB  10000  NaN
1  N  CP   5000  1.1
2  N  4D  15000  NaN
3  N  CV   2000  NaN

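Note that the parentheses around each comparison are mandatory: & binds more tightly than ==, > and <, so without them the expression is parsed differently.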
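
The range check on F3 can also be written with Series.between, which is a method call and so needs no parentheses of its own. A minimal sketch, assuming pandas 1.3 or later (where the inclusive argument accepts the string "neither"):

```python
import pandas as pd

df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                   'F2': ['HB', 'CP', '4D', 'CV'],
                   'F3': [10000, 5000, 15000, 2000]})

# inclusive="neither" makes both bounds strict, i.e. 2000 < F3 < 15000.
in_range = df["F3"].between(2000, 15000, inclusive="neither")

# The == comparison still needs its parentheses before being combined with &.
mask = (df["F1"] == "N") & in_range

# Rows outside the mask are left as NaN in the new column.
df.loc[mask, "R"] = 1.1
print(df)
```

Only the row with F1 == 'N' and F3 == 5000 satisfies both conditions, so this fills the same cell as the version above.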

CodePudding user response:

import numpy as np

df['R2'] = np.where((df['F1'] == 'N') & (df['F3'] > 2000) & (df['F3'] < 15000), 1.1, np.nan)

Output:

  F1  F2     F3     F12T        F13T   R1   R2
0  Y  HB  10000  (Y, HB)  (Y, 10000)  1.1  NaN
1  N  CP   5000  (N, CP)   (N, 5000)  NaN  1.1
2  N  4D  15000  (N, 4D)  (N, 15000)  NaN  NaN
3  N  CV   2000  (N, CV)   (N, 2000)  NaN  NaN
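
To stay closer to the dictionary idea from the question, the conditions could be kept in a dict and turned into boolean masks. The sketch below is one possible approach, not something either answer proposed: the rules dict and its (F1 value, lower bound, upper bound) key layout are hypothetical, and both bounds are assumed to be exclusive.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                   'F2': ['HB', 'CP', '4D', 'CV'],
                   'F3': [10000, 5000, 15000, 2000]})

# Hypothetical rule table (an assumption, not from the question):
# (F1 value, exclusive lower bound, exclusive upper bound) -> result value
rules = {('N', 2000, 15000): 1.1}

# Build one boolean mask per rule; iterating a dict yields its keys.
conditions = [
    (df['F1'] == f1) & (df['F3'] > low) & (df['F3'] < high)
    for f1, low, high in rules
]

# np.select assigns the value of the first matching rule, NaN otherwise.
df['R2'] = np.select(conditions, list(rules.values()), default=np.nan)
print(df)
```

np.select uses the first condition that is true for each row, so the order of the rules matters if their ranges overlap.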