I have been able to successfully map a dictionary to a dataframe column using two categorical variables as keys, but I can't figure out how to do it if one of my target values should satisfy a condition rather than equal a value.
For example, consider the following dataframe:
df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
'F2': ['HB', 'CP', '4D', 'CV'],
'F3': [10000, 5000, 15000, 2000]})
df['F12T'] = df[['F1','F2']].apply(tuple, axis=1)
df['F13T'] = df[['F1','F3']].apply(tuple, axis=1)
You get:
F1 F2 F3 F12T F13T
0 Y HB 10000 (Y, HB) (Y, 10000)
1 N CP 5000 (N, CP) (N, 5000)
2 N 4D 15000 (N, 4D) (N, 15000)
3 N CV 2000 (N, CV) (N, 2000)
Mapping on two categorical variables is easy using .map():
dict1 = {('Y', 'HB'): 1.1}
df["R1"] = df["F12T"].map(dict1)
print(df)
F1 F2 F3 F12T F13T R1
0 Y HB 10000 (Y, HB) (Y, 10000) 1.1
1 N CP 5000 (N, CP) (N, 5000) NaN
2 N 4D 15000 (N, 4D) (N, 15000) NaN
3 N CV 2000 (N, CV) (N, 2000) NaN
But now what I'd like to do is make a new column that holds the 1.1 value where F1 = N and F3 > 2000 and F3 < 15000. In this example that means putting 1.1 in the second row (index 1).
The dictionary I'd want to map I guess would look something like:
dict2 = {('N', '[2001, 15000)'): 1.1}
Which I'd like to result in:
F1 F2 F3 F12T F13T R1 R2
0 Y HB 10000 (Y, HB) (Y, 10000) 1.1 NaN
1 N CP 5000 (N, CP) (N, 5000) NaN 1.1
2 N 4D 15000 (N, 4D) (N, 15000) NaN NaN
3 N CV 2000 (N, CV) (N, 2000) NaN NaN
Any ideas would be greatly appreciated, thanks
CodePudding user response:
You can use & (bitwise AND, applied element-wise to boolean Series) to select rows where several conditions must all be met. Beware of its operator precedence, though. I would do it the following way:
import pandas as pd
df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
'F2': ['HB', 'CP', '4D', 'CV'],
'F3': [10000, 5000, 15000, 2000]})
df.loc[(df["F1"]=="N") & (df["F3"]>2000) & (df["F3"]<15000),"R"] = 1.1
print(df)
This gives the output:
F1 F2 F3 R
0 Y HB 10000 NaN
1 N CP 5000 1.1
2 N 4D 15000 NaN
3 N CV 2000 NaN
Note that the parentheses around each comparison are mandatory, because & has higher precedence than ==, > and <.
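If the parentheses feel error-prone, one possible alternative (a sketch, not part of the answer above) is to build the masks with the comparison methods Series.eq, Series.gt and Series.lt, so that each comparison is a method call and no extra grouping is needed. The names is_n and in_range are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                   'F2': ['HB', 'CP', '4D', 'CV'],
                   'F3': [10000, 5000, 15000, 2000]})

# Illustrative names: each mask is a boolean Series aligned with df
is_n = df["F1"].eq("N")                             # F1 == "N"
in_range = df["F3"].gt(2000) & df["F3"].lt(15000)   # 2000 < F3 < 15000

# Rows matching both masks get 1.1; all other rows are left as NaN
df.loc[is_n & in_range, "R"] = 1.1
print(df)
```

Naming the masks also makes it easier to reuse or inspect each condition separately.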
CodePudding user response:
import numpy as np
df['R2'] = np.where((df['F1'] == 'N') & (df['F3'] > 2000) & (df['F3'] < 15000), 1.1, np.nan)
Output:
F1 F2 F3 F12T F13T R1 R2
0 Y HB 10000 (Y, HB) (Y, 10000) 1.1 NaN
1 N CP 5000 (N, CP) (N, 5000) NaN 1.1
2 N 4D 15000 (N, 4D) (N, 15000) NaN NaN
3 N CV 2000 (N, CV) (N, 2000) NaN NaN
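To stay closer to the dictionary idea in the question, the same condition can be driven from a lookup table. The following is only a sketch under assumptions: the rules dictionary and its key layout (F1 value, exclusive lower bound, exclusive upper bound) are hypothetical and not something pandas defines. It uses np.select to turn each rule into a mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'F1': ['Y', 'N', 'N', 'N'],
                   'F2': ['HB', 'CP', '4D', 'CV'],
                   'F3': [10000, 5000, 15000, 2000]})

# Hypothetical rule table:
# (F1 value, exclusive lower bound, exclusive upper bound) -> result
rules = {('N', 2000, 15000): 1.1}

# One boolean mask per rule, in the same order as the dictionary
conditions = [
    (df['F1'] == f1) & (df['F3'] > low) & (df['F3'] < high)
    for (f1, low, high) in rules
]

# The first matching rule wins; rows matching no rule get NaN
df['R2'] = np.select(conditions, list(rules.values()), default=np.nan)
print(df)
```

Adding more entries to rules extends the mapping without touching the rest of the code. If rules overlap, np.select takes the value of the first condition that is true.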