I am trying to do something like this. I have a dataframe:
list_val = {'Region': [3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3718, 3718, 3718],
'Category': [1, 1, 1, 1,1, 2, 2 ,2 ,2, 2,1 ,1 ,1],
'level': ['E', 'E/M', 'M', 'S', 'unknown', 'E', 'E/M', 'M', "S", "unknown", 'M', "E/M", 'unknown'],
'low': [2, 5, 10, 4, -1, 8, 12, 5, 14, -1, 3, 5, -1],
'high': [3, 6, 5, 6, -1, 12, 8, 9, 15, -1, 3, 8, -1]}
df = pd.DataFrame(list_val)
df
Region Category level low high
0 3715 1 E 2 3
1 3715 1 E/M 5 6
2 3715 1 M 10 5
3 3715 1 S 4 6
4 3715 1 unknown -1 -1
5 3715 2 E 8 12
6 3715 2 E/M 12 8
7 3715 2 M 5 9
8 3715 2 S 14 15
9 3715 2 unknown -1 -1
10 3718 1 M 3 3
11 3718 1 E/M 5 8
12 3718 1 unknown -1 -1
I am trying to fill 'low' column with level 'unknown' i.e -1 with the 'low' value where the level is 'E' i.e 2 and 'high' value with level 'unknown' i.e -1 with the 'high' value where the level is 'S' i.e 6 and so on for each category and each region. And if for any region and category if level 'E' or 'S' are missing, keep low and high value as -1 only. So what I want is this:
Region Category level low high
0 3715 1 E 2 3
1 3715 1 E/M 5 6
2 3715 1 M 10 5
3 3715 1 S 4 6
4 3715 1 unknown 2 6
5 3715 2 E 8 12
6 3715 2 E/M 12 8
7 3715 2 M 5 9
8 3715 2 S 14 15
9 3715 2 unknown 8 15
10 3718 1 M 3 3
11 3718 1 E/M 5 8
12 3718 1 unknown -1 -1
I have tried various ways but nothing is giving what I want, the latest I tried is this:
for index in df.index:
if df.loc[index,'level'] == 'Unknown':
df.loc[index,'low'] = df['low'].where(df['level'] == 'E')
df.loc[index, 'high'] = df['high'].where(df['level] == 'S')
but this is producing errors. Could someone please guide, how should I go about doing this? Thanks!
CodePudding user response:
Use loc
update with map/replace
:
e_val = df.loc[df['level']=='E'].set_index(['Region', 'Category'])['low']
# use `lower` since there is `unknown` and `Unknown`
unknowns = df['level'].str.lower() == 'unknown'
df.loc[unknowns, 'low'] = (df.loc[unknowns, ['Region','Category']]
.agg(tuple,axis=1)
.map(e_val)
)
Output:
Region Category level low high
0 3715 1 E 2.0 3
1 3715 1 E/M 5.0 6
2 3715 1 M 10.0 5
3 3715 1 S 4.0 6
4 3715 1 unknown 2.0 -1
5 3715 2 E 8.0 12
6 3715 2 E/M 12.0 8
7 3715 2 M 5.0 9
8 3715 2 S 14.0 15
9 3715 2 unknown 8.0 -1
10 3718 1 M 3.0 3
11 3718 1 E/M 5.0 8
12 3718 1 unknown NaN -1