Home > Software design >  updated question: Updating a row value based on another row value with certain condition, pandas
updated question: Updating a row value based on another row value with certain condition, pandas

Time:10-21

I am trying to do something like this. I have a dataframe:

list_val = {'Region': [3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3715, 3718, 3718, 3718], 
             'Category': [1, 1, 1, 1,1, 2, 2 ,2 ,2, 2,1 ,1 ,1],
             'level': ['E', 'E/M', 'M', 'S', 'unknown', 'E', 'E/M', 'M', "S", "unknown", 'M', "E/M", 'unknown'],
             'low': [2, 5, 10, 4, -1, 8, 12, 5, 14, -1, 3, 5, -1],
             'high': [3, 6, 5, 6, -1, 12, 8, 9, 15, -1, 3, 8, -1]}
df = pd.DataFrame(list_val)
df
Region   Category    level   low high
0   3715    1              E    2   3
1   3715    1              E/M  5   6
2   3715    1              M    10  5
3   3715    1              S    4   6

4   3715    1          unknown  -1  -1

5   3715    2              E    8   12
6   3715    2            E/M    12  8
7   3715    2              M    5   9
8   3715    2              S    14  15

9   3715    2          unknown  -1  -1

10  3718    1              M     3  3
11  3718    1            E/M    5   8

12  3718    1        unknown    -1  -1

I am trying to fill 'low' column with level 'unknown' i.e -1 with the 'low' value where the level is 'E' i.e 2 and 'high' value with level 'unknown' i.e -1 with the 'high' value where the level is 'S' i.e 6 and so on for each category and each region. And if for any region and category if level 'E' or 'S' are missing, keep low and high value as -1 only. So what I want is this:

Region  Category    level   low high
0   3715    1              E    2   3
1   3715    1              E/M  5   6
2   3715    1              M    10  5
3   3715    1              S    4   6

4   3715    1          unknown  2   6

5   3715    2              E    8   12
6   3715    2            E/M    12  8
7   3715    2              M    5   9
8   3715    2              S    14  15

9   3715    2          unknown  8   15

10  3718    1              M     3  3
11  3718    1            E/M    5   8
12  3718    1        unknown    -1  -1

I have tried various ways but nothing is giving what I want, the latest I tried is this:

for index in df.index:
    if df.loc[index,'level'] == 'Unknown':
        df.loc[index,'low'] = df['low'].where(df['level'] == 'E')
        df.loc[index, 'high'] = df['high'].where(df['level] == 'S')

but this is producing errors. Could someone please guide, how should I go about doing this? Thanks!

CodePudding user response:

Use loc update with map/replace:

e_val = df.loc[df['level']=='E'].set_index(['Region', 'Category'])['low']

# use `lower` since there is `unknown` and `Unknown`
unknowns = df['level'].str.lower() == 'unknown'
df.loc[unknowns, 'low'] = (df.loc[unknowns, ['Region','Category']]
                             .agg(tuple,axis=1)
                             .map(e_val)
                          )

Output:

    Region  Category    level   low  high
0     3715         1        E   2.0     3
1     3715         1      E/M   5.0     6
2     3715         1        M  10.0     5
3     3715         1        S   4.0     6
4     3715         1  unknown   2.0    -1
5     3715         2        E   8.0    12
6     3715         2      E/M  12.0     8
7     3715         2        M   5.0     9
8     3715         2        S  14.0    15
9     3715         2  unknown   8.0    -1
10    3718         1        M   3.0     3
11    3718         1      E/M   5.0     8
12    3718         1  unknown   NaN    -1
  • Related