Home > Blockchain >  New column based values based on values of multiple columns
New column based values based on values of multiple columns

Time:08-12

I have a dataframe df which looks like this:

number category cat_1 cat_2 cat_3
20 category1 5 10 20
30 category2 6 12 24
25 category3 4 9 15
10 category2 4 12 16

I want 2 extra columns called df['number_upper'] and df['number_lower].

df['number_upper'] should ADD cat_1 values to df['number'] when category == category1 and should add cat_2 values to df['number'] if category == category2 and add cat_3 values to df['number'] when category == category3

The same way,

df['number_lower'] should SUBTRACT cat_1 values to df['number'] when category == category1 and should subtract cat_2 values to df['number'] if category == category2 and subtract cat_3 values to df['number'] when category == category3.

The resulting dataframe should look like this:

number category cat_1 cat_2 cat_3 number_upper number_lower
20 category1 5 10 20 25 15
30 category2 6 12 24 42 18
25 category3 4 9 15 40 10
10 category2 4 12 16 22 -2

CodePudding user response:

Use lookup with rename columns names for match values in column and add or subtract column:

d = {'cat_1':'category1','cat_2':'category2','cat_3':'category3'}
idx, cols = pd.factorize(df['category'])
a = df.rename(columns=d).reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print (a)
[5, 12, 15, 12]

df['number_upper'] = df['number'].add(a)
df['number_lower'] = df['number'].sub(a)
print (df)
   number   category  cat_1  cat_2  cat_3  number_upper  number_lower
0      20  category1      5     10     20            25            15
1      30  category2      6     12     24            42            18
2      25  category3      4      9     15            40            10
3      10  category2      4     12     16            22            -2

If some values not match are created as missing values (NaN):

print (df)
   number   category  cat_1  cat_2  cat_3
0      20  category1      5     10     20
1      30  category2      6     12     24
2      25  category3      4      9     15
3      10  category4      4     12     16 <- change data

d = {'cat_1':'category1','cat_2':'category2','cat_3':'category3'}
idx, cols = pd.factorize(df['category'])
a = df.rename(columns=d).reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print (a)
[5, 12, 15, NaN]

df['number_upper']  = df['number'].add(a)
df['number_lower']  = df['number'].sub(a)
print (df)
   number   category  cat_1  cat_2  cat_3  number_upper  number_lower
0      20  category1      5     10     20          25.0          15.0
1      30  category2      6     12     24          42.0          18.0
2      25  category3      4      9     15          40.0          10.0
3      10  category4      4     12     16           NaN           NaN
  • Related