I have a dataframe df which looks like this:
number | category | cat_1 | cat_2 | cat_3 |
---|---|---|---|---|
20 | category1 | 5 | 10 | 20 |
30 | category2 | 6 | 12 | 24 |
25 | category3 | 4 | 9 | 15 |
10 | category2 | 4 | 12 | 16 |
I want 2 extra columns called df['number_upper']
and df['number_lower]
.
df['number_upper']
should ADD cat_1
values to df['number']
when category == category1
and should add cat_2 values
to df['number']
if category == category2
and add cat_3
values to df['number']
when category == category3
The same way,
df['number_lower']
should SUBTRACT cat_1
values to df['number']
when category == category1
and should subtract cat_2
values to df['number']
if category == category2
and subtract cat_3
values to df['number']
when category == category3
.
The resulting dataframe should look like this:
number | category | cat_1 | cat_2 | cat_3 | number_upper | number_lower |
---|---|---|---|---|---|---|
20 | category1 | 5 | 10 | 20 | 25 | 15 |
30 | category2 | 6 | 12 | 24 | 42 | 18 |
25 | category3 | 4 | 9 | 15 | 40 | 10 |
10 | category2 | 4 | 12 | 16 | 22 | -2 |
CodePudding user response:
Use lookup with rename columns names for match values in column and add or subtract column:
d = {'cat_1':'category1','cat_2':'category2','cat_3':'category3'}
idx, cols = pd.factorize(df['category'])
a = df.rename(columns=d).reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print (a)
[5, 12, 15, 12]
df['number_upper'] = df['number'].add(a)
df['number_lower'] = df['number'].sub(a)
print (df)
number category cat_1 cat_2 cat_3 number_upper number_lower
0 20 category1 5 10 20 25 15
1 30 category2 6 12 24 42 18
2 25 category3 4 9 15 40 10
3 10 category2 4 12 16 22 -2
If some values not match are created as missing values (NaN):
print (df)
number category cat_1 cat_2 cat_3
0 20 category1 5 10 20
1 30 category2 6 12 24
2 25 category3 4 9 15
3 10 category4 4 12 16 <- change data
d = {'cat_1':'category1','cat_2':'category2','cat_3':'category3'}
idx, cols = pd.factorize(df['category'])
a = df.rename(columns=d).reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
print (a)
[5, 12, 15, NaN]
df['number_upper'] = df['number'].add(a)
df['number_lower'] = df['number'].sub(a)
print (df)
number category cat_1 cat_2 cat_3 number_upper number_lower
0 20 category1 5 10 20 25.0 15.0
1 30 category2 6 12 24 42.0 18.0
2 25 category3 4 9 15 40.0 10.0
3 10 category4 4 12 16 NaN NaN