In pandas, How can I create a new column B
based on a column A
in df
, such that:
B(i)=1
ifA_(i-1)-A_(i) >= 0
whenA_(i) <= 10
B(i)=1
ifA_(i-1)-A_(i) >= 2
when10 < A_(i) <= 20
B(i)=1
ifA_(i-1)-A_(i) >= 5
when20 < A_(i)
B(i)=0
for any other case
However, the first B_i
value is always two
Example:
A | B |
---|---|
5 | 2 (the first B_i) |
12 | 0 |
14 | 0 |
22 | 0 |
20 | 0 |
33 | 0 |
11 | 1 |
8 | 1 |
15 | 0 |
11 | 1 |
CodePudding user response:
The most intuitive way is to iterate trough the lines testing all the three conditions in a single-line if-else
(as B(i) is 1 for all the true conditions).
import pandas as pd
df = pd.DataFrame({'A':[5,12,14,22,20,33,11,8,15,11]})
B = [2]
for i in range(1,len(df['A'])):
newvalue = 1 if (df['A'][i-1]-df['A'][i]>=0 and df['A'][i]<=10) or (df['A'][i-1]-df['A'][i]>=2 and df['A'][i]>10 and df['A'][i]<=20) or (df['A'][i-1]-df['A'][i]>=5 and df['A'][i]>20) else 0
B.append(newvalue)
df['B'] = B
print(df)
Output:
A B
0 5 2
1 12 0
2 14 0
3 22 0
4 20 1
5 33 0
6 11 1
7 8 1
8 15 0
9 11 1
CodePudding user response:
You can use Pandas.shift
for creating A_(i-1)
and use Numpy.select
for checking multiple conditions like below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A':[5,12,14,22,20,33,11,8,15,11]})
df['A_prv'] = df['A'].shift(1)
conditions = [
(df.index==0),
((df['A_prv'] - df['A'] >= 0) & (df['A'].le(10))),
((df['A_prv'] - df['A'] >= 2) & (df['A'].between(10, 20, inclusive='right'))),
# ^^^ 10 < df['A'] <= 20 ^^^
((df['A_prv'] - df['A'] >= 5) & (df['A'].ge(20)))
]
choices = [2, 1, 1, 1]
df['B'] = np.select(conditions, choices, default=0)
print(df)
Output:
A A_prv B
0 5 NaN 2
1 12 5.0 0
2 14 12.0 0
3 22 14.0 0
4 20 22.0 1
5 33 20.0 0
6 11 33.0 1
7 8 11.0 1
8 15 8.0 0
9 11 15.0 1