I wrote a piece of code that counts the number of value increases in a particular column (in this example, the column is Anxiety).
Counting code:
len([b-a for a,b in zip(df['Anxiety'],df['Anxiety'][1:]) if b>a])
Setup code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Account': [123,123,123,123,123,123,123,123,123,123,456,456,456,456],
                   'Anxiety': [0,1,np.nan,2,3,0,2,np.nan,np.nan,0,0,1,np.nan,3]})
df
However, there are two problems. First, it doesn't distinguish between accounts. Second, it doesn't count properly when there is a null value between two values.
The expected output is 4 for account 123 and 2 for account 456.
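To spell out the counting rule behind those numbers, here is a minimal pure-Python sketch. The helper name `count_increases` is made up for illustration. It assumes that nulls are skipped and that each value is compared with the previous non-null value in the same account.

```python
import math

def count_increases(values):
    """Count how often a value exceeds the previous non-null value."""
    count = 0
    prev = None  # last non-null value seen so far
    for v in values:
        # Skip nulls so the comparison bridges over them.
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        if prev is not None and v > prev:
            count += 1
        prev = v
    return count

nan = float("nan")
# Same data as the setup code, split by account by hand.
by_account = {
    123: [0, 1, nan, 2, 3, 0, 2, nan, nan, 0],
    456: [0, 1, nan, 3],
}
expected = {acct: count_increases(vals) for acct, vals in by_account.items()}
print(expected)  # {123: 4, 456: 2}
```

For account 123 the counted increases are 0 to 1, 1 to 2 (across the null), 2 to 3, and 0 to 2. For account 456 they are 0 to 1 and 1 to 3.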
CodePudding user response:
Here is one way to do it:
# Create a temp column 'diff': 1 where a value exceeds the previous non-NaN value in the same account, else 0.
# Diffing within each Account keeps the last value of one account from being compared with the first value of the next.
# Then group by Account and sum the flags.
df.assign(
    diff=(df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].diff() > 0).astype(int)
).groupby('Account')['diff'].sum()
Account
123 4.0
456 2.0
Name: diff, dtype: float64
CodePudding user response:
Like:
out = df[df['Anxiety'].notna()].groupby('Account')['Anxiety'].apply(
    lambda x: x[x > x.shift()].size)
print(out)
Account
123 4
456 2
CodePudding user response:
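Try forward-filling within each group, so that a NaN repeats the previous value and contributes a difference of zero:
def n_incr(g):
    # ffill carries the last valid value over NaNs; diff > 0 flags each increase
    return (g.ffill().diff() > 0).sum()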
>>> df.groupby('Account').agg(n_incr)
Anxiety
Account
123 4
456 2