Home > front end >  Count values row-wisely based on multiple interval conditions in Python
Count values row-wisely based on multiple interval conditions in Python

Time:11-05

Given a dataframe df as follows:

df = pd.DataFrame([{'date': '2021-10-1',
  'pct1': -0.039473959,
  'pct2': -0.039473959,
  'pct3': -0.032095057,
  'pct4': -0.106310578,
  'pct5': -0.039473959},
 {'date': '2021-10-2',
  'pct1': 0.222111128,
  'pct2': 0.042484279,
  'pct3': 0.108269001,
  'pct4': -0.050188884,
  'pct5': 0.042484279},
 {'date': '2021-10-3',
  'pct1': -0.0131719,
  'pct2': 0.051357438,
  'pct3': 0.051357438,
  'pct4': 0.130772264,
  'pct5': 0.051357438},
 {'date': '2021-10-4',
  'pct1': 0.092982799,
  'pct2': 0.092982799,
  'pct3': 0.092982799,
  'pct4': 0.147102302,
  'pct5': 0.092982799}])

Out:

        date      pct1      pct2      pct3      pct4      pct5
0  2021-10-1 -0.039474 -0.039474 -0.032095 -0.106311 -0.039474
1  2021-10-2  0.222111  0.042484  0.108269 -0.050189  0.042484
2  2021-10-3 -0.013172  0.051357  0.051357  0.130772  0.051357
3  2021-10-4  0.092983  0.092983  0.092983  0.147102  0.092983

I hope to row-wisely count the number of values based on the following logic:

  1. if value is in the interval of -0.02 ≤ x ≤ 0.02, then count to keep_same column;
  2. if value is in the interval of x < -0.02, then count to decrease column;
  3. if value is in the interval of x > 0.02, then count to increase column;

The expected result will like:

        date  decrease  keep_same  increase
0  2021-10-1         5          0         0
1  2021-10-2         1          0         4
2  2021-10-3         0          1         4
3  2021-10-4         0          0         5

My idea is as follows, at first step replace values of df with 0, -1 and 1 according the conditions above, then count them row-wisely:

pct_df = df.filter(regex='^pct')

conds = [pct_df.values > 0.02, -0.02 <= pct_df.values <= 0.02, pct_df.values < -0.02]
choices = [1, 0, -1]

pd.DataFrame(np.select(conds, choices, default=np.NaN),
             index=pct_df.index,
             columns=pct_df.columns)

How could I acheive that? Thanks.

CodePudding user response:

For condition is necessary chain it by & for between values, then is defined new columns names and in loop are asigned counts by sum, for new DataFrame is filtered only on date column:

pct_df = df.filter(regex='^pct')

conds = [pct_df.values > 0.02, 
         (pct_df.values >= -0.02) & (pct_df.values <= 0.02), 
         pct_df.values < -0.02]
cols = ['decrease', 'keep_same', 'increase']

df1 = df[['date']].copy()
for c, m in zip(cols, conds):
    df1[c] = m.sum(axis=1)
print (df1)
        date  decrease  keep_same  increase
0  2021-10-1         0          0         5
1  2021-10-2         4          0         1
2  2021-10-3         4          1         0
3  2021-10-4         5          0         0

CodePudding user response:

If you don't have a problem with assigning these one by one you can simply use

df['decrease'] = (pct_df.values < -0.02).sum(axis=1)
df['keep_same'] = (-0.02 <= pct_df.values <= 0.02).sum(axis=1)
df['increase'] = (pct_df.values > 0.02).sum(axis=1)

You already created the boolean values. Summing over axis 1 makes your row wise calculation complete.

  • Related