Given a dataframe df
as follows:
df = pd.DataFrame([{'date': '2021-10-1',
'pct1': -0.039473959,
'pct2': -0.039473959,
'pct3': -0.032095057,
'pct4': -0.106310578,
'pct5': -0.039473959},
{'date': '2021-10-2',
'pct1': 0.222111128,
'pct2': 0.042484279,
'pct3': 0.108269001,
'pct4': -0.050188884,
'pct5': 0.042484279},
{'date': '2021-10-3',
'pct1': -0.0131719,
'pct2': 0.051357438,
'pct3': 0.051357438,
'pct4': 0.130772264,
'pct5': 0.051357438},
{'date': '2021-10-4',
'pct1': 0.092982799,
'pct2': 0.092982799,
'pct3': 0.092982799,
'pct4': 0.147102302,
'pct5': 0.092982799}])
Out:
date pct1 pct2 pct3 pct4 pct5
0 2021-10-1 -0.039474 -0.039474 -0.032095 -0.106311 -0.039474
1 2021-10-2 0.222111 0.042484 0.108269 -0.050189 0.042484
2 2021-10-3 -0.013172 0.051357 0.051357 0.130772 0.051357
3 2021-10-4 0.092983 0.092983 0.092983 0.147102 0.092983
I hope to row-wisely count the number of values based on the following logic:
- if value is in the interval of
-0.02 ≤ x ≤ 0.02
, then count tokeep_same
column; - if value is in the interval of
x < -0.02
, then count todecrease
column; - if value is in the interval of
x > 0.02
, then count toincrease
column;
The expected result will like:
date decrease keep_same increase
0 2021-10-1 5 0 0
1 2021-10-2 1 0 4
2 2021-10-3 0 1 4
3 2021-10-4 0 0 5
My idea is as follows, at first step replace values of df with 0, -1 and 1 according the conditions above, then count them row-wisely:
pct_df = df.filter(regex='^pct')
conds = [pct_df.values > 0.02, -0.02 <= pct_df.values <= 0.02, pct_df.values < -0.02]
choices = [1, 0, -1]
pd.DataFrame(np.select(conds, choices, default=np.NaN),
index=pct_df.index,
columns=pct_df.columns)
How could I acheive that? Thanks.
CodePudding user response:
For condition is necessary chain it by &
for between values, then is defined new columns names and in loop are asigned counts by sum
, for new DataFrame is filtered only on date
column:
pct_df = df.filter(regex='^pct')
conds = [pct_df.values > 0.02,
(pct_df.values >= -0.02) & (pct_df.values <= 0.02),
pct_df.values < -0.02]
cols = ['decrease', 'keep_same', 'increase']
df1 = df[['date']].copy()
for c, m in zip(cols, conds):
df1[c] = m.sum(axis=1)
print (df1)
date decrease keep_same increase
0 2021-10-1 0 0 5
1 2021-10-2 4 0 1
2 2021-10-3 4 1 0
3 2021-10-4 5 0 0
CodePudding user response:
If you don't have a problem with assigning these one by one you can simply use
df['decrease'] = (pct_df.values < -0.02).sum(axis=1)
df['keep_same'] = (-0.02 <= pct_df.values <= 0.02).sum(axis=1)
df['increase'] = (pct_df.values > 0.02).sum(axis=1)
You already created the boolean values. Summing over axis 1 makes your row wise calculation complete.