Given a dataset as follows and a date range from 2013-05-01
to 2013-05-15
:
import pandas as pd
import numpy as np
np.random.seed(2021)
dates = pd.date_range('20130226', periods=90)
df = pd.DataFrame(np.random.uniform(0, 10, size=(90, 6)), index=dates, columns=['A_values', 'B_values', 'C_values', 'D_values', 'E_values', 'target'])
Out:
A_values B_values C_values D_values E_values target
2013-02-26 6.059783 7.333694 1.389472 3.126731 9.972433 1.281624
2013-02-27 1.789931 7.529254 6.621605 7.843101 0.968944 0.585713
2013-02-28 9.623960 6.165574 0.866300 5.612724 6.165247 9.638430
2013-03-01 5.743043 3.711608 4.521452 2.018502 5.693051 1.950960
2013-03-02 5.837040 4.763135 5.178144 8.230986 7.322250 0.690563
... ... ... ... ... ...
2013-05-22 8.795736 6.316478 0.427136 3.864723 6.803052 5.554824
2013-05-23 7.959282 1.839659 2.225667 5.140268 2.756443 8.440801
2013-05-24 5.412016 9.081583 7.212742 5.890717 0.430184 0.968086
2013-05-25 1.088414 9.016004 5.384490 1.610210 5.461017 0.672555
2013-05-26 4.930571 2.338785 9.823048 6.893207 6.312104 5.273122
First I filter columns by df.filter(regex='_values$')
, then I hope to return a list of column names whose minimum value falls in the given date range (2013-05-01, 2013-05-15)
, ie. If column A_values
's minimum value is in any day in this range, then A_values
will be contained in the returned result list.
How could I achieve that in Pandas or Numpy? Thanks.
CodePudding user response:
Use DataFrame.idxmin
for minimal datetimes per columns and then filter index by Series.between_time
s = df.filter(regex='_values$').idxmin()
out = s[s.between('2013-05-01','2013-05-15')].index.tolist()
print (out)
['D_values']