How to apply a condition to Pandas dataframe rows, but only apply the condition to rows of the same day

I have a dataframe that is indexed by datetime. It has one column of integers and a second column where I want to write a string whenever a condition on the integers is met. The condition needs to compare the integer in row X against the integer in row X-1, but only if both rows are on the same day.

I am currently using the condition:

df.loc[(df['IntCol'] > df['IntCol'].shift(periods=1)), 'StringCol'] = 'Success'

This applies my condition, but when the shifted row is on a different day the condition still uses it, and I want rows from a different day to be ignored. I've tried various iterations of groupby(df.index.date) but can't figure out whether that will work.
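To make the problem concrete, here is a minimal sketch with made-up data (the four timestamps and integers are assumptions, not taken from the real dataframe). The third row is the first row of 2 January, yet it is still compared with the last row of 1 January and marked as a success.

```python
import pandas as pd

# Hypothetical data: two rows on 1 January, two rows on 2 January.
idx = pd.to_datetime([
    "2021-01-01 08:00", "2021-01-01 16:00",
    "2021-01-02 08:00", "2021-01-02 16:00",
])
df = pd.DataFrame({"IntCol": [5, 3, 4, 6]}, index=idx)

# Plain shift: each row is compared with the row directly above it,
# regardless of which day that row belongs to.
df.loc[df["IntCol"] > df["IntCol"].shift(periods=1), "StringCol"] = "Success"

# The 2021-01-02 08:00 row (4) is flagged because 4 > 3, even though
# the 3 belongs to the previous day.
print(df)
```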

CodePudding user response：

I think this is what you want. You were probably closer to the answer than you thought...

There are two dataframes below, to show that the logic you have works whether the integers are a sorted range or random.

You will need to import pandas (as pd) and random to run the code.

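import random

import pandas as pd

# '4h' is the current spelling of the four-hour frequency; pandas before 2.2 also accepts '4H'
dates = list(pd.date_range(start='2021/1/1', periods=16, freq='4h'))

def compare(x):
    # x holds the rows of a single day, so shift() never reaches into the previous day
    x.loc[(x['IntCol'] > x['IntCol'].shift(periods=1)), 'StringCol'] = 'Success'
    return x

# Shows Success in every row except the first row of each day, because the integers are an increasing range
df = pd.DataFrame({'IntCol': range(10,26)}, index=dates)
# group_keys=False keeps the original datetime index instead of adding the date as an extra index level
df.groupby(df.index.date, group_keys=False).apply(compare)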

                     IntCol StringCol
2021-01-01 00:00:00      10       NaN
2021-01-01 04:00:00      11   Success
2021-01-01 08:00:00      12   Success
2021-01-01 12:00:00      13   Success
2021-01-01 16:00:00      14   Success
2021-01-01 20:00:00      15   Success
2021-01-02 00:00:00      16       NaN
2021-01-02 04:00:00      17   Success
2021-01-02 08:00:00      18   Success
2021-01-02 12:00:00      19   Success
2021-01-02 16:00:00      20   Success
2021-01-02 20:00:00      21   Success
2021-01-03 00:00:00      22       NaN
2021-01-03 04:00:00      23   Success
2021-01-03 08:00:00      24   Success
2021-01-03 12:00:00      25   Success

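# Random numbers, to show that it works here too
df = pd.DataFrame({'IntCol': [random.randint(3, 500) for _ in range(16)]}, index=dates)
# group_keys=False keeps the original datetime index instead of adding the date as an extra index level
df.groupby(df.index.date, group_keys=False).apply(compare)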

                     IntCol StringCol
2021-01-01 00:00:00     386       NaN
2021-01-01 04:00:00     276       NaN
2021-01-01 08:00:00     143       NaN
2021-01-01 12:00:00     144   Success
2021-01-01 16:00:00      10       NaN
2021-01-01 20:00:00     343   Success
2021-01-02 00:00:00     424       NaN
2021-01-02 04:00:00     362       NaN
2021-01-02 08:00:00     269       NaN
2021-01-02 12:00:00      35       NaN
2021-01-02 16:00:00     278   Success
2021-01-02 20:00:00     268       NaN
2021-01-03 00:00:00      58       NaN
2021-01-03 04:00:00     169   Success
2021-01-03 08:00:00      85       NaN
2021-01-03 12:00:00     491   Success
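If you would rather keep your original one-liner, another option is to combine it with a "previous row is on the same day" mask. This is only a sketch, not part of the approach above. It assumes the index is sorted by time, and the names `day`, `same_day` and `increased` are made up for illustration.

```python
import pandas as pd

# Made-up data: 16 rows, four hours apart, with steadily increasing integers.
dates = pd.date_range(start="2021-01-01", periods=16, freq=pd.Timedelta(hours=4))
df = pd.DataFrame({"IntCol": range(10, 26)}, index=dates)

# Calendar day of each row (timestamp truncated to midnight).
day = df.index.to_series().dt.normalize()

# True only when the row directly above falls on the same calendar day.
# The very first row compares against NaT, which evaluates to False.
same_day = day == day.shift(1)

# The original condition, unchanged.
increased = df["IntCol"] > df["IntCol"].shift(1)

# Write the string only where both conditions hold.
df.loc[increased & same_day, "StringCol"] = "Success"
print(df)
```

With this data every row is marked except the first row of each day, which is the same pattern as the range example above.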

CodePudding user response：

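Not sure if this is the best way to do it, but it gets you the answer. Group by the date part of the index; grouping by df.index itself would put every timestamp in its own group, so the shift would return NaN for every row. Note that rows that do not meet the condition get 'Failure' here instead of NaN. Needs numpy imported as np:

df['StringCol'] = np.where(df['IntCol'] > df.groupby(df.index.date)['IntCol'].shift(1), 'Success', 'Failure')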
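Here is a self-contained sketch of the groupby-and-shift idea, using the question's column names and made-up data. The grouping key is the calendar date of each timestamp (`df.index.date`). The variable names `prev_same_day` and `prev_by_timestamp` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data: 16 rows, four hours apart, with steadily increasing integers.
dates = pd.date_range(start="2021-01-01", periods=16, freq=pd.Timedelta(hours=4))
df = pd.DataFrame({"IntCol": range(10, 26)}, index=dates)

# Shift inside each calendar day: the first row of a day has no predecessor
# in its group, so it gets NaN.
prev_same_day = df.groupby(df.index.date)["IntCol"].shift(1)

# Comparisons against NaN are False, so those first rows become "Failure".
df["StringCol"] = np.where(df["IntCol"] > prev_same_day, "Success", "Failure")

# For contrast: grouping on the full timestamps puts every row in its own
# group, so the shifted column is NaN everywhere.
prev_by_timestamp = df.groupby(df.index)["IntCol"].shift(1)

print(df)
print(prev_by_timestamp.isna().all())
```

If you want NaN rather than 'Failure' for the rows that do not qualify, the same `prev_same_day` series can be used as the right-hand side of the original df.loc assignment.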