If I have a dataframe
date
01.01.2003
02.01.2003
03.01.2003
05.01.2003
06.01.2003
And I apply this code
for i in (df['date']):
if df['date'].iloc[i 1]-df['date'].iloc[i] == 1 :
df['Max'] = df['date'].iloc[i 1]
else :
df['Max'] = ''
it will turn an error
Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting `n`, use `n * obj.freq`
but if I change i
to timdelta(days=1) it will turn error says that cannnot indexing if not integer.
Then how the code supposed to like ? I want to define 'max' on every consecutive days.
this is the output I like
date max
01.01.2003
02.01.2003
03.01.2003 03.01.2003
05.01.2003
06.01.2003 06.01.2003
*Note that it only write the max one in consecutive, and leave other empty. from 03.01.2003 to 05.01.2003 is not consecutive so start new
CodePudding user response:
Convert column to datetimes and then get maximal value to new column by consecutive datetimes by Series.diff
and compare by 1 day
with GroupBy.transform
and max
:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['Max'] = df.groupby(df['date'].diff().dt.days.ne(1).cumsum())['date'].transform('max')
#thank you Corralien for alternative
df['Max'] = df.groupby(df['date'].diff().ne('1D').cumsum())['date'].transform('max')
print (df)
date Max
0 2003-01-01 2003-01-03
1 2003-01-02 2003-01-03
2 2003-01-03 2003-01-03
If need remove consecutive maximal datetimes use:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
s = df['date'].diff().dt.days.ne(1).cumsum()
df['Max'] = df.groupby(s)['date'].transform('max').mask(s.duplicated(keep='last'))
#thank you Corralien for alternative
s = df['date'].diff().ne('1D').cumsum()
df['Max'] = df.groupby(s)['date'].transform('max').mask(s.duplicated(keep='last'))
print (df)
date Max
0 2003-01-01 NaT
1 2003-01-02 NaT
2 2003-01-03 2003-01-03
3 2003-01-05 NaT
4 2003-01-06 2003-01-06