I am building an app where users upload time-series data as CSV. Users must always upload the previous year's data (in 2022 the series should cover 2021, in 2023 it should cover 2022, and so on), so I need to check whether the data is from last year.
Is there a way to do this check with pandas while reading the CSV? I read the file with pd.read_csv(my_file).
Sample of the time series:
dates values
0 2021-01-01 01:00:00 371.428
1 2021-01-01 02:00:00 390.194
2 2021-01-01 03:00:00 349.924
3 2021-01-01 04:00:00 342.886
4 2021-01-01 05:00:00 331.157
.
.
.
.
8779 2021-12-31 20:00:00 515.307
8780 2021-12-31 21:00:00 432.811
8781 2021-12-31 22:00:00 421.082
8782 2021-12-31 23:00:00 394.886
8783 2022-01-01 00:00:00 373.773
The last row will always be from the current year, at 00:00.
CodePudding user response:
I don't think so; you need to read the values first. Read the CSV into a DataFrame, then compare the years from Series.dt.year (excluding the last row, which belongs to the current year) with Timestamp.year minus 1, and use Series.all to test whether all values match:
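import pandas as pd
df = pd.read_csv(my_file, parse_dates=['dates'])
# Every row except the last (current year at 00:00) must fall in the previous year.
test = df['dates'].dt.year.iloc[:-1].eq(pd.Timestamp('now').year - 1).all()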
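If it helps, the same check can be wrapped in a small function. This is only a sketch under some assumptions: the name is_previous_year_series, the reference_year parameter, and the comma-separated sample are made up for illustration, and the extra check on the last row assumes it is always 1 January at 00:00 of the current year, as the question describes.

```python
import io

import pandas as pd


def is_previous_year_series(csv_source, reference_year=None, date_col="dates"):
    """Return True if all rows but the last are dated in the year before reference_year.

    Hypothetical helper: reference_year defaults to the current year and can be
    fixed to make the check reproducible.
    """
    df = pd.read_csv(csv_source, parse_dates=[date_col])
    if df.empty:
        return False
    if reference_year is None:
        reference_year = pd.Timestamp.now().year

    # All rows except the last must belong to the previous year.
    body_ok = df[date_col].dt.year.iloc[:-1].eq(reference_year - 1).all()

    # Assumption from the question: the last row is the current year at 00:00.
    last_ok = df[date_col].iloc[-1] == pd.Timestamp(year=reference_year, month=1, day=1)

    return bool(body_ok and last_ok)


# Made-up three-row sample in the same shape as the data in the question.
sample = (
    "dates,values\n"
    "2021-01-01 01:00:00,371.428\n"
    "2021-12-31 23:00:00,394.886\n"
    "2022-01-01 00:00:00,373.773\n"
)

print(is_previous_year_series(io.StringIO(sample), reference_year=2022))
print(is_previous_year_series(io.StringIO(sample), reference_year=2023))
```

With reference_year=2022 the sample passes; with reference_year=2023 it is rejected, because the rows are then two years old. In the app you would leave reference_year unset and pass the uploaded file.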