I am working with (15 minute) data looking like the following:
records
year1-01-01 00:00 1
year1-01-01 00:15 2
...
year2-01-01 00:00 3
year2-01-01 00:15 4
...
year3-01-01 00:00 5
year3-01-01 00:15 6
...
...
And, if possible, I would like to split this data into multiple columns by year for analysis:
year1 year2 year3 ...
01-01 00:00 1 3 5
01-01 00:15 2 4 6
... ...
The data is not guaranteed to contain all 15 minute timestamps for every year, so if one of the years is missing a timestamp, I would like that timestamp to be omitted from the result for all years.
I have tried various combinations of df.groupby(df.index.year)
and pd.merge
or pd.concat
to perform an inner join on the groups, but I have not gotten it to work properly.
What would be a clean way to implement this?
Many thanks in advance.
============================
Code to generate example dataframe:
records = {"records": [1, 2, 3, 4, 5, 6]}
dates = [
"2020-01-01 00:00:00",
"2020-01-01 00:15:00",
# ...
"2021-01-01 00:00:00",
"2021-01-01 00:15:00",
# ...
"2022-01-01 00:00:00",
"2022-01-01 00:15:00",
# ...
]
df = pd.DataFrame(data=records, index=pd.DatetimeIndex(dates))
CodePudding user response:
First create MultiIndex
by DatetimeIndex.strftime
and DatetimeIndex.year
and reshape by Series.unstack
:
df.index = [df.index.strftime('%m-%d %H:%M:%S'), df.index.year]
df = df['records'].unstack()
print (df)
2020 2021 2022
01-01 00:00:00 1 3 5
01-01 00:15:00 2 4 6
CodePudding user response:
My answer is for sure neither pretty nor elegant but it should get the job done
records = [1, 2, 3, 4, 5, 6, 7, 8]
dates = [
"2020-01-01 00:00:00",
"2020-01-01 00:15:00",
"2021-01-01 00:00:00",
"2021-01-01 00:15:00",
"2021-01-01 00:45:00",
"2022-01-01 00:00:00",
"2022-01-01 00:15:00",
"2022-01-01 00:30:00",
]
data_list = [[],[],[]]
for subListIndex, each_year in enumerate(range(2020,2023)):
for each_hour in range(0,24):
each_hour = str(each_hour).zfill(2)
for each_quarter in range(0,60,15):
each_quarter = str(each_quarter).zfill(2)
date = str(each_year) "-01-01 " each_hour ":" each_quarter ":00"
for index, each_date in enumerate(dates):
if each_date == date:
data_list[subListIndex].append(records[index])
break
else:
data_list[subListIndex].append(0)
print(data_list)
Use can directly put this List into a pandas dataframe
import pandas as pd
pd.DataFrame(data_list)