I would like to generate a dataframe with one row per day of a specified year, with the day of the year in its own column. How can I do this? I am using the pandas date_range function.
Here is what I have tried:
#Import modules
import pandas as pd
import numpy as np
import datetime as dt
#Specify the year
year = 1976
#Create dataframe
df = pd.Series(pd.date_range(year, periods=365, freq='D'))
print(df)
The result:
0 1970-01-01 00:00:00.000001976
1 1970-01-02 00:00:00.000001976
2 1970-01-03 00:00:00.000001976
3 1970-01-04 00:00:00.000001976
4 1970-01-05 00:00:00.000001976
...
360 1970-12-27 00:00:00.000001976
361 1970-12-28 00:00:00.000001976
362 1970-12-29 00:00:00.000001976
363 1970-12-30 00:00:00.000001976
364 1970-12-31 00:00:00.000001976
Length: 365, dtype: datetime64[ns]
The year is wrong here: I need it to be 1976, not 1970. Additionally, all I need is a "Day of the Year" column whose number of rows matches the number of days in the year (so 366 rows for a leap year). How can I fix this?
The output should be a dataframe that looks like this (it should extend all the way to the last day of the year):
d = {
'year': [1976, 1976, 1976, 1976, 1976, 1976],
'day of the year': [1, 2, 3, 4, 5, 6]
}
df1 = pd.DataFrame(data=d)
df1
CodePudding user response:
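For context on the output in the question: pandas reads a bare integer passed as a timestamp as nanoseconds since 1970-01-01, which is why every date shows 1970 with a trailing `.000001976`. A minimal sketch of the difference, assuming a reasonably recent pandas (the variable names `as_int` and `as_str` are only for illustration):

```python
import pandas as pd

year = 1976

# A bare integer is interpreted as nanoseconds since 1970-01-01.
as_int = pd.Timestamp(year)

# A date string gives the intended calendar date.
as_str = pd.Timestamp(f"{year}-01-01")

print(as_int)
print(as_str)
```

Passing the start date as a string, as in the code that follows, avoids the problem.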
year = 1976
dates = pd.Series(pd.date_range(str(year) + "-01-01", str(year) + "-12-31", freq="D"))
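# .dt.days gives the day difference as floats (NaN for the first row); this also works on pandas 2.x, where astype("timedelta64[D]") is rejected
days = dates.diff().dt.days.fillna(1).cumsum()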
df = pd.DataFrame({"year": dates.dt.year, "days": days})
df = df.set_index(dates)
print(df)
# year days
# 1976-01-01 1976 1.0
# 1976-01-02 1976 2.0
# 1976-01-03 1976 3.0
# 1976-01-04 1976 4.0
# 1976-01-05 1976 5.0
# ... ... ...
# 1976-12-27 1976 362.0
# 1976-12-28 1976 363.0
# 1976-12-29 1976 364.0
# 1976-12-30 1976 365.0
# 1976-12-31 1976 366.0
# [366 rows x 2 columns]
Or
import calendar
year = 1976
n_days = 366 if calendar.isleap(year) else 365
df = pd.DataFrame({"year": year,
                   "days": range(1, n_days + 1)})
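A third option, sketched here as an alternative rather than as part of the answers above, is to let pandas compute the day number directly with the `dayofyear` attribute of the date range. This gives integers instead of floats and handles leap years without a separate check. The function name `day_of_year_frame` is hypothetical; the column names follow the desired output in the question.

```python
import pandas as pd


def day_of_year_frame(year):
    """Return one row per day of `year` with an integer day-of-year column."""
    # Build the range from explicit date strings so the year is not
    # misread as nanoseconds since the epoch.
    dates = pd.date_range(start=f"{year}-01-01", end=f"{year}-12-31", freq="D")
    return pd.DataFrame({"year": dates.year, "day of the year": dates.dayofyear})


df = day_of_year_frame(1976)
print(df)
```

For 1976 this should produce 366 rows, and for a non-leap year such as 1977 it should produce 365.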