Python Pandas DataFrame: replicate dataframe values for multiple years considering leap and not leap

Time:10-11

I have a dataframe df with a time index (15-minute time steps) whose values cover a leap year (2012). The goal is to reuse the values of df for the following years, taking into account which of them are leap years and which are not. That is, the values have to be the same for all months except February, which has either 28 or 29 days.

I did the following steps, given the dataframe df for the year 2012:

  1. I separately created a date range for the next 10 years, starting in 2012 and ending in 2022.

  2. I used the pandas concat() function with the following line to replicate the values of df: df_until_2022 = pd.concat([df]*10, ignore_index=True)

  3. I used the pandas join() and set_index() functions to set the new date range from step 1, with the next 5 years, as the index.

The issue now is that the number of rows depends on the type of year (leap or non-leap).

The question: how can I automatically check the type of each following year and replicate the values of the given dataframe df based on that check?

CodePudding user response:

Use:

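import numpy as np
import pandas as pd

np.random.seed(2022)

# Sample data: one value per 15 minutes for the leap year 2012
i = pd.date_range('2012-01-01', '2012-12-31 23:45:00', freq='15Min')
df = pd.DataFrame({'a':np.random.rand(len(i))}, index=i)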

The idea is to add a new row with the last timestamp, here 2022-12-31 23:45:00, then create a 15-minute DatetimeIndex with DataFrame.asfreq, and finally copy the first year's data forward with GroupBy.ffill:

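# Append a copy of the last row, re-stamped to 2022-12-31 23:45:00
df = pd.concat([df, df.iloc[[-1]].rename(lambda x: x.replace(year=2022))])
# Fill in every 15-minute step up to the new last row (values are NaN)
df1 = df.asfreq('15Min')
# Group by month, day and time (%m is the month; %M would be the minute)
df1 = df1.groupby(df1.index.strftime('%m%d %H:%M:%S')).ffill()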

print (df1)
                            a
2012-01-01 00:00:00  0.009359
2012-01-01 00:15:00  0.499058
2012-01-01 00:30:00  0.113384
2012-01-01 00:45:00  0.049974
2012-01-01 01:00:00  0.685408
                      ...
2022-12-31 22:45:00  0.401750
2022-12-31 23:00:00  0.590506
2022-12-31 23:15:00  0.297647
2022-12-31 23:30:00  0.243992
2022-12-31 23:45:00  0.036991

[385728 rows x 1 columns]
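
As a sanity check that is not part of the original answer, the row count per year can be compared with what the calendar predicts: 366 × 96 = 35136 rows for a leap year and 365 × 96 = 35040 otherwise. The sketch below rebuilds the frame the same way, with `%m` as the month directive, and uses `calendar.isleap` from the standard library. The names `last`, `counts_by_year` and `expected` are only illustrative.

```python
import calendar

import numpy as np
import pandas as pd

np.random.seed(2022)
i = pd.date_range('2012-01-01', '2012-12-31 23:45:00', freq='15min')
df = pd.DataFrame({'a': np.random.rand(len(i))}, index=i)

# Append a copy of the last row stamped 2022 so that asfreq spans 2012-2022.
last = df.iloc[[-1]].rename(lambda x: x.replace(year=2022))
df1 = pd.concat([df, last]).asfreq('15min')

# Fill each (month, day, time) slot from the same slot in 2012.
df1 = df1.groupby(df1.index.strftime('%m%d %H:%M:%S')).ffill()

# Rows per year as plain Python ints, to keep the comparison dtype-agnostic.
counts_by_year = {int(y): int(n) for y, n in df1.groupby(df1.index.year).size().items()}

# 96 quarter-hour steps per day; leap years have 366 days.
expected = {y: (366 if calendar.isleap(y) else 365) * 96 for y in range(2012, 2023)}

assert counts_by_year == expected
assert not df1['a'].isna().any()
```

If either assertion fails, the usual cause is a grouping key that does not identify the (month, day, time) slot uniquely.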

Another solution is to create all datetimes with DataFrame.reindex on a MultiIndex of (year, original timestamp), then join each year with the original timestamp minus its first 4 characters (the year), and convert the result to datetimes with to_datetime and errors='coerce'. This replaces 29 February datetimes with NaT in years where that date does not exist, so the last step removes those rows with Index.notna in boolean indexing:

import numpy as np
import pandas as pd

np.random.seed(2022)

i = pd.date_range('2012-01-01', '2012-12-31 23:45:00', freq='15Min')
df = pd.DataFrame({'a':np.random.rand(len(i))}, index=i)

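# Every target year paired with every timestamp of 2012
years = pd.MultiIndex.from_product([range(2012, 2023), df.index])

# Broadcast the 2012 values across all years (match on the timestamp level)
df1 = df.reindex(years, level=1)
# Swap the year into each timestamp string; invalid dates become NaT
df1.index = pd.to_datetime(df1.index.map(lambda x: f'{x[0]}{str(x[1])[4:]}'), errors='coerce')
# Drop the 29 February rows of non-leap years
df1 = df1[df1.index.notna()]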

print (df1)
                            a
2012-01-01 00:00:00  0.009359
2012-01-01 00:15:00  0.499058
2012-01-01 00:30:00  0.113384
2012-01-01 00:45:00  0.049974
2012-01-01 01:00:00  0.685408
                      ...
2022-12-31 22:45:00  0.401750
2022-12-31 23:00:00  0.590506
2022-12-31 23:15:00  0.297647
2022-12-31 23:30:00  0.243992
2022-12-31 23:45:00  0.036991

[385728 rows x 1 columns]
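
The second solution hinges on how to_datetime treats a date that does not exist. The following minimal sketch, which is an illustration and not part of the original answer, shows that step on three hand-picked strings; the names `stamps`, `idx` and `keep` are only illustrative.

```python
import pandas as pd

# 2013 is not a leap year, so its "29 February" cannot be parsed;
# 2016 is a leap year, so the same day-of-year string is valid.
stamps = ['2013-02-28 23:45:00', '2013-02-29 00:00:00', '2016-02-29 00:00:00']

# errors='coerce' turns unparseable entries into NaT instead of raising.
idx = pd.to_datetime(stamps, errors='coerce')

# Boolean mask that is False exactly where the date did not exist.
keep = idx.notna()

print(idx)
print(idx[keep])
```

Only the 2013-02-29 entry is masked out, which is why the filtered frame keeps 29 February in leap years and drops it in all others.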