Create a row for each year between two dates

Time:06-11

I have a dataframe with two date columns (format: YYYY-MM-DD). I want to create one row for each year between those two dates, inclusive. The rows would be identical except for a new column that specifies the year. For example, if the dates are 2018-01-01 and 2020-01-01, there would be three rows with the same data and a new column with the values 2018, 2019, and 2020.

CodePudding user response:

You can use a custom function to compute the range of years for each row, then explode that column:

import pandas as pd

# Make sure both columns are datetime
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Build the range of years for each row (end year included), then explode it
date_range = lambda x: range(x['date1'].year, x['date2'].year + 1)
df = df.assign(year=df.apply(date_range, axis=1)).explode('year', ignore_index=True)

Output:

>>> df
       date1      date2  year
0 2018-01-01 2020-01-01  2018
1 2018-01-01 2020-01-01  2019
2 2018-01-01 2020-01-01  2020
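
To make the two steps easier to follow, here is a minimal end-to-end sketch of the same explode idea. The sample frame and the `foo` column are made up for illustration, and a list comprehension stands in for `apply`; it assumes `date1` is never later than `date2`.

```python
import pandas as pd

# Hypothetical sample data; 'foo' stands in for any other columns in the frame.
df = pd.DataFrame({
    'foo': ['bar', 'baz'],
    'date1': ['2018-01-01', '2021-06-15'],
    'date2': ['2020-01-01', '2022-03-01'],
})
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# Step 1: one list of years per row (end year included).
years = [list(range(start.year, end.year + 1))
         for start, end in zip(df['date1'], df['date2'])]

# Step 2: explode turns each list element into its own row and
# repeats the values of the remaining columns.
expanded = df.assign(year=years).explode('year', ignore_index=True)

# explode leaves the column with object dtype, so cast it back to integers.
expanded['year'] = expanded['year'].astype(int)

print(expanded)
```

Here the first row expands to 2018 through 2020 and the second to 2021 and 2022, so the result has five rows.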

CodePudding user response:

This should work for you:

import pandas

# some sample data
df = pandas.DataFrame(data={
    'foo': ['bar', 'baz'], 
    'date1':['2018-01-01', '2022-01-01'], 
    'date2':['2020-01-01', '2017-01-01']
})

# cast date columns to datetime
for col in ['date1', 'date2']:
    df[col] = pandas.to_datetime(df[col])

# reset index to ensure that selection by length of index works
df = df.reset_index(drop=True)

# compute the range of years between the two dates for each row, then iterate
# through the resulting series, appending one new row per year that holds the
# original data plus the year
for i, years in df.apply(
    lambda x: range(
        min(x.date1, x.date2).year,
        max(x.date1, x.date2).year + 1
    ),
    axis='columns'
).items():  # Series.iteritems() was removed in pandas 2.0; items() replaces it
    for year in years:
        new_index = len(df.index)
        df.loc[new_index] = df.loc[i].values
        df.loc[new_index, 'year'] = int(year)

Output (the two original rows are kept, with NaN in the year column):

>>> df
    foo      date1      date2    year
0   bar 2018-01-01 2020-01-01     NaN
1   baz 2022-01-01 2017-01-01     NaN
2   bar 2018-01-01 2020-01-01  2018.0
3   bar 2018-01-01 2020-01-01  2019.0
4   bar 2018-01-01 2020-01-01  2020.0
5   baz 2022-01-01 2017-01-01  2017.0
6   baz 2022-01-01 2017-01-01  2018.0
7   baz 2022-01-01 2017-01-01  2019.0
8   baz 2022-01-01 2017-01-01  2020.0
9   baz 2022-01-01 2017-01-01  2021.0
10  baz 2022-01-01 2017-01-01  2022.0
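
If the leftover NaN rows and the float years are not wanted, one possible variant is to repeat each row once per year instead of appending inside a loop. This is only a sketch: `expand_years` is a hypothetical helper name, and it assumes a non-empty frame whose date columns are already datetime. It keeps the same min/max logic, so reversed dates are still handled.

```python
import numpy as np
import pandas as pd

def expand_years(df, start='date1', end='date2'):
    """Return one row per calendar year spanned by the two date columns.

    Hypothetical helper: assumes df is non-empty and both columns are datetime.
    """
    # Order-independent bounds, mirroring the min/max logic of the loop version.
    first = df[[start, end]].min(axis=1).dt.year
    last = df[[start, end]].max(axis=1).dt.year
    counts = (last - first + 1).to_numpy()

    # Repeat each row (by position) once per year it spans.
    positions = np.repeat(np.arange(len(df)), counts)
    out = df.iloc[positions].reset_index(drop=True)

    # Attach the matching years, in the same row order as the repeats.
    out['year'] = np.concatenate(
        [np.arange(a, b + 1) for a, b in zip(first, last)]
    )
    return out

# Same sample data as above, including the row with reversed dates.
sample = pd.DataFrame({
    'foo': ['bar', 'baz'],
    'date1': pd.to_datetime(['2018-01-01', '2022-01-01']),
    'date2': pd.to_datetime(['2020-01-01', '2017-01-01']),
})
result = expand_years(sample)
print(result)
```

The result contains only the nine expanded rows (three for bar, six for baz), and the year column holds integers because it is never mixed with missing values.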