Home > Mobile >  add months in an existing data frame python
add months in an existing data frame python

Time:10-28

Year Price
2017 200
2018 250
2019 300

Given the table above, is there a way to add months to each year ? For eg: 2017 should have months jan to dec and the same price carried forward in all of the 12 months for all the years listed in a data frame in Pandas?

Year Price
2017/01/01 200
2017/02/01 200
2017/03/01 200
2017/04/01 200
2017/05/01 200

CodePudding user response:

There's probably a better answer out there (I know very little Pandas), but one thing that comes to mind is:

Get the date represented by your numeric "Year". That will give you January 1st at midnight in that Year. You can drop the time part (the "hour", if you may) and keep just the date (January 1st of that year)

At this point you'll have your first row being January (month 1). Then you can replicate the row changing the "Year"'s month to 2 (February), 3 (March)... until... 12 (December) and insert it back in the Dataframe

import pandas as pd

df = pd.DataFrame([
    {"Year": 2017, "Price": 200},
    {"Year": 2018, "Price": 300},
    {"Year": 2019, "Price": 400},
])
df["Year"] = pd.to_datetime(df["Year"], format='%Y').dt.date

for idx, row in df.iterrows():
    for i in range(2, 13):
        row["Year"] = row["Year"].replace(month=i)
        df = pd.concat([df, row.to_frame().T])
df = df.sort_values(['Year']).reset_index(drop=True)
print(df)

#           Year Price
# 0   2017-01-01   200
# 1   2017-02-01   200
# 2   2017-03-01   200
# 3   2017-04-01   200
# 4   2017-05-01   200
# 5   2017-06-01   200
# 6   2017-07-01   200
# 7   2017-08-01   200
# 8   2017-09-01   200
# 9   2017-10-01   200
# 10  2017-11-01   200
# 11  2017-12-01   200
# 12  2018-01-01   300
# 13  2018-02-01   300
# 14  2018-03-01   300
# 15  2018-04-01   300
# 16  2018-05-01   300
# 17  2018-06-01   300
# 18  2018-07-01   300
# 19  2018-08-01   300
# 20  2018-09-01   300
# 21  2018-10-01   300
# 22  2018-11-01   300
# 23  2018-12-01   300
# 24  2019-01-01   400
# 25  2019-02-01   400
# 26  2019-03-01   400
# 27  2019-04-01   400
# 28  2019-05-01   400
# 29  2019-06-01   400
# 30  2019-07-01   400
# 31  2019-08-01   400
# 32  2019-09-01   400
# 33  2019-10-01   400
# 34  2019-11-01   400
# 35  2019-12-01   400

CodePudding user response:

You could try this:

df.columns = [i.strip() for i in df.columns]
df['Year'] = df['Year'].apply(lambda x: pd.date_range(start=str(x), end=str(x 1), freq='1M').strftime('%m'))
df = df.explode('Year').reset_index(drop=True)

>>>df
    Year    Price
0   01  200
1   02  200
2   03  200
3   04  200
4   05  200
5   06  200
6   07  200
7   08  200
8   09  200
9   10  200
10  11  200
11  12  200
12  01  250
13  02  250
14  03  250
15  04  250
16  05  250
17  06  250
18  07  250
19  08  250
20  09  250
21  10  250
22  11  250
23  12  250
24  01  300
25  02  300
26  03  300
27  04  300
28  05  300
29  06  300
30  07  300
31  08  300
32  09  300
33  10  300
34  11  300
35  12  300

CodePudding user response:

Create a dataframe with months 1-12

Cross merge that with your original data

Create a date out of the year, month, and day 1

Sample code:

years = [2017, 2018, 2019, 2020, 2021, 2022]
prices = [200, 250, 300, 350, 350, 317]
your_df = pd.DataFrame(data=[(x, y) for x, y in zip(years, prices)], columns=["Year","Price"])

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
m_df = pd.DataFrame(data=months, columns=["Month"])

final_df = full_df.merge(your_df, how="cross")

final_df["Year"] = [datetime(y, m, 1) for y,m in zip(full_df.Year, full_df.Month)]
final_df = final_df.drop(columns="Month")
final_df

Final output

  • Related