I am trying to convert quartiles into month for each unique id in pandas by filling other columns with previous row values. I have seen this one Pandas: Add data for missing months, but it's for only one id(it doesn't work for multiple id), how can we do the same thing if we have multiple id? For instance I have a data like this
import numpy as np
import pandas as pd
index = [0,1,2,3]
id = pd.Series([1,1,2,2],index= index)
price = pd.Series([20,41,61,68],index= index)
date_month = pd.Series(['2021-01','2021-04','2021-01','2021-04'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df
But I want the output be like
index = [0,1,2,3,4,5, 6, 7,8,9,10,11]
id = pd.Series([1,1,1,1,1,1,2, 2, 2,2,2,2],index= index)
price = pd.Series([20,20,20, 41,41,41, 61,61, 61, 68,68,68],index= index)
date_month = pd.Series(['2021-01', '2021-02','2021-03', '2021-04', '2021-05','2021-06','2021-01', '2021-02', '2021-03','2021-04', '2021-05','2021-06'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df
FYI, there are other columns too, which we want fill by the value of the first month of the each quarter.
CodePudding user response:
First, create a date range dataframe containing the months from the smallest month to the largest month.
import datetime
date_month = pd.to_datetime(df["date_month"])
min_date = date_month.min()
max_date = date_month.max() pd.tseries.offsets.QuarterEnd()
date_range_df = pd.DataFrame(pd.date_range(min_date, max_date, freq="M"), columns=["date"])
date_range_df
Then, create a quarter month map dataframe based on date_range_df
def quarter2month(quarter):
return (quarter -1) * 3 1
quarter_month_map_df = date_range_df.assign(
date_month = date_range_df["date"].map(lambda x:datetime.datetime.strftime(x,"%Y-%m")),
date_quarter = date_range_df["date"].map(lambda x:f"{x.year}-{quarter2month(x.quarter):02}"),
).drop("date",axis=1)
quarter_month_map_df
The reuslt will be
date_month date_quarter
0 2021-01 2021-01
1 2021-02 2021-01
2 2021-03 2021-01
3 2021-04 2021-04
4 2021-05 2021-04
5 2021-06 2021-04
Finally, you can merge the quarter_month_map_df
with original df
pd.merge(df.rename({"date_month": "date_quarter"}, axis=1),
quarter_month_map_df, on='date_quarter', how="left")