Home > Software design >  Concert quartile into monthly data for each id while fill other columns by the first month of quarte
Concert quartile into monthly data for each id while fill other columns by the first month of quarte

Time:07-01

I am trying to convert quartiles into month for each unique id in pandas by filling other columns with previous row values. I have seen this one Pandas: Add data for missing months, but it's for only one id(it doesn't work for multiple id), how can we do the same thing if we have multiple id? For instance I have a data like this

import numpy as np
import pandas as pd
index = [0,1,2,3]
id = pd.Series([1,1,2,2],index= index)
price = pd.Series([20,41,61,68],index= index)
date_month = pd.Series(['2021-01','2021-04','2021-01','2021-04'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df

But I want the output be like

index = [0,1,2,3,4,5, 6, 7,8,9,10,11]
id = pd.Series([1,1,1,1,1,1,2, 2, 2,2,2,2],index= index)
price = pd.Series([20,20,20, 41,41,41, 61,61, 61, 68,68,68],index= index)
date_month = pd.Series(['2021-01', '2021-02','2021-03', '2021-04', '2021-05','2021-06','2021-01', '2021-02', '2021-03','2021-04', '2021-05','2021-06'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df 

FYI, there are other columns too, which we want fill by the value of the first month of the each quarter.

CodePudding user response:

First, create a date range dataframe containing the months from the smallest month to the largest month.

import datetime

date_month = pd.to_datetime(df["date_month"])
min_date = date_month.min()
max_date = date_month.max()   pd.tseries.offsets.QuarterEnd()
date_range_df = pd.DataFrame(pd.date_range(min_date, max_date, freq="M"), columns=["date"])
date_range_df

Then, create a quarter month map dataframe based on date_range_df

def quarter2month(quarter):
    return (quarter -1) * 3   1

quarter_month_map_df = date_range_df.assign(
    date_month = date_range_df["date"].map(lambda x:datetime.datetime.strftime(x,"%Y-%m")),
    date_quarter = date_range_df["date"].map(lambda x:f"{x.year}-{quarter2month(x.quarter):02}"),
).drop("date",axis=1)

quarter_month_map_df

The reuslt will be

 date_month date_quarter
0 2021-01 2021-01
1 2021-02 2021-01
2 2021-03 2021-01
3 2021-04 2021-04
4 2021-05 2021-04
5 2021-06 2021-04

Finally, you can merge the quarter_month_map_df with original df

pd.merge(df.rename({"date_month": "date_quarter"}, axis=1),
     quarter_month_map_df, on='date_quarter', how="left")
  • Related