I had a data frame in pandas where a variable consisted of Quarters and Years, and where I wanted to split those into Months (for Quarters) and Quarters and Months (for Years). The original data frame looked like this:
volume tenor NewTenor
170 -3 quarter Q1 21
516 -3 quarter Q1 22
597 22 quarter Q1 22
622 -3 year Cal 21
625 22 quarter Q2 22
657 -14 year Cal 21
678 -16 quarter Q1 22
704 -7 year Cal 21
750 -16 quarter Q1 22
934 -10 year Cal 21
And using the following code:
def split_tenor(tenor):
start, year = tenor.split(" ")
if start == "Cal":
months = ["Jan", "Feb", "Mar", "Q2", "Q3", "Q4"]
year = int(year) 1
elif start == "Q1":
months = ["Jan", "Feb", "Mar"]
elif start == "Q2":
months = ["Apr", "May", "Jun"]
elif start == "Q3":
months = ["Jul", "Aug", "Sep"]
elif start == "Q4":
months = ["Oct", "Nov", "Dec"]
else:
return tenor
return [f"{m} {year}" for m in months]
my_data["NewTenor"] = my_data["NewTenor"].apply(split_tenor)
my_data = my_data.explode("NewTenor")
I managed to turn it into this:
volume tenor NewTenor
170 -3 quarter Jan 21
170 -3 quarter Feb 21
170 -3 quarter Mar 21
516 -3 quarter Jan 22
516 -3 quarter Feb 22
516 -3 quarter Mar 22
597 22 quarter Jan 22
597 22 quarter Feb 22
597 22 quarter Mar 22
622 -3 year Jan 22
622 -3 year Feb 22
622 -3 year Mar 22
622 -3 year Q2 22
622 -3 year Q3 22
622 -3 year Q4 22
625 22 quarter Apr 22
625 22 quarter May 22
625 22 quarter Jun 22
657 -14 year Jan 22
657 -14 year Feb 22
However, the volume
in the data frame remains the same, although it should split throughout the period accordingly (e.g. when a quarter is split into months, the corresponding volume should also split into three equal volumes).
Can someone help me split the volume at the same time as I split the period? Thank you.
EDIT
The right code should return something like this:
df_1 = pd.DataFrame({'volume':[12, 9],
'tenor':['year', 'quarter'],
'NewTenor':['Cal 21', 'Q2 22']})
Sould return:
df_2 = pd.DataFrame({'volume':[1, 1, 1, 3, 3, 3, 3, 3, 3],
'tenor':['year', 'year', 'year', 'year', 'year', 'year', 'quarter', 'quarter', 'quarter'],
'NewTenor':['Jan 21', 'Feb 21', 'Mar 21', 'Q2 21', 'Q3 21', 'Q4 21', 'Apr 22', 'May 22', 'Jun 22']})
CodePudding user response:
You could do it by modifying your existing code to create two columns at once:
df = pd.DataFrame({'volume':[12, 9],
'tenor':['year', 'quarter'],
'NewTenor':['Cal 21', 'Q2 22']})
def split_tenor(row):
start, year = row['NewTenor'].split(" ")
if start == "Cal":
months = ["Jan", "Feb", "Mar", "Q2", "Q3", "Q4"]
year = int(year) 1
elif start == "Q1":
months = ["Jan", "Feb", "Mar"]
elif start == "Q2":
months = ["Apr", "May", "Jun"]
elif start == "Q3":
months = ["Jul", "Aug", "Sep"]
elif start == "Q4":
months = ["Oct", "Nov", "Dec"]
else:
return tenor, tenor
if start == "Cal":
split_vol = [row['volume']/12] * 3 [row['volume']/4] * 3
else:
split_vol = [row['volume']/len(months)] * len(months)
return [f"{m} {year}" for m in months], split_vol
df["NewTenor"], df["NewVolume"] = zip(*df[["volume", "NewTenor"]].apply(split_tenor, axis = 1))
# in case you are using earlier version of pandas, change this line to explode two columns separately
df = df.explode(["NewTenor", "NewVolume"])