Home > Enterprise >  Pandas: Create a column as a function of another data frame's column
Pandas: Create a column as a function of another data frame's column

Time:05-25

I had a data frame in pandas where a variable consisted of Quarters and Years, and where I wanted to split those into Months (for Quarters) and Quarters and Months (for Years). The original data frame looked like this:

        volume  tenor      NewTenor
170     -3      quarter     Q1 21
516     -3      quarter     Q1 22
597     22      quarter     Q1 22
622     -3      year        Cal 21
625     22      quarter     Q2 22
657     -14     year        Cal 21
678     -16     quarter     Q1 22
704     -7      year        Cal 21
750     -16     quarter     Q1 22
934     -10     year        Cal 21

And using the following code:

def split_tenor(tenor):
    start, year = tenor.split(" ")
    if start == "Cal":
        months = ["Jan", "Feb", "Mar", "Q2", "Q3", "Q4"]
        year = int(year)   1
    elif start == "Q1":
        months = ["Jan", "Feb", "Mar"]
    elif start == "Q2":
        months = ["Apr", "May", "Jun"]
    elif start == "Q3":
        months = ["Jul", "Aug", "Sep"]
    elif start == "Q4":
        months = ["Oct", "Nov", "Dec"]
    else:
        return tenor

    return [f"{m} {year}" for m in months]

my_data["NewTenor"] = my_data["NewTenor"].apply(split_tenor)
my_data = my_data.explode("NewTenor")

I managed to turn it into this:

     volume     tenor      NewTenor
170     -3      quarter     Jan 21
170     -3      quarter     Feb 21
170     -3      quarter     Mar 21
516     -3      quarter     Jan 22
516     -3      quarter     Feb 22
516     -3      quarter     Mar 22
597     22      quarter     Jan 22
597     22      quarter     Feb 22
597     22      quarter     Mar 22
622     -3      year        Jan 22
622     -3      year        Feb 22
622     -3      year        Mar 22
622     -3      year        Q2 22
622     -3      year        Q3 22
622     -3      year        Q4 22
625     22      quarter     Apr 22
625     22      quarter     May 22
625     22      quarter     Jun 22
657     -14     year        Jan 22
657     -14     year        Feb 22

However, the volume in the data frame remains the same, although it should split throughout the period accordingly (e.g. when a quarter is split into months, the corresponding volume should also split into three equal volumes).

Can someone help me split the volume at the same time as I split the period? Thank you.

EDIT

The right code should return something like this:

df_1 = pd.DataFrame({'volume':[12, 9],
               'tenor':['year', 'quarter'],
               'NewTenor':['Cal 21', 'Q2 22']})

Sould return:

df_2 = pd.DataFrame({'volume':[1, 1, 1, 3, 3, 3, 3, 3, 3],
               'tenor':['year', 'year', 'year', 'year', 'year', 'year', 'quarter', 'quarter', 'quarter'],
               'NewTenor':['Jan 21', 'Feb 21', 'Mar 21', 'Q2 21', 'Q3 21', 'Q4 21', 'Apr 22', 'May 22', 'Jun 22']})

CodePudding user response:

You could do it by modifying your existing code to create two columns at once:

df = pd.DataFrame({'volume':[12, 9],
                   'tenor':['year', 'quarter'],
                   'NewTenor':['Cal 21', 'Q2 22']})
def split_tenor(row):
    start, year = row['NewTenor'].split(" ")
    if start == "Cal":
        months = ["Jan", "Feb", "Mar", "Q2", "Q3", "Q4"]
        year = int(year)   1
    elif start == "Q1":
        months = ["Jan", "Feb", "Mar"]
    elif start == "Q2":
        months = ["Apr", "May", "Jun"]
    elif start == "Q3":
        months = ["Jul", "Aug", "Sep"]
    elif start == "Q4":
        months = ["Oct", "Nov", "Dec"]
    else:
        return tenor, tenor

    if start == "Cal":
        split_vol = [row['volume']/12] * 3   [row['volume']/4] * 3
    else:
        split_vol = [row['volume']/len(months)] * len(months)
    
    return [f"{m} {year}" for m in months], split_vol

df["NewTenor"], df["NewVolume"] = zip(*df[["volume", "NewTenor"]].apply(split_tenor, axis = 1))
# in case you are using earlier version of pandas, change this line to explode two columns separately
df = df.explode(["NewTenor", "NewVolume"])
  • Related