Home > Enterprise >  Combining two columns in Pandas using a set rule
Combining two columns in Pandas using a set rule

Time:05-26

I have a large Pandas dataframe that looks roughly like this:

df = pd.Dataframe({'1m1y_vol': {0: 71.0, 1: 60.1, 2: 68.95},
                   '1m25y_vol': {0: 75.9, 1: 81.45, 2: 89.4}, 
                   'Days_since_meeting': {0: 8, 1: 1, 2: 5}, 
                   'Days_to_meeting': {0: -50, 1: -39, 2: -35}})

The _vol columns represent some volatilities, Days_since_meeting is the number of days elapsed since a certain event occurred and Days_to_meeting is the number of days to go until the next event occurs (both columns contain integers).

I would like to create a new column called time_window that returns the value of Days_since_meeting if its value is <= 10 days, or return the value of Days_to_meeting otherwise. The objective is that the time_window column only contains integers between -10 and 10 (I have already filtered the dataframe to only keep data where Days_since_meeting <= 10 or Days_to_meeting >= -10, so it's now just a matter of combining the two columns).

I have tried the following code with the df.combine() method to create a new one-column dataframe that I could concatenate with the main dataframe but it only returns NaNs:

time_window = df.loc[:,["Days_to_meeting"]].rename(columns{"Days_to_meeting":"Time_window"})
days_since_meeting = df.loc[:,["Days_since_meeting"]].rename(columns={"Days_since_meeting":"Time_window"})


def define_time_window_func(s1, s2):
    s3 = pd.Series()
    for i, j in zip(s1, s2):
        if i <= 10:
            s3.append(pd.Series([i]))
        else:
            s3.append(pd.Series([j]))



time_window.combine(days_since_meeting, define_time_window_func)

Any help would be really appreciated!

CodePudding user response:

Is this what you are looking for if not I'm not entirely sure I understand the question.

df = pd.DataFrame({'1m1y_vol': {0: 71.0, 1: 60.1, 2: 68.95},
                   '1m25y_vol': {0: 75.9, 1: 81.45, 2: 89.4}, 
                   'Days_since_meeting': {0: 8, 1: 1, 2: 5}, 
                   'Days_to_meeting': {0: -50, 1: -39, 2: -35}})
df['time_window'] = np.where(df['Days_since_meeting'] <= 10, df['Days_since_meeting'], df['Days_to_meeting'])
df
  • Related