Home > database >  Setting variables conditionally while looping thru dataframe
Setting variables conditionally while looping thru dataframe

Time:02-02

Having difficulty isolating data from other dataframes while vectorizing/looping through a dataframe.

df:

NAME CURRENT_BAL AVAIL_BAL TYPE
Abc 1,000 2,000 TypeA
Def 1,000 2,000 TypeB

settings:

Scenario TypeA TypeB
Default 1 2

df - desired output

NAME CURRENT_BAL AVAIL_BAL TYPE 02/28/2023 03/31/2023
Abc 1,000 2,000 TypeA 2,000 0
Def 1,000 2,000 TypeB 1,000 1,000

The settings df tells the script how to spread out the Available Balance over the following periods. This is a reduced version of it and I'm looking to use specific month-end dates and longer periods with a variety of settings.

def build_table(df: pd.DataFrame, periods: int = 2, scenario: str = 'Default'):
    from datetime import date

    def create_entry(df: pd.DataFrame, i: int = 1, start_period: int = 1):
        number_periods = settings.loc[settings.Scenario == 'Default', df.Type]
        return (df.Avail_Bal / number_periods) if i <= number_periods else 0

    for i in range(periods):
        df[ (date.toady()   pd.offsets.MonthEnd()   pd.DateOffset(months=i)).date() ] = \
            create_entry(df, i, scenario)
    return df

As of now, running into errors with this line in trying to get a conditional value related to 1 row of the dataframe while its being vectorized:

number_periods = settings.loc[settings.Scenario == 'Default', df.Type]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

There is a lot of work to be done on this project, but for now I'm trying to understand where I can use vectorization, and when I have to iterate through the rows.

CodePudding user response:

I believe you want to use .apply() to apply the create_entry function to your dataframe. In addition the result of your .loc method is a series, so you need to select the actual value (I am doing this with .max()

df = pd.DataFrame({
    'NAME': ['Abc', 'Def'], 
    'CURRENT_BAL': [1000, 1000],
    'AVAIL_BAL': [2000, 2000], 
    'TYPE': ['TypeA', 'TypeB']
})
settings = pd.DataFrame({'Scenario': ['Default'], 'TypeA': [1], 'TypeB': [2]})


def create_entry(row, i):
    number_periods = settings.loc[settings.Scenario == 'Default', row.TYPE].max()
    return (row.AVAIL_BAL / number_periods) if i <= number_periods else 0

for i in range(1, 3):
    date_col = (date.today()   pd.offsets.MonthEnd()   pd.DateOffset(months=i)).date()
    df[date_col] = df.apply(lambda x: create_entry(x, i), axis=1)

output:

  NAME  CURRENT_BAL  AVAIL_BAL   TYPE  2023-03-28  2023-04-28
0  Abc         1000       2000  TypeA      2000.0         0.0
1  Def         1000       2000  TypeB      1000.0      1000.0
  • Related