Optional function argument which subsets a list?

Time:02-18

Let's suppose that I have the following DataFrame to work with:

import pandas as pd

df1 = pd.DataFrame(
    {
        "CST": [1, 2, 3],
        "BRAND": ["A", "B", "C"],
    }
)

print(df1)

   CST BRAND
0    1     A
1    2     B
2    3     C

Next, I define the following helper function:

def add_col(df, field_name, only_x=True):
    attrs = ["X", "Y", "Z"]
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df

df2 = add_col(df1, "BRAND")
print(df2)

   CST BRAND  BRAND_X  BRAND_Y  BRAND_Z
0    1     A      999      999      999
1    2     B      999      999      999
2    3     C      999      999      999

As you can see in the function above, I have a boolean argument only_x which is not yet used anywhere inside the function. Its purpose is self-explanatory: when set to True (the default), the function should create only the BRAND_X column in the example above.

This might be a fairly easy implementation but, having little experience in Python/programming, I struggle with finding the best approach. The only one that I could come up with is the following:

def add_col(df, field_name, only_x=True):
    attrs = ["X", "Y", "Z"]
    if only_x:
        attrs = [attrs[0]]
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df

df2 = add_col(df1, "BRAND", only_x=True)
print(df2)

   CST BRAND  BRAND_X
0    1     A      999
1    2     B      999
2    3     C      999

Although this works, I'm not sure it is the best way to go. Is the logic structured in a way that is considered good code? If not, what would be some better approaches here?

CodePudding user response:

Test the parameter when assigning the variable:

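def add_col(df, field_name, only_x=True):
    # Choose the attribute list once, based on the flag
    attrs = ["X"] if only_x else ["X", "Y", "Z"]
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df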

But a more general design might be to make attrs a parameter with a suitable default.

def add_col(df, field_name, attrs=("X", "Y", "Z")):
    ...  # loop over attrs as before

Then when you want only the X column, use attrs=["X"] in the call.
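
A minimal, runnable sketch of that second design follows. It makes two assumptions that are not in the answer itself: the default is a tuple rather than a list, so the default object cannot be mutated between calls, and a plain dict stands in for the DataFrame so the example runs without pandas. The function only uses item assignment, which a DataFrame supports in the same way. The make_data helper is hypothetical and exists only for this illustration.

```python
def add_col(df, field_name, attrs=("X", "Y", "Z")):
    """Add one column per attribute, named <field_name>_<attr>, set to 999."""
    # Assumption: tuple default instead of a list, so it cannot be mutated across calls.
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df  # the same object that was passed in, modified in place


def make_data():
    # Hypothetical stand-in for the DataFrame from the question.
    return {"CST": [1, 2, 3], "BRAND": ["A", "B", "C"]}


all_cols = add_col(make_data(), "BRAND")             # default: X, Y and Z
only_x = add_col(make_data(), "BRAND", attrs=["X"])  # just the X column

print(list(all_cols))  # ['CST', 'BRAND', 'BRAND_X', 'BRAND_Y', 'BRAND_Z']
print(list(only_x))    # ['CST', 'BRAND', 'BRAND_X']
```

Because the function assigns into the object it receives, each call here starts from fresh data. Calling it twice on the same frame would keep the columns added by the first call.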
