Suppose I have the following DataFrame to work with:
import pandas as pd

df1 = pd.DataFrame(
    {
        "CST": [1, 2, 3],
        "BRAND": ["A", "B", "C"],
    }
)
print(df1)
   CST BRAND
0    1     A
1    2     B
2    3     C
Next, I define the following helper function:
def add_col(df, field_name, only_x=True):
    attrs = ["X", "Y", "Z"]
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df
df2 = add_col(df1, "BRAND")
print(df2)
   CST BRAND  BRAND_X  BRAND_Y  BRAND_Z
0    1     A      999      999      999
1    2     B      999      999      999
2    3     C      999      999      999
As you can see in the function above, I have a boolean argument only_x which is not yet used anywhere inside the function. The goal of this argument is self-explanatory: when set to True (the default), the function should create only the BRAND_X column in the example above.
This is probably easy to implement, but having little experience with Python and programming, I struggle to find the best approach. The only one I could come up with is the following:
def add_col(df, field_name, only_x=True):
    attrs = ["X", "Y", "Z"]
    if only_x:
        attrs = [attrs[0]]
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df
df2 = add_col(df1, "BRAND", only_x=True)
print(df2)
   CST BRAND  BRAND_X
0    1     A      999
1    2     B      999
2    3     C      999
Although this does work, I'm not sure this is the best way to go. Is this considered good code in the way the logic is structured? If not, what would be some better approaches here?
Answer:
Test the parameter when assigning the variable:
def add_col(df, field_name, only_x=True):
    attrs = ["X"] if only_x else ["X", "Y", "Z"]
    # rest of function
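Filled in, that suggestion might look like the sketch below. It assumes the loop from the question is kept as-is, so the function still modifies the DataFrame it is given, and it rebuilds the sample df1 so the snippet runs on its own.

```python
import pandas as pd


def add_col(df, field_name, only_x=True):
    # Pick the suffixes in one expression instead of reassigning the list later.
    attrs = ["X"] if only_x else ["X", "Y", "Z"]
    for attr in attrs:
        # Same column naming and placeholder value as in the question.
        df["_".join([field_name, attr])] = 999
    return df


df1 = pd.DataFrame({"CST": [1, 2, 3], "BRAND": ["A", "B", "C"]})
df2 = add_col(df1, "BRAND")
print(list(df2.columns))  # ['CST', 'BRAND', 'BRAND_X']
```

Passing only_x=False to the same function should add BRAND_Y and BRAND_Z as well.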
But a more general design might be to make attrs a parameter with a suitable default (a tuple here, since a mutable list default is shared between calls).
def add_col(df, field_name, attrs=("X", "Y", "Z")):
    # rest of function
Then when you want only the X column, use attrs=["X"] in the call.
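As a rough sketch of that second design: the version below takes the suffixes as a parameter. The tuple default and the make_df helper are my own assumptions for illustration (the helper only rebuilds the sample data from the question so each call starts from a fresh frame).

```python
import pandas as pd


def add_col(df, field_name, attrs=("X", "Y", "Z")):
    # attrs can be any iterable of suffixes; the tuple default is an assumption
    # made here so the default cannot be mutated by accident.
    for attr in attrs:
        df["_".join([field_name, attr])] = 999
    return df


def make_df():
    # Hypothetical helper: rebuilds the question's sample data.
    return pd.DataFrame({"CST": [1, 2, 3], "BRAND": ["A", "B", "C"]})


all_cols = add_col(make_df(), "BRAND")             # default: X, Y and Z
x_only = add_col(make_df(), "BRAND", attrs=["X"])  # only the X column
print(list(all_cols.columns))
print(list(x_only.columns))
```

The caller is no longer limited to "only X" versus "everything"; something like attrs=["X", "Z"] works without touching the function.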