This is kind of a strange question, but I have created a function that leverages pivot_table
and some filtering and renaming to apply to a bunch of pivot/aggregation use cases I need.
One of the parameters of the function is a list of aggregation functions i.e np.median
. Another parameter of the function is a string referencing that aggregation function, i.e median
.
I have this latter parameter solely so I can use it to filter columns out. I was wondering if there is a way to create a substring from np.median
? Ideally I wouldn't need to have a string parameter in addition to the numpy function.
The challenge I am finding is that np.median
(or any numpy aggregation function) has a type [<function median at 0x7fa1d043b700>]
so I can't treat it with any string splitting operations to pull median
from it.
Is this possible?
Sample Dataframe
data = [
["2", "dog", "groomed", 100],
["2", "dog", "groomed", 90],
["2", "dog", "ungroomed", 30],
["3", "cat", "groomed", 25],
["3", "cat", "ungroomed", 10],
]
df = pd.DataFrame(data, columns=["ID", "pet", "status", "amount"])
Function
from typing import List
def long_to_wide_reshape_w_agg(
input_df: pd.DataFrame,
index_list: List[str],
col_to_pivot: str,
vals: str,
suffix: str,
aggs: List = np.mean,
agg_method: str = "mean",
):
# identify possible values for the column we want to pivot on
# we need these to filter out columns we want to rename in later steps
cols = input_df[col_to_pivot].unique().tolist()
str_cols = [x for x in cols if isinstance(x, str)]
reshaped_df = input_df.pivot_table(
index=index_list,
columns=col_to_pivot,
aggfunc=aggs,
values=vals,
).reset_index()
# flatten hierarchical index
reshaped_df.columns = [" ".join(col).strip() for col in reshaped_df.columns.values]
# identify columns to rename
cols_to_rename = [
s for s in reshaped_df.columns.values if any(subs in s for subs in str_cols)
]
tuple_cols = tuple(cols_to_rename)
# rename columns as needed
reshaped_df = reshaped_df.rename(
columns=lambda col: f"{col}{suffix}" if col in tuple_cols else col
)
# remove spaces and replace with underscores
reshaped_df.columns = [cols.replace(" ", "_") for cols in reshaped_df.columns]
# based on agg_method chosen, filter to ensure there are no null values for those columns
col1, col2 = [col for col in reshaped_df.columns if agg_method in col]
print(col1)
print(col2)
### do stuff with col1/col2
return reshaped_df
Function Use Case
long_to_wide_reshape(
input_df=df,
index_list=["ID", "pet"],
col_to_pivot="status",
aggs=[np.median],
vals="amount",
suffix="_amount",
agg_method="median",
)
CodePudding user response:
How about using .__name__
?
...
col1, col2 = [col for col in reshaped_df.columns if aggs[0].__name__ in col]
...
Because...
>>> np.median
<function numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)>
>>> np.median.__name__
'median'