Home > Enterprise >  Can you create a substring from numpy function call? (i.e Extract "median" from `np.median
Can you create a substring from numpy function call? (i.e Extract "median" from `np.median

Time:02-22

This is kind of a strange question, but I have created a function that leverages pivot_table and some filtering and renaming to apply to a bunch of pivot/aggregation use cases I need.

One of the parameters of the function is a list of aggregation functions i.e np.median. Another parameter of the function is a string referencing that aggregation function, i.e median.

I have this latter parameter solely so I can use it to filter columns out. I was wondering if there is a way to create a substring from np.median? Ideally I wouldn't need to have a string parameter in addition to the numpy function.

The challenge I am finding is that np.median (or any numpy aggregation function) has a type [<function median at 0x7fa1d043b700>] so I can't treat it with any string splitting operations to pull median from it.

Is this possible?

Sample Dataframe

data = [
    ["2", "dog", "groomed", 100],
    ["2", "dog", "groomed", 90],
    ["2", "dog", "ungroomed", 30],
    ["3", "cat", "groomed", 25],
    ["3", "cat", "ungroomed", 10],
]

df = pd.DataFrame(data, columns=["ID", "pet", "status", "amount"])

Function

from typing import List

def long_to_wide_reshape_w_agg(
    input_df: pd.DataFrame,
    index_list: List[str],
    col_to_pivot: str,
    vals: str,
    suffix: str,
    aggs: List = np.mean,
    agg_method: str = "mean",
):

    # identify possible values for the column we want to pivot on
    # we need these to filter out columns we want to rename in later steps
    cols = input_df[col_to_pivot].unique().tolist()
    str_cols = [x for x in cols if isinstance(x, str)]

    reshaped_df = input_df.pivot_table(
        index=index_list,
        columns=col_to_pivot,
        aggfunc=aggs,
        values=vals,
    ).reset_index()

    # flatten hierarchical index
    reshaped_df.columns = [" ".join(col).strip() for col in reshaped_df.columns.values]

    # identify columns to rename
    cols_to_rename = [
        s for s in reshaped_df.columns.values if any(subs in s for subs in str_cols)
    ]
    tuple_cols = tuple(cols_to_rename)

    # rename columns as needed
    reshaped_df = reshaped_df.rename(
        columns=lambda col: f"{col}{suffix}" if col in tuple_cols else col
    )
    # remove spaces and replace with underscores
    reshaped_df.columns = [cols.replace(" ", "_") for cols in reshaped_df.columns]

    # based on agg_method chosen, filter to ensure there are no null values for those columns
    col1, col2 = [col for col in reshaped_df.columns if agg_method in col]

   print(col1)
   print(col2)

   ### do stuff with col1/col2

    return reshaped_df

Function Use Case

long_to_wide_reshape(
    input_df=df,
    index_list=["ID", "pet"],
    col_to_pivot="status",
    aggs=[np.median],
    vals="amount",
    suffix="_amount",
    agg_method="median",
)

CodePudding user response:

How about using .__name__?

...

col1, col2 = [col for col in reshaped_df.columns if aggs[0].__name__ in col]

...

Because...

>>> np.median
<function numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)>

>>> np.median.__name__
'median'
  • Related