Convert a lambda function to a regular function in Python

Time:01-26

I have this current lambda function:

df["domain_count"] = df.apply(lambda row: df['domain'].value_counts()[row['domain']], axis=1)

But I want to convert it to a regular function, such as def get_domain_count(). How do I do this? I'm not sure what parameters it should take, since I want to apply it to an entire column in a DataFrame. The domain column contains duplicates, and I want to know how many times each domain appears in my DataFrame.

Example starting df:

| domain |
|---|
| target.com |
| macys.com |
| target.com |
| walmart.com |
| walmart.com |
| target.com |

Example ending df:

| domain | count |
|---|---|
| target.com | 3 |
| macys.com | 1 |
| target.com | 3 |
| walmart.com | 2 |
| walmart.com | 2 |
| target.com | 3 |

Please help! Thanks in advance!

Answer:

You can pass the DataFrame to mutate and the column name (as a string):

def countify(frame, col_name):
    frame[f"{col_name}_count"] = frame.apply(lambda row: frame[col_name]...)
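For reference, here is a runnable sketch of that apply-based version. It assumes the elided lambda body is the question's lookup generalized to col_name, and it adds axis=1 so that apply passes rows rather than columns. The sample data is the example from the question.

```python
import pandas as pd


def countify(frame, col_name):
    """Add a '<col_name>_count' column using a row-wise apply (sketch).

    Assumes the elided lambda body is the question's lookup with 'domain'
    replaced by col_name.
    """
    # axis=1: apply calls the lambda once per row (a Series), not per column.
    # Note that value_counts() is recomputed for every single row here.
    frame[f"{col_name}_count"] = frame.apply(
        lambda row: frame[col_name].value_counts()[row[col_name]], axis=1
    )


df = pd.DataFrame(
    {"domain": ["target.com", "macys.com", "target.com",
                "walmart.com", "walmart.com", "target.com"]}
)
countify(df, "domain")
print(df)
```

Because value_counts() runs once per row, this gets slow on large frames, which is one reason to prefer the map-based approach.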

But better yet, you don't need .apply at all!

df["domain"].map(df["domain"].value_counts())

will first compute the count of each unique value and then map each value in the column to its count. So the function could become:

def countify(frame, col_name):
    frame[f"{col_name}_count"] = frame[col_name].map(frame[col_name].value_counts())
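A quick check with the example data from the question. The countify function is the map-based one above, repeated so the block runs on its own; the printed value_counts() result is the lookup table that map() uses.

```python
import pandas as pd


def countify(frame, col_name):
    # value_counts() returns a Series indexed by each unique value, holding
    # its count; map() then looks up every cell of the column in that Series.
    frame[f"{col_name}_count"] = frame[col_name].map(frame[col_name].value_counts())


df = pd.DataFrame(
    {"domain": ["target.com", "macys.com", "target.com",
                "walmart.com", "walmart.com", "target.com"]}
)
print(df["domain"].value_counts())  # the per-domain counts used as the mapping
countify(df, "domain")
print(df)
```

The new column is named domain_count rather than count as in the question's example; rename it afterwards if the exact name matters.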

Answer:

A lambda is just an anonymous function, and it's usually easy to turn it into a regular function: keep the lambda's own parameter list (in this case, row) and return its expression. The challenge with this one is df, which the lambda picks up from its surrounding scope rather than receiving as an argument, so it may resolve differently in a module-level function than in your lambda. So, add it as a parameter to the function. The problem is that this will not be

def get_domain_count(df, row): 
    return df['domain'].value_counts()[row['domain']]

This can be a problem if you still want to use this function in an .apply operation: .apply passes only the row, so it wouldn't know to add that df argument at the front. To solve that, you can create a partial that binds df in advance.

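import functools

def do_stuff(some_df):
    # partial() binds some_df as get_domain_count's first argument;
    # axis=1 makes .apply pass each row rather than each column
    return some_df.apply(functools.partial(get_domain_count, some_df), axis=1)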
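Putting get_domain_count and the partial together with the question's example data gives a self-contained sketch. The axis=1 argument matters: without it, .apply hands each column to the function, and row['domain'] would then raise a KeyError.

```python
import functools

import pandas as pd


def get_domain_count(df, row):
    # Number of times this row's domain appears in the whole DataFrame.
    return df["domain"].value_counts()[row["domain"]]


def do_stuff(some_df):
    # partial() fixes some_df as the first argument, so .apply only needs
    # to supply the row; axis=1 makes it pass rows instead of columns.
    return some_df.apply(functools.partial(get_domain_count, some_df), axis=1)


df = pd.DataFrame(
    {"domain": ["target.com", "macys.com", "target.com",
                "walmart.com", "walmart.com", "target.com"]}
)
df["domain_count"] = do_stuff(df)
print(df)
```

Like the original lambda, this recomputes value_counts() for every row, so it mainly serves to show how the lambda maps onto a named function.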