I currently have this lambda: df["domain_count"] = df.apply(lambda row: df['domain'].value_counts()[row['domain']], axis=1)
But I want to convert it to a regular function, something like def get_domain_count().
How do I do this? I'm not sure what parameters it should take, since I want to apply it to an entire column of a dataframe. The domain
column contains duplicates, and I want to know how many times each domain appears in my dataframe.
Example starting df:
|domain|
|---|
|target.com|
|macys.com|
|target.com|
|walmart.com|
|walmart.com|
|target.com|
Example ending df:
|domain|count|
|---|---|
|target.com|3|
|macys.com|1|
|target.com|3|
|walmart.com|2|
|walmart.com|2|
|target.com|3|
Please help! Thanks in advance!
CodePudding user response:
You can pass the column name as a string, and the dataframe object to mutate:
def countify(frame, col_name):
    frame[f"{col_name}_count"] = frame.apply(lambda row: frame[col_name]...)
But better yet, you don't need to apply!
df["domain"].map(df["domain"].value_counts())
will first get the counts per unique value, then map each value in the column to its count. So the function could become:
def countify(frame, col_name):
    frame[f"{col_name}_count"] = frame[col_name].map(frame[col_name].value_counts())
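As a quick check, here is a minimal runnable sketch of the countify function above applied to the example data from the question. The sample frame is just the question's example rebuilt by hand, and the resulting column is named domain_count because of the f-string (the question's table calls it count).

```python
import pandas as pd


def countify(frame, col_name):
    # value_counts() returns a Series indexed by each unique value;
    # map() then looks up every row's value in that Series.
    frame[f"{col_name}_count"] = frame[col_name].map(frame[col_name].value_counts())


# Sample data copied from the question's example.
df = pd.DataFrame({"domain": ["target.com", "macys.com", "target.com",
                              "walmart.com", "walmart.com", "target.com"]})

countify(df, "domain")  # mutates df in place, adding "domain_count"
print(df)
```

Because value_counts() runs only once here, this should scale much better than calling it inside a row-wise apply.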
CodePudding user response:
A lambda is just an anonymous function, and it's usually easy to turn one into a regular function by using the lambda's own parameter list (in this case, row) and returning its expression. The challenge with this one is the df name, which will resolve differently in a module-level function than in your lambda. So, add that as a parameter to the function.
def get_domain_count(df, row):
    return df['domain'].value_counts()[row['domain']]
This can be a problem if you still want to use this function in an .apply operation: .apply wouldn't know to add that df parameter at the front. To solve that, you could create a partial.
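import functools

def do_stuff(some_df):
    # axis=1 passes each row to the function; the partial supplies some_df as the first argument
    return some_df.apply(functools.partial(get_domain_count, some_df), axis=1)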
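To see the partial approach end to end, here is a small self-contained sketch using the question's example data. The helper name add_domain_count and the sample frame are made up for illustration; get_domain_count is the function from this answer.

```python
import functools

import pandas as pd


def get_domain_count(df, row):
    # Note: this recomputes value_counts() for every row,
    # so it can be slow on large dataframes.
    return df["domain"].value_counts()[row["domain"]]


def add_domain_count(some_df):
    # Hypothetical helper: bind some_df as the first argument so that
    # .apply only has to supply the row.
    counter = functools.partial(get_domain_count, some_df)
    some_df["domain_count"] = some_df.apply(counter, axis=1)
    return some_df


# Sample data copied from the question's example.
sample = pd.DataFrame({"domain": ["target.com", "macys.com", "target.com",
                                  "walmart.com", "walmart.com", "target.com"]})

result = add_domain_count(sample)
print(result)
```

This keeps the row-wise shape of the original lambda, which is useful if the per-row logic grows beyond a simple count.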