I'm having difficulty applying my knowledge of defining functions with def to my own function.
I want to create a function that filters a data frame in two steps: 1. drop the columns I name, and 2. call .dropna() on the result.
I've used this on one of my data frames like this:
total_adj_gross = ((gross.drop(columns = ['genre','rating', 'total_gross'], axis = 1)).dropna())
I've also used it on another data frame like this:
vill = (characters.drop(columns = ['hero','song'], axis = 1)).dropna(axis = 0)
Can I make a function using def so I can easily do this to any data frame?
If so, would I go about it like this?
def filtered_df(data, col_name, N=1):
    frame = data.drop(columns = [col_name], axis = N)
    frame.dropna(axis = N)
    return frame
I already suspect the function above would go wrong: what if the two calls need different values of N, as in my vill example?
BTW, I am very much a beginner, as you can tell, and I haven't been exposed to any complex functions. Any help would be appreciated!
Update (posted here since I don't know how to format code in comments):
Thank you all for your help in creating my function, but how do I now insert it into my code?
Do I have to make a script (.py) and then call my function from it?
Can I test it within my actual code?
Right now, if I just copy and paste any of the code in and fill in the column name, I get an error saying the column is "not found in axis".
CodePudding user response:
def filtered_df(df, drop_cols):
    return df.drop(columns=drop_cols, axis=1).dropna(axis=0)
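To try this out, the function does not need its own .py file: defining it earlier in the same script or notebook and calling it afterwards is enough. The sketch below uses a small made-up frame (its column names and values are hypothetical, loosely modelled on the question's characters data). It also suggests that a name which does not exactly match a column, for instance a different capitalisation, is the usual cause of the "not found in axis" error mentioned in the update.

```python
import numpy as np
import pandas as pd


def filtered_df(df, drop_cols):
    # Drop the named columns, then drop every row that still has a missing value.
    return df.drop(columns=drop_cols, axis=1).dropna(axis=0)


# Hypothetical stand-in for the question's `characters` frame.
characters = pd.DataFrame({
    "movie": ["A", "B", "C"],
    "hero": ["x", "y", "z"],
    "villain": ["v1", np.nan, "v3"],
    "song": ["s1", "s2", np.nan],
})

# The function is called like any other, after it has been defined.
vill = filtered_df(characters, ["hero", "song"])
print(vill)

# "Hero" is not a column ("hero" is), so drop() raises a KeyError.
try:
    filtered_df(characters, ["Hero"])
except KeyError as err:
    message = str(err)
    print(message)
```

Printing characters.columns before calling the function is a quick way to check the exact spelling of the names being passed in.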
CodePudding user response:
Based on what you want to achieve, you don't need to pass any axis parameter: you drop columns (axis=1 for drop()) and then drop rows with missing values (axis=0 for dropna(), which is its default). What you do want to pass is the list of columns to drop. Finally, dropna() does not work in place by default, so you have to store the value it returns, as you did with drop() on the line above.
Your function should look like this:
def filtered_df(data, col_names):
    frame = data.drop(columns = col_names, axis = 1)
    result = frame.dropna()
    return result
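The point about dropna() not working in place can be checked with a short sketch. The frame here is made up purely for illustration; only the difference between discarding and keeping the return value matters.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "a".
frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

frame.dropna()  # the returned frame is discarded, so `frame` is unchanged
rows_without_assignment = len(frame)

result = frame.dropna()  # keep the returned frame instead
rows_with_assignment = len(result)

print(rows_without_assignment, rows_with_assignment)  # 3 2
```

This is why the function in the question returned a frame that still contained the rows with missing values.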
CodePudding user response:
Overall, the code looks good. I'd suggest three minor changes:
- Pass the column names as a list. Do not convert them to a list inside the function.
- Use two variables for the axes. From your examples, the axis value differs between drop() and dropna(). I'm not sure you need that, but if you do want different axis values for drop() and dropna(), use two separate parameters, for example drop_axis and dropna_axis.
- Assign the modified frame, or do everything in a single chained expression.
So the code would look something like this:
def filtered_df(data, col_name, drop_axis=1, dropna_axis=0):
    frame = data.drop(columns = col_name, axis = drop_axis).dropna(axis = dropna_axis)
    return frame
Your call to it can look like this:
modified_df = filtered_df(data, ["x_col", "y_col"], drop_axis=1, dropna_axis=0)
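One caveat about drop_axis: as far as I can tell, pandas ignores the axis argument of drop() whenever columns= is passed, so only dropna_axis actually changes the result of this function. A minimal sketch with a made-up frame (the z_col column is hypothetical) that checks this:

```python
import pandas as pd

# Hypothetical frame reusing the column names from the call above.
df = pd.DataFrame({"x_col": [1, 2], "y_col": [3, 4], "z_col": [5, 6]})

# With columns=..., both calls drop the same columns regardless of axis.
with_axis_0 = df.drop(columns=["x_col", "y_col"], axis=0)
with_axis_1 = df.drop(columns=["x_col", "y_col"], axis=1)

print(list(with_axis_0.columns), list(with_axis_1.columns))
```

If that holds for your pandas version, the drop_axis parameter could be left out and the function would behave the same.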