If I wanted to create a new df that only had rows from the original df that fall into specified categories, what would be the most efficient way to do that?
import seaborn as sns
df = sns.load_dataset('diamonds')
def makenewdf(cuts=['Ideal','Premium'], df=df):
    [some kind of loop to dynamically filter df based on the values of cuts]
What would be the best way to write this function so that I can pass in the categories I want to keep?
Example: makenewdf(cuts=['Good'])
would return a df containing only rows where the cut is 'Good', and makenewdf(cuts=['Good','Ideal','Premium'])
would return a df containing only rows whose cut is one of the three values in cuts.
CodePudding user response:
You're looking for the isin() method. You can use something like this:
def makenewdf(cuts, df):
    return df[df.cut.isin(cuts)]
# Example
print(makenewdf(['Good'], df))
# Example
print(makenewdf(['Good','Ideal','Premium'], df))
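If you want to try this without loading the seaborn dataset, here is a minimal self-contained sketch. The small `sample` frame and the `column` parameter are assumptions added for illustration, not part of the original question; the filtering itself is the same `isin()` boolean mask as above. It also wraps a bare string in a list, since `isin()` only accepts list-like values.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data, so the sketch runs without seaborn.
sample = pd.DataFrame({
    "cut": ["Good", "Ideal", "Premium", "Fair", "Good"],
    "price": [326, 334, 335, 337, 400],
})

def makenewdf(cuts, df, column="cut"):
    """Return only the rows of df whose `column` value is one of `cuts`."""
    # isin() rejects a bare string, so wrap a single category in a list.
    if isinstance(cuts, str):
        cuts = [cuts]
    # Build a boolean mask and use it to select the matching rows.
    return df[df[column].isin(cuts)]

good_only = makenewdf(["Good"], sample)
three_cuts = makenewdf(["Good", "Ideal", "Premium"], sample)
print(good_only)
print(three_cuts)
```

No explicit loop is needed: `isin()` checks every row against the whole list in one vectorized call, and the original index labels are preserved in the result.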
CodePudding user response:
Like this: filtered_df = df[df['cut'].isin(['Ideal', 'Premium'])]