Pandas Dataframe Filtering with optional conditions

Time:09-16

I have a question about filtering a pandas DataFrame with optional conditions, as below:

Request: if brand == 'All' or lo == 'All', that condition should be excluded from the filter. Issue: I tried the code below, but it doesn't work. Please help me fix it.

def filter_for_company(df_source, season, brand, lo):
    mask = (
               (df_source['Production Priority'] == 'Primary')
            &  (df_source['Season'] == season)
            &  (if brand =='All':
                    pass
                else:
                    df_source['Brand'] == brand
                )
            &  (if lo ='All':
                    pass
                else:    
                    df_source['Liaison Office Code'] == lo
                )             
    )            
    company_df = df_source.loc[mask,:]
    return company_df

CodePudding user response:

An if-else statement cannot be used inside the mask expression, so it does not produce the boolean condition you need. You can restructure the code so that the optional conditions are added to the mask only when they apply, as follows:

def filter_for_company(df_source, season, brand, lo):
    # Conditions that always apply
    mask = (
               (df_source['Production Priority'] == 'Primary')
            &  (df_source['Season'] == season)
    )

    # Add each optional condition only when a specific value is requested
    if brand != 'All':
        mask &= (df_source['Brand'] == brand)

    if lo != 'All':
        mask &= (df_source['Liaison Office Code'] == lo)

    company_df = df_source.loc[mask, :]
    return company_df

Note that this may not be the most optimized code for your purpose. However, without an overall picture of the dataframe and some sample data, it is hard to optimize it further.
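Since the question includes no sample data, here is a minimal sketch on made-up rows that shows how an "All" argument can simply leave a condition out of the mask. Only the column names come from the question; the values and the function name filter_with_optional are assumptions for illustration.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Production Priority": ["Primary", "Primary", "Secondary", "Primary"],
    "Season": ["SS22", "SS22", "SS22", "FW22"],
    "Brand": ["A", "B", "A", "A"],
    "Liaison Office Code": ["X1", "X2", "X1", "X1"],
})

def filter_with_optional(df_source, season, brand="All", lo="All"):
    """Hypothetical helper: 'All' means the column is not filtered."""
    # Conditions that always apply.
    mask = (
        (df_source["Production Priority"] == "Primary")
        & (df_source["Season"] == season)
    )
    # Narrow the mask only for arguments that name a specific value.
    if brand != "All":
        mask &= df_source["Brand"] == brand
    if lo != "All":
        mask &= df_source["Liaison Office Code"] == lo
    return df_source.loc[mask, :]

all_brands = filter_with_optional(df, "SS22")         # brand and lo not filtered
only_a = filter_with_optional(df, "SS22", brand="A")  # brand filtered, lo not
```

Because the mask is always built from the same dataframe, its index matches the rows it selects, whichever optional conditions end up being applied.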

CodePudding user response:

Conditional slicing of pandas dataframes with loc or iloc is designed for filtering on values actually present in the dataframe. What you want to achieve is better handled separately, by the code that calls your filtering function.
However, if you do not have control over that, you can modify your function as below:

def filter_for_company(df_source, season, brand, lo):
    # Narrow the frame first; "All" means the column is not filtered.
    if brand != "All":
        df_source = df_source[df_source["Brand"] == brand]

    if lo != "All":
        df_source = df_source[df_source["Liaison Office Code"] == lo]

    # Build the mask from the narrowed frame so that its index matches.
    mask = (
        (df_source["Production Priority"] == "Primary")
        & (df_source["Season"] == season)
    )

    company_df = df_source.loc[mask, :]
    return company_df
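If the calling code can be changed, the suggestion above (let the caller decide which conditions apply) might look like the following sketch. The helpers filter_equal and active_conditions and the sample rows are hypothetical and not part of the original question.

```python
import pandas as pd

def filter_equal(df_source, conditions):
    """Keep rows where every column in `conditions` equals its value."""
    mask = pd.Series(True, index=df_source.index, dtype=bool)
    for column, value in conditions.items():
        mask &= df_source[column] == value
    return df_source.loc[mask, :]

def active_conditions(season, brand, lo):
    """Caller-side step: drop every optional condition set to 'All'."""
    conditions = {"Production Priority": "Primary", "Season": season}
    optional = {"Brand": brand, "Liaison Office Code": lo}
    conditions.update({c: v for c, v in optional.items() if v != "All"})
    return conditions

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Production Priority": ["Primary", "Primary", "Secondary"],
    "Season": ["SS22", "SS22", "SS22"],
    "Brand": ["A", "B", "A"],
    "Liaison Office Code": ["X1", "X1", "X1"],
})

conditions = active_conditions("SS22", brand="All", lo="X1")
company_df = filter_equal(df, conditions)
```

With this split, the filtering function only ever compares columns against real values, and the special meaning of "All" lives in one place on the caller's side.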