How to use the filtered data in Pandas?-CodePudding

I am new to Pandas. Below is part of a code. I am trying to use the df_filtered which is the filtered data having codenum column value =AB123. However from Line 14 if I use df_filtered instead of excel_data_df , it's not giving results. The desired columns are getting picked correctly. But the value filtering is not happening - In codenum col value should be = AB123. But the value filtering is not happening and I am getting the entire excel converted to json with chosen columns. Please help understand how to consider/use df_filtered data from Line 14.

PathLink = os.path.join(path, 'test'   '.json') #Path Name Formation 
        excel_data_df = pandas.read_excel('input.xlsx',
                                          sheet_name='input_sheet1',
                                          usecols=[4,5,6,18,19], #index starts from 0
                                          names=['codenum', 'name',
                                                  'class',
                                                  'school',
                                                  'city'],
                                           dtype={'codenum': str,
                                                  'name': str,
                                                  'school': str,
                                                  'city': str})  # Excel Read and Column Filtering
        df_filtered = excel_data_df.loc[lambda x: x['codenum'] == 'AB123']    # Row Filtering   -- need to use this further     
        excel_data_df.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns  -- Line Num 14
        cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
        excel_data_df = excel_data_df[cols_to_keep]  # columns to keep
        excel_data_df                                # columns to keep
        json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file

CodePudding user response：

First, a small tip; you can remove the use/need of lambda by doing

df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]

if you want to get rid of the lambda.

Afterwards, as pointed out in the comments, make sure that it contains samples after the filtering;

df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]

if df_filtered.shape[0]: #it contains samples
        cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
        excel_data_df = excel_data_df[cols_to_keep]  # columns to keep
        excel_data_df                                # columns to keep
        json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file
else: #it does not contain any samples i.e empty dataframe
   print("Filtered data does not contain data")

CodePudding user response：

Try the following code below:

df_filtered = excel_data_df[excel_data_df['codenum'] == 'AB123']

If it still not working then "codenum" may not have this value that you are trying to filter out.

CodePudding user response：

Thanks all for your inputs. Initially it was returning empty dataframe as suggested in above answers and comments. Posting the edited working code based on your inputs for anyone's future reference.

PathLink = os.path.join(path, 'test'   '.json') #Path Name Formation 

        excel_data_df = pandas.read_excel('input.xlsx',
                                          sheet_name='input_sheet1',
                                          usecols=[3,5,6,18,19], #index starts from 0 ##  edit 1: corrected index to right column index
                                          names=['codenum', 'name', 
                                                  'class',
                                                  'school',
                                                  'city'],
                                           dtype={'codenum': str,
                                                  'name': str,
                                                  'school': str,
                                                  'city': str})  # Excel Read and Column Filtering
                                                  
        print(excel_data_df['codenum'].unique())  ##edit 1: returns unique values including AB123
        
        df_filtered = excel_data_df.loc[excel_data_df["codenum"]=='AB123']    # Row Filtering   ##edit 1     
        print(df_filtered)  ##edit 1 - to check if expected results are present in filtered data
        
        df_filtered.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns  
        
        if df_filtered.shape[0]: #it contains samples ## edit 1 
            cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
            df_filtered = df_filtered[cols_to_keep]  # columns to keep
            df_filtered                                # columns to keep
            json_str = df_filtered.to_json(PathLink,orient='records',indent=2) #json converted file
        else: #it does not contain any samples i.e empty dataframe ##edit 1 
            print("Filtered data does not contain data")

CodePudding user response：

The pandas df.loc will return the filtered result.

In your code, you tried to make a filtered but df.loc is not a filter maker.

See the example, the df.loc return the filter result from the origin df.

import pandas as pd

df = pd.DataFrame([[1, "AB123"], [4, "BC123"], [7, "CD123"]],columns=['A', 'B'])

print(df)
#   A      B
#0  1  AB123
#1  4  BC123
#2  7  CD123

print(df.loc[lambda x: x["B"] == "AB123"])
#   A      B
#0  1  AB123