I am new to Pandas. Below is part of a code. I am trying to use the df_filtered which is the filtered data having codenum column value =AB123. However from Line 14 if I use df_filtered instead of excel_data_df , it's not giving results. The desired columns are getting picked correctly. But the value filtering is not happening - In codenum col value should be = AB123. But the value filtering is not happening and I am getting the entire excel converted to json with chosen columns. Please help understand how to consider/use df_filtered data from Line 14.
PathLink = os.path.join(path, 'test' '.json') #Path Name Formation
excel_data_df = pandas.read_excel('input.xlsx',
sheet_name='input_sheet1',
usecols=[4,5,6,18,19], #index starts from 0
names=['codenum', 'name',
'class',
'school',
'city'],
dtype={'codenum': str,
'name': str,
'school': str,
'city': str}) # Excel Read and Column Filtering
df_filtered = excel_data_df.loc[lambda x: x['codenum'] == 'AB123'] # Row Filtering -- need to use this further
excel_data_df.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns -- Line Num 14
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
excel_data_df = excel_data_df[cols_to_keep] # columns to keep
excel_data_df # columns to keep
json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file
CodePudding user response:
First, a small tip; you can remove the use/need of lambda by doing
df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]
if you want to get rid of the lambda
.
Afterwards, as pointed out in the comments, make sure that it contains samples after the filtering;
df_filtered = excel_data_df.loc[excel_data_df["codenum"]=="AB123"]
if df_filtered.shape[0]: #it contains samples
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
excel_data_df = excel_data_df[cols_to_keep] # columns to keep
excel_data_df # columns to keep
json_str = excel_data_df.to_json(PathLink,orient='records',indent=2) #json converted file
else: #it does not contain any samples i.e empty dataframe
print("Filtered data does not contain data")
CodePudding user response:
Try the following code below:
df_filtered = excel_data_df[excel_data_df['codenum'] == 'AB123']
If it still not working then "codenum" may not have this value that you are trying to filter out.
CodePudding user response:
Thanks all for your inputs. Initially it was returning empty dataframe as suggested in above answers and comments. Posting the edited working code based on your inputs for anyone's future reference.
PathLink = os.path.join(path, 'test' '.json') #Path Name Formation
excel_data_df = pandas.read_excel('input.xlsx',
sheet_name='input_sheet1',
usecols=[3,5,6,18,19], #index starts from 0 ## edit 1: corrected index to right column index
names=['codenum', 'name',
'class',
'school',
'city'],
dtype={'codenum': str,
'name': str,
'school': str,
'city': str}) # Excel Read and Column Filtering
print(excel_data_df['codenum'].unique()) ##edit 1: returns unique values including AB123
df_filtered = excel_data_df.loc[excel_data_df["codenum"]=='AB123'] # Row Filtering ##edit 1
print(df_filtered) ##edit 1 - to check if expected results are present in filtered data
df_filtered.columns = ['Code', 'Student Name', 'Class', 'School Name','City Name'] #renaming columns
if df_filtered.shape[0]: #it contains samples ## edit 1
cols_to_keep = ['Student Name', 'Class', 'School Name','City Name'] # columns to keep
df_filtered = df_filtered[cols_to_keep] # columns to keep
df_filtered # columns to keep
json_str = df_filtered.to_json(PathLink,orient='records',indent=2) #json converted file
else: #it does not contain any samples i.e empty dataframe ##edit 1
print("Filtered data does not contain data")
CodePudding user response:
The pandas df.loc will return the filtered result.
In your code, you tried to make a filtered but df.loc is not a filter maker.
See the example, the df.loc return the filter result from the origin df.
import pandas as pd
df = pd.DataFrame([[1, "AB123"], [4, "BC123"], [7, "CD123"]],columns=['A', 'B'])
print(df)
# A B
#0 1 AB123
#1 4 BC123
#2 7 CD123
print(df.loc[lambda x: x["B"] == "AB123"])
# A B
#0 1 AB123