I'm trying to filter a large sheet with Python. I read the .xlsx into a dataframe, filter the rows that have a specific value, then write each matching row to a new dataframe.
However, I would also like to append a new column that simply contains the "old" row numbers from the original sheet. I'm not sure how to accomplish this efficiently.
Can someone point me in the right direction?
import pandas as pd

# path and column_headers are not shown in this snippet
df = pd.read_excel(path, sheet_name='Sheet 1')
matching_term_list = ['TRUE', 'FALSE']
col_name_type = 'True/False/NA'
# Keep only the rows whose value is one of the matching terms
df_filtered_command = df[df[col_name_type].isin(matching_term_list)]
df_filtered_command.columns = column_headers
df_filtered_command.head()
with pd.ExcelWriter('new_sheet.xlsx') as writer:
    df_filtered_command.to_excel(writer, sheet_name="Command", index=False)
CodePudding user response:
After you filter the dataframe, you can reset the index without dropping the old one, which adds the old index as a new column named "index". I then renamed it to "Old Index":
df_filtered_command = df[df[col_name_type].isin(matching_term_list)].reset_index(drop=False).rename(columns={"index": "Old Index"})
If this is off by 1 or so from the actual Excel row (pandas counts from 0, and the header takes up a row), you can then shift it:
df_filtered_command["Old Index"] += 1  # use += 2 if the header is in row 1 and the data starts in row 2
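To see the whole approach in one place, here is a minimal sketch that runs without the Excel file. The small in-memory dataframe and its "Command" column are hypothetical stand-ins for the real sheet, and the offset of 2 assumes the header sits in Excel row 1 with the data starting in row 2; adjust it if your sheet is laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by pd.read_excel
df = pd.DataFrame({
    "Command": ["a", "b", "c", "d"],
    "True/False/NA": ["TRUE", "NA", "FALSE", "NA"],
})

matching_term_list = ["TRUE", "FALSE"]
col_name_type = "True/False/NA"

# Filter, then keep the original index as a regular column
df_filtered_command = (
    df[df[col_name_type].isin(matching_term_list)]
    .reset_index(drop=False)
    .rename(columns={"index": "Old Index"})
)

# Assumption: header in Excel row 1, data from row 2, so row = index + 2
df_filtered_command["Excel Row"] = df_filtered_command["Old Index"] + 2

print(df_filtered_command)
```

Note that the filtered dataframe now has extra columns, so if you assign a list of headers to df_filtered_command.columns afterwards, that list needs one name per column or pandas will raise a length mismatch error.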