New python user here. I'm trying to figure out how to split a dataframe into 2 excel files, filtered based on whether 2 of the columns contain a specific set of words ("blue" or "mozzarella"). If either of the 2 columns DO contain any of the words, I want that entire row added to one excel file. If they DO NOT contain the words, I want that entire row added to a different excel file.
>>> data
cheese 1 cheese 2 cheese 3
0 blue mozzarella camembert
1 mozzarella munster edam
2 maccagno mozzarella ricotta
3 brie berkswell parmigiano
I've searched around and attempted the below, but it gives me an invalid index error for data.at[row, 'cheese 1'].
writer = pd.ExcelWriter('./Included cheese.xlsx', engine='xlsxwriter')
writer_2 = pd.ExcelWriter('./Excluded cheese.xlsx', engine='xlsxwriter')
pattern = 'blue|mozzarella'
for index, row in data.iterrows():
if pattern in data.at[row, 'cheese 1']|data.at[row, 'cheese 2']:
data.at[row:].to_excel(writer)
else:
data.at[row:].to_excel(writer_2)
How should I fix this to achieve my desired filtered outputs?
CodePudding user response:
If I read the question correctly, you can filter the overall dataframe into two dataframes, one with the pattern and one without. Then write the separate dataframes to excel. Using the | you can write an or statement to filter the data by checking if the pattern is in either column.
pattern_df = data[(data['cheese 1'].str.contains(pattern)) | (data['cheese 2'].str.contains(pattern))]
then you can use ~ to check for rows without the pattern, as well as change up the logic of the filter with & to find all rows without the pattern in col 1 or col 2.
non_pattern_df = data[~(data['cheese 1'].str.contains(pattern)) & ~(data['cheese 2'].str.contains(pattern))]
Hope this helps.
CodePudding user response:
Not very familiar with Pandas and how to deal with xlsx file, but just try with the code below and this works for me:
import pandas as pd
data = pd.read_excel("test.xlsx")
columns = data.columns.values
check_set = {"blue","mozzarella"}
condational = data[columns[0]].isin(check_set) | data[columns[1]].isin(check_set)
data_in = data[condational]
data_out = data[~condational]
data_in.to_excel("Included_cheese.xlsx",index=False)
data_out.to_excel("Excluded_cheese.xlsx",index=False)
At first, I tried to write a xlsx file as in your code
writer = pd.ExcelWriter('Included_cheese.xlsx', engine='xlsxwriter')
writer_2 = pd.ExcelWriter('Excluded_cheese.xlsx', engine='xlsxwriter')
data_in.to_excel(writer)
data_out.to_excel(writer_2)
but I can not open with the written file with excel in this way.