I am trying to filter an excel spreadsheet using pandas and save the filtered data to a new worksheet. Currently I have this which works as expected
import pandas as pd
df = pd.read_excel('sample.xlsx', sheet_name=0) #reads the first sheet of your excel file
df = df[(df['CodedCorporation'] == 'lucy')] #Filtering dataframe
df.to_excel('sample.xlsx', sheet_name='new data') #Saving to a new sheet called Filtered Data
The problem is that this code looks for an exact match of the word "Lucy" in the column titled "CodedCorporation". What I need is to be able to filter based on partial matches not exact matches.
So I tried this
import pandas as pd
df = pd.read_excel('sample.xlsx', sheet_name=0) #reads the first sheet of your excel file
df = df[('lucy' in df['CodedCorporation'])] #Filtering dataframe
df.to_excel('sample.xlsx', sheet_name='new data') #Saving to a new sheet called Filtered Data
But that spits out a error.
Can someone help explain how to make this look for partial matches in the column and not a exact match? Kind of like how when you are working in excel and you type a word into the filter, excel shows everything that contains that word.
CodePudding user response:
You can use the apply method with a lambda to run row-wise logic.
Instead of this:
df = df[('lucy' in df['CodedCorporation'])] #Filtering dataframe
You can create a flag like "lucy_ind":
df["lucy_ind"] = df.apply(lambda x: True if 'lucy' in x['CodedCorporation'] else False, axis=1)
... and then to filter it (only include lines where lucy is in CodedCorporation)... you'd just do:
df = df[df["lucy_ind"]]
CodePudding user response:
You can use contains()
function from str
property as documented here
import pandas as pd
df = pd.DataFrame(
[
"has lucy",
"also has lucy in it",
"this line doesn't"
],
columns=["CodedCoporation"]
)
filtered_df = df[df.CodedCoporation.str.contains('lucy')]
filtered_df.to_markdown()
CodedCoporation | |
---|---|
0 | has lucy |
1 | also has lucy in it |