Home > Blockchain >  Python Filtering Excel Data with pandas by partial match
Python Filtering Excel Data with pandas by partial match

Time:11-27

I am trying to filter an excel spreadsheet using pandas and save the filtered data to a new worksheet. Currently I have this which works as expected

import pandas as pd

df = pd.read_excel('sample.xlsx', sheet_name=0) #reads the first sheet of your excel file

df = df[(df['CodedCorporation'] == 'lucy')] #Filtering dataframe

df.to_excel('sample.xlsx', sheet_name='new data') #Saving to a new sheet called Filtered Data

The problem is that this code looks for an exact match of the word "Lucy" in the column titled "CodedCorporation". What I need is to be able to filter based on partial matches not exact matches.

So I tried this

import pandas as pd

df = pd.read_excel('sample.xlsx', sheet_name=0) #reads the first sheet of your excel file

df = df[('lucy' in df['CodedCorporation'])] #Filtering dataframe

df.to_excel('sample.xlsx', sheet_name='new data') #Saving to a new sheet called Filtered Data

But that spits out a error.

Can someone help explain how to make this look for partial matches in the column and not a exact match? Kind of like how when you are working in excel and you type a word into the filter, excel shows everything that contains that word.

CodePudding user response:

You can use the apply method with a lambda to run row-wise logic.

Instead of this:

df = df[('lucy' in df['CodedCorporation'])] #Filtering dataframe

You can create a flag like "lucy_ind":

df["lucy_ind"] = df.apply(lambda x: True if 'lucy' in x['CodedCorporation'] else False, axis=1)

... and then to filter it (only include lines where lucy is in CodedCorporation)... you'd just do:

df = df[df["lucy_ind"]]

CodePudding user response:

You can use contains() function from str property as documented here


import pandas as pd

df = pd.DataFrame(
    [
        "has lucy",
         "also has lucy in it",
         "this line doesn't"
    ],
    columns=["CodedCoporation"]
)

filtered_df = df[df.CodedCoporation.str.contains('lucy')]

filtered_df.to_markdown()
CodedCoporation
0 has lucy
1 also has lucy in it
  • Related