Home > Blockchain >  How to Filter CSV File using Pandas by Multiple Values
How to Filter CSV File using Pandas by Multiple Values

Time:10-21

I am currently trying to filter one column in a CSV file by multiple values. For instance, I would like all of the data in the CSV file that is related to departments 104, 105, 106, etc. I have the code below, so far which works for each department individually. However, is there something more dynamic that can utilize the departments array? I was thinking of using a for loop but my attempts have thrown me errors so far. I know that the simple solution would be to use a series of or statements, however, I need to filter by at least 10-15 departments and this seems less efficient than it could be.

import pandas as pd
report = "file.csv"

df = pd.read_csv(report, low_memory=False)
departments = [104, 105, 106, 107]

df = df[(df['Department'] == departments[0])]
df.to_csv('file_filtered.csv')

CodePudding user response:

you can use loc in combination with the isin to filter on all the departments


df=df.loc[df['Department'].isin(departments)]
df.to_csv('file_filtered.csv')

CodePudding user response:

Alternatively you can combine in with query.

Check out this example:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'departments': np.arange(100,110),
    'foo': np.arange(0,10),
    'bar': np.arange(10,0,-1)})
print(df)

departments = [104, 105, 106, 107]
print(df.query('departments in @departments'))
  • Related