I am currently trying to filter one column in a CSV file by multiple values. For instance, I would like all of the data in the CSV file that is related to departments 104, 105, 106, etc. I have the code below, so far which works for each department individually. However, is there something more dynamic that can utilize the departments array? I was thinking of using a for loop but my attempts have thrown me errors so far. I know that the simple solution would be to use a series of or statements, however, I need to filter by at least 10-15 departments and this seems less efficient than it could be.
import pandas as pd
report = "file.csv"
df = pd.read_csv(report, low_memory=False)
departments = [104, 105, 106, 107]
df = df[(df['Department'] == departments[0])]
df.to_csv('file_filtered.csv')
CodePudding user response:
you can use loc
in combination with the isin
to filter on all the departments
df=df.loc[df['Department'].isin(departments)]
df.to_csv('file_filtered.csv')
CodePudding user response:
Alternatively you can combine in
with query
.
Check out this example:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'departments': np.arange(100,110),
'foo': np.arange(0,10),
'bar': np.arange(10,0,-1)})
print(df)
departments = [104, 105, 106, 107]
print(df.query('departments in @departments'))