Reading and appending excel files to create DataFrame:
import pandas as pd
import os
folder = r'C:\mypathtodocuments'
files = os.listdir(folder)
df = pd.DataFrame()
for file in files:
if file.endswith('.xlsx'):
df = df.append(pd.read_excel(os.path.join(folder,file)))
#Drop extra columns from wrong data
df1 = df[['FIRST_NM', 'LAST_NM', 'CITY_AD']]
Preview of CITY_AD
column:
>>> df1["CITY_AD"]
0 EL PASO
1 HOUSTON
2 HOUSTON
3 CONROE
4 MCKINNEY
5 MCKINNEY
6 KATY
7 TOMBALL
8 TOMBALL
9 SPRING
10 SPRING
Filter DataFrame with .isin()
function to only include cities HOUSTON
and CONROE
:
df1[df1["CITY_AD"].isin(["HOUSTON","CONROE"])]
This returns an empty set... How can I get it to filter correctly?
CodePudding user response:
Try this:
df1["CITY_AD"] = df1["CITY_AD"].str.strip()
df1[df1["CITY_AD"].isin(["HOUSTON","CONROE"])]