select columns based on columns names-CodePudding

I need to get columns that match the list of names provided. The probles is, the csv file columns might contain extra spaces, and since the csv file is dynamically obtained from api, its not make sense to fix the csv manually.

 fields = ['Assembled Weights', 'Assembled Heights']

 dataframe = pd.read_excel(file)
 df = dataframe[fields]

The problem in csv is some of its column names contains extra spaces for example

     Assembled Weights -> single space
     Assembled  Heights -> contains two spaces

So I got error KeyError: "['Assembled Heights'] not in index"

How do you ignore the extra spaces from dataframe column names?

CodePudding user response：

Remove the extra spaces with str.replace:

out = df.loc[:, df.columns.str.replace(r'\s ', ' ', regex=True).isin(fields)]

NB. This leaves the original names unchanged.

If you also want to fix the names:

df.columns = df.columns.str.replace(r'\s ', ' ', regex=True)
out = df.loc[:, df.columns.isin(fields)]

# or
out = df[fields]