New dataframe based on column headers

I have a large dataframe with 600 columns.

Approximately 40 of these have the word 'austria' in their name. If I want a new dataframe containing only the Austrian data, is there an easy way to create it based on the column headers?

Any help is much appreciated. Thanks.

CodePudding user response：

You can use filter:

df2 = df.filter(regex='(?i)austria')  # (?i) makes the search case insensitive

Example:

import pandas as pd

df = pd.DataFrame(columns=['austria something', 'something austria',
                           'another austria', 'unrelated', 'Austria again'],
                  index=[0])

df.filter(regex='(?i)austria')

output:

  austria something something austria another austria Austria again
0               NaN               NaN             NaN           NaN
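
If it helps to see this with actual values, here is a minimal sketch. The column names and numbers are made up for illustration and are not from the question. It also contrasts the regex with filter's like= argument, which does a plain substring match and is case sensitive, so it would miss a column such as 'Austria_pop'.

```python
import pandas as pd

# Hypothetical column names and values, made up for illustration only.
df = pd.DataFrame({
    'austria_gdp': [1.0, 2.0],
    'Austria_pop': [8.9, 9.0],
    'germany_gdp': [3.0, 4.0],
})

# Case-insensitive regex match against the column labels.
insensitive = df.filter(regex='(?i)austria')

# like= is a plain, case-sensitive substring match against the column labels.
sensitive = df.filter(like='austria')

print(list(insensitive.columns))
print(list(sensitive.columns))
```

If the capitalisation of your headers is consistent, like= is probably enough. Otherwise the (?i) regex is the safer choice.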

CodePudding user response：

Another way is to use .loc, which accepts a boolean mask along the column axis, together with .str.contains on the column names:

df2 = df.loc[:, df.columns.str.contains('austria', case=False)]

  austria something something austria another austria Austria again
0               NaN               NaN             NaN           NaN
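
As a rough sanity check, the sketch below (again with made-up column names and values) suggests that the boolean-mask approach picks the same columns as the filter approach. The .copy() call is my own addition, not part of the answer above: it makes the new dataframe independent of the original, which is usually what you want when you plan to modify it afterwards.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    'austria_gdp': [1.0, 2.0],
    'Austria_pop': [8.9, 9.0],
    'germany_gdp': [3.0, 4.0],
})

# Boolean mask over the column labels, ignoring case.
mask = df.columns.str.contains('austria', case=False)

# .copy() makes df2 independent of df (and, in older pandas versions,
# avoids a SettingWithCopyWarning on later assignments).
df2 = df.loc[:, mask].copy()

# Compare against the filter-based selection.
same_columns = df2.columns.equals(df.filter(regex='(?i)austria').columns)

# Modifying df2 leaves the original dataframe unchanged.
df2['austria_gdp'] = 0.0

print(same_columns)
print(df['austria_gdp'].tolist())
```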