Home > Blockchain >  I want to filter by specific column name on Pandas
I want to filter by specific column name on Pandas

Time:05-17

I have a pandas dataframe where each column is a tuple when I do pd.dataframe().columns

[('5/12/2022', 'A', 'N', '001'),
 ('5/12/2022', 'B', 'N', '001'),
 ('5/12/2022', 'C', 'N', '002'),
 ('5/12/2022', 'E', 'N', '002'),
 ('5/12/2022', 'A', 'N', '002')] 

I want to filter such that I only get the columns where the second entry of the tuple is == 'A'. So that when I do pd.dataframe().columns, I get

[('5/12/2022', 'A', 'N', '001'),
 ('5/12/2022', 'A', 'N', '002'),....] as my columns.

I want to do it without manually inputting the columns because the dataframe has thousands of columns.

CodePudding user response:

A python list comprehension would tackle it:

select_columns = [col for col in df.columns if col[1] == 'A']
cut_down_df = df[select_columns]

CodePudding user response:

Assuming you really have tuples and not a MultiIndex, you can use the str locator

df.loc[:, df.columns.str[1] == 'A']

example input:

idx = [('5/12/2022', 'A', 'N', '001'),
       ('5/12/2022', 'B', 'N', '001'),
       ('5/12/2022', 'C', 'N', '002'),
       ('5/12/2022', 'E', 'N', '002'),
       ('5/12/2022', 'A', 'N', '002')]

df = pd.DataFrame(columns=idx, index=[0])

output:

  (5/12/2022, A, N, 001) (5/12/2022, A, N, 002)
0                    NaN                    NaN
  • Related