I have a pandas dataframe where each column is a tuple when I do pd.dataframe().columns
[('5/12/2022', 'A', 'N', '001'),
('5/12/2022', 'B', 'N', '001'),
('5/12/2022', 'C', 'N', '002'),
('5/12/2022', 'E', 'N', '002'),
('5/12/2022', 'A', 'N', '002')]
I want to filter such that I only get the columns where the second entry of the tuple is == 'A'. So that when I do pd.dataframe().columns, I get
[('5/12/2022', 'A', 'N', '001'),
('5/12/2022', 'A', 'N', '002'),....] as my columns.
I want to do it without manually inputting the columns because the dataframe has thousands of columns.
CodePudding user response:
A python list comprehension would tackle it:
select_columns = [col for col in df.columns if col[1] == 'A']
cut_down_df = df[select_columns]
CodePudding user response:
Assuming you really have tuples and not a MultiIndex, you can use the str
locator
df.loc[:, df.columns.str[1] == 'A']
example input:
idx = [('5/12/2022', 'A', 'N', '001'),
('5/12/2022', 'B', 'N', '001'),
('5/12/2022', 'C', 'N', '002'),
('5/12/2022', 'E', 'N', '002'),
('5/12/2022', 'A', 'N', '002')]
df = pd.DataFrame(columns=idx, index=[0])
output:
(5/12/2022, A, N, 001) (5/12/2022, A, N, 002)
0 NaN NaN