Trouble rearranging columns with same name in pandas dataframe in Python-CodePudding

I have a pandas dataframe whose columns I have turned into a list and edited and rearranged. I'm trying to reassign the columns like as follows:

sapie_columns = sapie_df_working.columns.tolist()
sapie_columns = [sapie_columns[-1]]   sapie_columns[3:-1]

sapie_df_working = sapie_df_working[sapie_columns]

but it turns my dataframe (initially with 32 columns) into one with 164 columns. I think this is because a number of the existing columns have the same column name (i.e., "90% CI Lower Bound"). I'm curious about why is is happening and how I can rearrange and edit my dataframe's columns as I want to.

For reference, here is a snippet of my dataframe:

# sapie_df_working

2   State FIPS Code County FIPS Code    Postal Code Name    Poverty Estimate, All Ages  90% CI Lower Bound  90% CI Upper Bound  Poverty Percent, All Ages   90% CI Lower Bound  90% CI Upper Bound  ... 90% CI Upper Bound  Median Household Income 90% CI Lower Bound  90% CI Upper Bound  Poverty Estimate, Age 0-4   90% CI Lower Bound  90% CI Upper Bound  Poverty Percent, Age 0-4    90% CI Lower Bound  90% CI Upper Bound
3   00  000 US  United States   38371394    38309115    38433673    11.9    11.9    11.9    ... 14.9    67340   67251   67429   3146325 3133736 3158914 16.8    16.7    16.9
4   01  000 AL  Alabama 714568  695249  733887  14.9    14.5    15.3    ... 20.7    53958   53013   54903   66169   61541   70797   23.3    21.7    24.9
5   01  001 AL  Autauga County  6242    4930    7554    11.2    8.8 13.6    ... 19.3    67565   59132   75998   .   .   .   .   .   .
6   01  003 AL  Baldwin County  20189   15535   24843   8.9 6.8 11  ... 16.1    71135   66540   75730   .   .   .   .   .   .
7   01  005 AL  Barbour County  5548    4210    6886    25.5    19.3    31.7    ... 47.2    38866   33510   44222   .   .   .   .   .   .

CodePudding user response：

df = df[specific_column_names] is indeed producing this result because of duplicate column names. Filtering with column names in this case is tricky, as it's unclear exactly which column is being referred to.

In case of duplicate column names I would instead use column indices to filter the DataFrame.

Let's look at an example:

>>> import pandas as pd
>>> mock_data = [[11.29, 33.1283, -1.219, -33.11, 930.1, 33.91, 0.1213, 0.134], [9.0, 99.101, 9381.0, -940.11, 55.41, -941.1, -1.3913, 1933.1], [-192.1, 0.123, 0.1243, 0.213, 751.1, 991.1, -1.333, 9481.1]]
>>> mock_columns = ['a', 'b', 'c', 'a', 'd', 'b', 'g', 'a']
>>> df = pd.DataFrame(columns=mock_columns, data=mock_data)
>>> df
        a        b          c        a       d       b       g         a
0   11.29  33.1283    -1.2190  -33.110  930.10   33.91  0.1213     0.134
1    9.00  99.1010  9381.0000 -940.110   55.41 -941.10 -1.3913  1933.100
2 -192.10   0.1230     0.1243    0.213  751.10  991.10 -1.3330  9481.100

>>> columns = df.columns.tolist()
>>> filtered_column_indices = [len(columns) - 1]   list(range(3, len(columns) - 1))
>>> df.iloc[:, filtered_column_indices]
          a        a       d       b       g
0     0.134  -33.110  930.10   33.91  0.1213
1  1933.100 -940.110   55.41 -941.10 -1.3913
2  9481.100    0.213  751.10  991.10 -1.3330

In the example, instead of extracting column names with [sapie_columns[-1]] sapie_columns[3:-1], I extracted the equivalent indices and used that to filter the DataFrame using iloc.