I have a pandas dataframe whose columns I have turned into a list and edited and rearranged. I'm trying to reassign the columns like as follows:
sapie_columns = sapie_df_working.columns.tolist()
sapie_columns = [sapie_columns[-1]] sapie_columns[3:-1]
sapie_df_working = sapie_df_working[sapie_columns]
but it turns my dataframe (initially with 32 columns) into one with 164 columns. I think this is because a number of the existing columns have the same column name (i.e., "90% CI Lower Bound"). I'm curious about why is is happening and how I can rearrange and edit my dataframe's columns as I want to.
For reference, here is a snippet of my dataframe:
# sapie_df_working
2 State FIPS Code County FIPS Code Postal Code Name Poverty Estimate, All Ages 90% CI Lower Bound 90% CI Upper Bound Poverty Percent, All Ages 90% CI Lower Bound 90% CI Upper Bound ... 90% CI Upper Bound Median Household Income 90% CI Lower Bound 90% CI Upper Bound Poverty Estimate, Age 0-4 90% CI Lower Bound 90% CI Upper Bound Poverty Percent, Age 0-4 90% CI Lower Bound 90% CI Upper Bound
3 00 000 US United States 38371394 38309115 38433673 11.9 11.9 11.9 ... 14.9 67340 67251 67429 3146325 3133736 3158914 16.8 16.7 16.9
4 01 000 AL Alabama 714568 695249 733887 14.9 14.5 15.3 ... 20.7 53958 53013 54903 66169 61541 70797 23.3 21.7 24.9
5 01 001 AL Autauga County 6242 4930 7554 11.2 8.8 13.6 ... 19.3 67565 59132 75998 . . . . . .
6 01 003 AL Baldwin County 20189 15535 24843 8.9 6.8 11 ... 16.1 71135 66540 75730 . . . . . .
7 01 005 AL Barbour County 5548 4210 6886 25.5 19.3 31.7 ... 47.2 38866 33510 44222 . . . . . .
CodePudding user response:
df = df[specific_column_names]
is indeed producing this result because of duplicate column names. Filtering with column names in this case is tricky, as it's unclear exactly which column is being referred to.
In case of duplicate column names I would instead use column indices to filter the DataFrame.
Let's look at an example:
>>> import pandas as pd
>>> mock_data = [[11.29, 33.1283, -1.219, -33.11, 930.1, 33.91, 0.1213, 0.134], [9.0, 99.101, 9381.0, -940.11, 55.41, -941.1, -1.3913, 1933.1], [-192.1, 0.123, 0.1243, 0.213, 751.1, 991.1, -1.333, 9481.1]]
>>> mock_columns = ['a', 'b', 'c', 'a', 'd', 'b', 'g', 'a']
>>> df = pd.DataFrame(columns=mock_columns, data=mock_data)
>>> df
a b c a d b g a
0 11.29 33.1283 -1.2190 -33.110 930.10 33.91 0.1213 0.134
1 9.00 99.1010 9381.0000 -940.110 55.41 -941.10 -1.3913 1933.100
2 -192.10 0.1230 0.1243 0.213 751.10 991.10 -1.3330 9481.100
>>> columns = df.columns.tolist()
>>> filtered_column_indices = [len(columns) - 1] list(range(3, len(columns) - 1))
>>> df.iloc[:, filtered_column_indices]
a a d b g
0 0.134 -33.110 930.10 33.91 0.1213
1 1933.100 -940.110 55.41 -941.10 -1.3913
2 9481.100 0.213 751.10 991.10 -1.3330
In the example, instead of extracting column names with [sapie_columns[-1]] sapie_columns[3:-1]
, I extracted the equivalent indices and used that to filter the DataFrame using iloc
.