I'm trying to access a column in a pandas dataframe in python.
Here is the dataframe I am using. I only printed the first row in order to try to keep it a manageable size. I would have selected a smaller number of columns but you will see that this is my problem and the reason I'm asking this question.
{'OilSampleID': {0: 'TGSM3059'}, 'Type': {0: 'Oil Seep'}, 'Country': {0: 'Mexico'}, 'USGS Province': {0: 'Ocean'}, 'State / Province': {0: 'GOM'}, 'County / Block': {0: nan}, 'Block': {0: 'TGS Area 2'}, 'Section': {0: '11'}, 'Well': {0: 'VER115'}, 'Upper Depth (ft)': {0: nan}, 'Lower Depth (ft)': {0: nan}, 'Standardized Reservoir Age': {0: nan}, 'Reservoir Age': {0: nan}, 'Standardized Reservoir Formation': {0: nan}, 'Reservoir Formation': {0: nan}, 'Lithology': {0: nan}, 'Operator': {0: 'TDI-Brooks'}, 'Location': {0: nan}, 'Latitude': {0: 21.712965}, 'Longitude': {0: -96.280545}, 'Datum': {0: 'WGS84 UTM15N'}, 'API / UWI Well #': {0: nan}, 'Comments': {0: 'Piston Core Sediment'}, 'SampleID': {0: nan}, 'ClientID': {0: nan}, 'API Gravity': {0: nan}, '<C15': {0: nan}, '% S': {0: nan}, 'ppm Ni': {0: nan}, 'ppm V': {0: nan}, '% Sat': {0: 10.3448275862249}, '% Aro': {0: 13.7931034482896}, '% NSO': {0: 48.2758620689676}, '% Asph': {0: 27.5862068965179}, 'Sat / Aro': {0: 0.75}, '13Cs': {0: nan}, '13Ca': {0: -27.51}, '13Cwc': {0: nan}, '34Swc': {0: nan}, 'CV': {0: nan}, 'EOM': {0: 193.0}, 'Misc': {0: 'TSF = 150,000'}, 'Misc.1': {0: 150000.0}, 'SRType': {0: 'Marine Carbonate/Marl'}, 'SRTconf': {0: 'suspected'}, 'SRAge': {0: 'UJ/LK'}, 'SRAconf': {0: 'suspected'}, 'SRM': {0: nan}, 'BIOD': {0: 'Mild 4'}, 'BIOD_': {0: 4.0}, 'GM REF': {0: nan}, 'GM Fam': {0: nan}, 'C19/C23': {0: 0.018}, 'C22/C21': {0: 1.176}, 'C24/C23': {0: 0.405}, 'C26/C25': {0: 0.672}, 'Tet/C23': {0: 0.471}, 'C27T/C27': {0: 0.015}, 'C28/H': {0: 0.029}, 'C29/H': {0: 0.988}, 'X/H': {0: 0.013}, 'OL/H': {0: 0.009}, 'C31R/H': {0: 0.408}, 'GA/C31R': {0: 0.06}, 'C35S/C34S': {0: 1.131}, 'S/H': {0: 0.242}, '% C27': {0: 23.975}, '% C28': {0: 28.596}, '% C29': {0: 47.429}, 'S1/S6': {0: 0.887}, 'C29 20S/R': {0: 0.63}, 'C29 bbS/aaR': {0: 1.06}, 'C27 Ts/Tm': {0: 0.382}, 'C29 Ts/Tm': {0: 0.1}, 'DM/H': {0: 0.012}, 'C26(S R) / Ts': {0: 0.316}, 'P': {0: 8.98723829264705}, '3MP': {0: 4.01903139633122}, '2MP': {0: 5.318803252166}, '9MP': {0: 5.38721229720994}, '1MP': {0: 3.85655991435188}, 'C4N': {0: 0.43610766215509}, 'DBT': {0: 0.0}, '4MDBT': {0: 0.256533918914759}, '32MDBT': {0: nan}, '1MDBT': {0: nan}, 'C20TAS': {0: 1.65036821168495}, 'C21TAS': {0: 2.32590753149381}, 'C26STAS': {0: 12.6898778556501}, 'C26RC27STAS': {0: 62.243679859351}, 'C28STAS': {0: 52.8801918189623}, 'C27RTAS': {0: 42.2254830533693}, 'C28RTAS': {0: 46.860195855096}, '#9b': {0: 8.0}, '#3e': {0: 11.0}, 'C18': {0: 0.0}, 'C19': {0: 1.0}, 'C20': {0: 1.0}, 'MPI': {0: 0.768292682926829}, 'F1': {0: 0.50253106304648}, 'F2': {0: 0.286240220892775}, 'P/DBT': {0: nan}, 'DBT/C4N': {0: nan}, 'MDR': {0: nan}, 'TAS1': {0: 0.0376145000974469}, 'TAS2': {0: 0.0472878998609179}, 'TAS3(CR)': {0: 0.0180023228803717}, 'TAS4': {0: 0.239974126778784}, 'TAS5': {0: 0.901094890510949}, 'DINO3/9': {0: 1.3740932642487}, 'C19T': {0: 0.181692648715433}, 'C20T': {0: 0.622946224167199}, 'C21T': {0: 1.62225579210208}, 'C22T': {0: 1.90777281151205}, 'C23T': {0: 9.92820544766473}, 'C24T': {0: 4.02319436441316}, 'C25S': {0: 2.336048340627}, 'C25R': {0: 2.336048340627}, 'TET': {0: 4.67209668125399}, 'C26S': {0: 1.60927774576526}, 'C26R': {0: 1.53140946774436}, 'ET_C28S': {0: 1.03824370694533}, 'ET_C28R': {0: 1.23291440199758}, 'ET_C29S': {0: 2.01159718220658}, 'ET_C29R': {0: 3.02388479647828}, 'ET_C30S': {0: 1.36269486536575}, 'ET_C30R': {0: 5.84012085156749}, 'ET_C31S': {0: 1.14206807763986}, 'ET_C31R': {0: 0.584012085156749}, 'Ts': {0: 9.92820544766473}, 'C27T': {0: 0.545077946146299}, 'Tm': {0: 26.0209829053174}, 'C28DM': {0: 2.40093857231108}, 'C28H': {0: 2.79027996241558}, 'C29DM': {0: 1.12909003130305}, 'C29H': {0: 96.2451916338322}, 'C29D': {0: 9.60375428924431}, 'C30X': {0: 1.29780463368166}, 'OL': {0: 0.882507150903532}, 'C30H': {0: 97.4391718968193}, 'C30M': {0: 8.70826909200397}, 'C31S': {0: 56.6751283528783}, 'C31R': {0: 39.7387778833326}, 'GA': {0: 2.38796052597426}, 'C32S': {0: 27.461546048704}, 'C32R': {0: 18.0135283155015}, 'C33S': {0: 20.6610497682121}, 'C33R': {0: 12.6146610393858}, 'C34S': {0: 12.458924483344}, 'C34R': {0: 7.7219375704059}, 'C35S': {0: 14.0941583217829}, 'C35R': {0: 8.53955448962535}, 'S1': {0: 7.30404447836041}, 'S2': {0: 4.12442312584033}, 'S3': {0: 4.85119372070206}, 'S4': {0: 11.337621279843}, 'S4B': {0: 10.4109887713943}, 'S5': {0: 6.32290417529707}, 'S5B': {0: 7.54024492169047}, 'S6': {0: 8.23067698680911}, 'S7': {0: 2.54369708201606}, 'S8': {0: 3.83371488789564}, 'S9': {0: 6.23205785093935}, 'S9B': {0: 8.394200370653}, 'S10': {0: 7.19502888913115}, 'S10B': {0: 8.99378611141393}, 'S11': {0: 3.67019150405175}, 'S12': {0: 7.4675678622043}, 'S13': {0: 14.0085032159599}, 'S13B': {0: 16.0071223518296}, 'S14': {0: 12.4641157018787}, 'S14B': {0: 14.916966459537}, 'ISTD': {0: 500.0}, 'S15': {0: 11.7918529016316}, 'Biodegraded': {0: nan}, '15': {0: nan}, '16': {0: nan}, '17': {0: nan}, 'Pr': {0: nan}, '18': {0: nan}, 'Ph': {0: nan}, '19': {0: nan}, '20': {0: nan}, '21': {0: nan}, '22': {0: nan}, '23': {0: nan}, '24': {0: nan}, '25': {0: nan}, '26': {0: nan}, '27': {0: nan}, '28': {0: nan}, '29': {0: nan}, '30': {0: nan}, '31': {0: nan}, '32': {0: nan}, '33': {0: nan}, '34': {0: nan}, '35': {0: nan}, 'Pr/Ph': {0: nan}, 'Pr/nC17': {0: nan}, 'Ph/nC18': {0: nan}, 'nC27/nC17': {0: nan}, 'nC19*2/(nC18 nC20)': {0: nan}, 'CPI': {0: nan}, 'S1/S6.1': {0: 0.887}, 'DWW D/R': {0: 1.1739236190043156}, 'DWW Ts/(Ts Tm)': {0: 0.27617328519855566}, 'DWW 35SH/34SH': {0: 1.131}, 'DWW 29H/H': {0: 0.988}, 'DWW OL/H': {0: 0.009}, 'DWW GA/H': {0: 0.02450719232818326}, 'DWW 31H/H': {0: 0.9894778902504008}, 'DWW 29H/31H': {0: 0.9982501009557132}, 'DWW ' St': {0: 23.975}, 'DWW ( St': {0: 28.596}, 'DWW ) St': {0: 47.429}, 'DWW M/H': {0: 0.08937133724027711}, 'DWW St/Ho': {0: 0.242}, 'DWW 29S/R': {0: 0.63}, 'Family': {0: 2}}
I looked on this StackOverflow post to try and find the answer, but I think I'm running into problems because my data has headers with spaces with them.
I followed the other example to try and select just some of the columns. Here is my code to try and select only a few.
geology_data_selection = geology_data[['OilSampleID', 'Type', 'Country', 'USGS Province', 'Well', 'Latitude', 'Longitude', 'EOM', 'Misc...43', 'BIOD', '% Sat', '% Aro', '% NSO', '% Asph']]
I was thinking the problem is that some of these columns like the % Sat
column has a space in it. One of the comments under that SO post told me to use backpacks, so I tried using backpacks but that didn't work either.
geology_data_selection = geology_data[['OilSampleID', 'Type', 'Country', '`USGS Province`', 'Well', 'Latitude', 'Longitude', 'EOM', 'Misc...43', 'BIOD', '`% Sat`', '`% Aro`', '`% NSO`', '`% Asph`']]
How can I select columns from a pandas data frame explicitly when the column names are all that great (some of them with the ... and some of them with the spaces).
CodePudding user response:
If you supply keys that exist... this works just fine.
df[['OilSampleID', 'Type', 'Country', 'USGS Province', 'Well', 'Latitude', 'Longitude', 'EOM', 'BIOD', '% Sat', '% Aro', '% NSO', '% Asph']]
Output:
OilSampleID Type Country USGS Province Well Latitude Longitude EOM BIOD % Sat % Aro % NSO % Asph
0 TGSM3059 Oil Seep Mexico Ocean VER115 21.712965 -96.280545 193.0 Mild 4 10.344828 13.793103 48.275862 27.586207