I have this dataframe with multiple headers
name, 00590BL, 01090BL, 01100MS, 02200MS
lat, 613297, 626278, 626323, 616720
long, 5185127, 5188418, 5188431, 5181393
elv, 1833, 1915, 1915, 1499
1956-01-01, 1, 2, 2, -2
1956-01-02, 2, 3, 3, -1
1956-01-03, 3, 4, 4, 0
1956-01-04, 4, 5, 5, 1
1956-01-05, 5, 6, 6, 2
I read this as
dfr = pd.read_csv(f_name,
skiprows = 0,
header = [0,1,2,3],
index_col = 0,
parse_dates = True
)
I would like to extract the value related the rows named 'lat' and 'long'. A easy way, could be to read the dataframe in two step. In other words, the idea could be have two dataframes. I do not like this because it is not very elegant and it not seems to take advantage of pandas potentiality. I believe that I could use some feature related to multi-index.
what do you think?
CodePudding user response:
One way to do this is to use the .loc
method to select the rows by their label. For example, you could use the following code to extract the 'lat' values:
lat_values = dfr.loc['lat']
And similarly, you could use the following code to extract the 'long' values:
long_values = dfr.loc['long']
Alternatively, you can use the .xs
method to extract the values of the desired level.
lat_values = dfr.xs('lat', level=1, axis=0) long_values = dfr.xs('long', level=1, axis=0)
Both these approach will extract the values for 'lat' and 'long' rows from the dataframe and will allow you to access it as one dataframe with one index.
CodePudding user response:
You can use get_level_values
:
dfr = pd.read_csv(f_name, skiprows=0, header=[0, 1, 2, 3], index_col=0,
parse_dates=[0], skipinitialspace=True)
lat = df.columns.get_level_values('lat').astype(int)
long = df.columns.get_level_values('long').astype(int)
elv = df.columns.get_level_values('elv').astype(int)
Output:
>>> lat.to_list()
[613297, 626278, 626323, 616720]
>>> long.to_list()
[5185127, 5188418, 5188431, 5181393]
>>> elv.to_list()
[1833, 1915, 1915, 1499]
If you only need the first row of column header, use droplevel
df = dfr.droplevel(['lat', 'long', 'elv'], axis=1).rename_axis(columns=None))
print(df)
# Output
00590BL 01090BL 01100MS 02200MS
1956-01-01 1 2 2 -2
1956-01-02 2 3 3 -1
1956-01-03 3 4 4 0
1956-01-04 4 5 5 1
1956-01-05 5 6 6 2