Home > Back-end >  pandas read dataframe multi-header values
pandas read dataframe multi-header values

Time:01-22

I have this dataframe with multiple headers

name,   00590BL,    01090BL,    01100MS,    02200MS
lat,    613297, 626278, 626323, 616720
long,   5185127,    5188418,    5188431,    5181393
elv,    1833,   1915,   1915,   1499
1956-01-01, 1,  2,  2,  -2
1956-01-02, 2,  3,  3,  -1
1956-01-03, 3,  4,  4,  0
1956-01-04, 4,  5,  5,  1
1956-01-05, 5,  6,  6,  2

I read this as

dfr      =  pd.read_csv(f_name,
                            skiprows     = 0,
                            header       = [0,1,2,3], 
                            index_col    = 0,
                            parse_dates  = True
                            )

I would like to extract the value related the rows named 'lat' and 'long'. A easy way, could be to read the dataframe in two step. In other words, the idea could be have two dataframes. I do not like this because it is not very elegant and it not seems to take advantage of pandas potentiality. I believe that I could use some feature related to multi-index.

what do you think?

CodePudding user response:

One way to do this is to use the .loc method to select the rows by their label. For example, you could use the following code to extract the 'lat' values:

lat_values = dfr.loc['lat']

And similarly, you could use the following code to extract the 'long' values:

long_values = dfr.loc['long']

Alternatively, you can use the .xs method to extract the values of the desired level.

lat_values = dfr.xs('lat', level=1, axis=0) long_values = dfr.xs('long', level=1, axis=0)

Both these approach will extract the values for 'lat' and 'long' rows from the dataframe and will allow you to access it as one dataframe with one index.

CodePudding user response:

You can use get_level_values:

dfr = pd.read_csv(f_name, skiprows=0, header=[0, 1, 2, 3], index_col=0, 
                  parse_dates=[0], skipinitialspace=True)

lat = df.columns.get_level_values('lat').astype(int)
long = df.columns.get_level_values('long').astype(int)
elv = df.columns.get_level_values('elv').astype(int)

Output:

>>> lat.to_list()
[613297, 626278, 626323, 616720]

>>> long.to_list()
[5185127, 5188418, 5188431, 5181393]

>>> elv.to_list()
[1833, 1915, 1915, 1499]

If you only need the first row of column header, use droplevel

df = dfr.droplevel(['lat', 'long', 'elv'], axis=1).rename_axis(columns=None))
print(df)

# Output
            00590BL  01090BL  01100MS  02200MS
1956-01-01        1        2        2       -2
1956-01-02        2        3        3       -1
1956-01-03        3        4        4        0
1956-01-04        4        5        5        1
1956-01-05        5        6        6        2
  • Related