I have a data frame df which looks like this:
year | date | Time | observation1 | observation2 |
---|---|---|---|---|
2012 | 11-02 | 9:12:00 | 79.373668 | 224 |
2012 | 11-02 | 9:13:00 | 130.841316 | 477 |
2012 | 11-05 | 9:14:00 | 45.312814 | 835 |
2013 | 11-05 | 9:15:00 | 123.776946 | 623 |
2013 | 11-05 | 9:16:00 | 79.373668 | 224 |
2013 | 11-22 | 9:17:00 | 130.841316 | 477 |
2013 | 11-22 | 9:18:00 | 45.312814 | 835 |
2014 | 11-01 | 9:19:00 | 123.776946 | 623 |
I would like to use year and date as indexes so that I can retrieve rows from a specific year, or rows from a specific date within a given year. For example, I would like to get the data for 2012, and also the data for 2013-11-05. I probably need a MultiIndex, but how do I set one on a data frame like this?
CodePudding user response:
To create a MultiIndex, call set_index() with a list of column names:
    df = df.set_index(['year','date'])

                   Time  observation1  observation2
    year date
    2012 11-02  9:12:00     79.373668           224
         11-02  9:13:00    130.841316           477
         11-05  9:14:00     45.312814           835
    2013 11-05  9:15:00    123.776946           623
         11-05  9:16:00     79.373668           224
         11-22  9:17:00    130.841316           477
         11-22  9:18:00     45.312814           835
    2014 11-01  9:19:00    123.776946           623
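Once the MultiIndex is in place, the two lookups from the question can be done with .loc. Here is a minimal sketch, assuming year is stored as an integer and date as a string like '11-05' (the question does not state the dtypes); the names indexed, rows_2012 and rows_2013_11_05 are only for illustration. Sorting the index first is optional, but it generally keeps label lookups on a MultiIndex efficient.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from its table).
# Assumption: year is an int column and date is a string column.
df = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014],
    "date": ["11-02", "11-02", "11-05", "11-05", "11-05", "11-22", "11-22", "11-01"],
    "Time": ["9:12:00", "9:13:00", "9:14:00", "9:15:00",
             "9:16:00", "9:17:00", "9:18:00", "9:19:00"],
    "observation1": [79.373668, 130.841316, 45.312814, 123.776946,
                     79.373668, 130.841316, 45.312814, 123.776946],
    "observation2": [224, 477, 835, 623, 224, 477, 835, 623],
})

# Keep the original frame intact and build an indexed copy.
indexed = df.set_index(["year", "date"]).sort_index()

# All rows for one year: select on the first index level.
rows_2012 = indexed.loc[2012]

# All rows for one date within a year: pass a (year, date) tuple
# as the row key, with ":" to select every column.
rows_2013_11_05 = indexed.loc[(2013, "11-05"), :]

print(rows_2012)
print(rows_2013_11_05)
```

If year turns out to be stored as a string, the keys would need to be '2012' and '2013' instead.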
However, you do not need a MultiIndex for these selections. Boolean masks on the original data frame (with year and date still as ordinary columns, i.e. before calling set_index) are enough:
    print(df[df.year.eq(2012)])
    print(df[df.year.eq(2013) & df.date.eq('11-05')])

       year   date     Time  observation1  observation2
    0  2012  11-02  9:12:00     79.373668           224
    1  2012  11-02  9:13:00    130.841316           477
    2  2012  11-05  9:14:00     45.312814           835

       year   date     Time  observation1  observation2
    3  2013  11-05  9:15:00    123.776946           623
    4  2013  11-05  9:16:00     79.373668           224