I have a multi-index dataframe like this:
        TC_name  Year
id id2
1  1       RITA  2020
   2       RITA  2020
2  1        IDA  2020
   2        IDA  2020
   3        IDA  2020
   4        IDA  2021
3  1       RITA  2021
   2       RITA  2021
   3       RITA  2021
Now I want to access the first row of each 'id' group, i.e. (1,1) = RITA 2020, (2,1) = IDA 2020, (3,1) = RITA 2021, ... and use those rows to form a new dataframe.
However, when I try df.loc[:,1], it does not work. I tried df.loc[1] and df.loc[2], and they give me the right group, but I cannot select on the 'id2' index level the same way.
So what should I do next to get access to the data I want?
Thank you for your help.
CodePudding user response:
Assuming OP wants to create a dataframe from the first element of each group, one can use pandas.DataFrame.groupby. As OP wants to group by the first index level, id, one should pass level=0. Finally, since OP wants the first element of each group, one calls .first() on the result:
df2 = df.groupby(level=0).first()
[Out]:
   TC_name  Year
id
1     RITA  2020
2      IDA  2020
3     RITA  2021
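
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the dataframe from the question and applies the groupby above. The construction of df and the variable names first_per_id and by_id2 are my own assumptions, since the question only shows the printed frame. It also shows DataFrame.xs as an alternative that selects the rows where id2 equals 1, which gives the same rows here because every id group starts at id2 = 1.

```python
import pandas as pd

# Rebuild the frame from the question (assumed construction; the post
# only shows the printed output).
index = pd.MultiIndex.from_tuples(
    [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3)],
    names=["id", "id2"],
)
df = pd.DataFrame(
    {
        "TC_name": ["RITA", "RITA", "IDA", "IDA", "IDA", "IDA", "RITA", "RITA", "RITA"],
        "Year": [2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 2021],
    },
    index=index,
)

# First row of each 'id' group; the result is indexed by 'id' only.
first_per_id = df.groupby(level="id").first()

# Alternative: cross-section on the second level, keeping rows with id2 == 1.
# By default xs drops the selected level, so the index is again 'id'.
by_id2 = df.xs(1, level="id2")

print(first_per_id)
print(by_id2)
```

As for why df.loc[:,1] fails: the second position in .loc refers to columns, so 1 is looked up as a column label rather than as a value of id2. Also note that .first() takes the first non-null value of each column within a group, so if the data may contain missing values, the xs variant (or .head(1) on the groupby) is the safer way to get whole rows.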