Select a single value from a column after grouping by other columns in Python

Time:03-23

I tried to select a single value of the column class from each group of my dataframe after grouping by the columns first_register and second_register, but it did not seem to work.

Suppose I have a dataframe like this:

import numpy as np
import pandas as pd
# np.nan is used instead of np.NAN, which was removed in NumPy 2.0
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

What I tried, which returns an empty result:

group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)
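To see why the attempt above returns nothing, here is a small diagnostic sketch (the names keys, rows_with_nan_key and default_result are just illustrative local variables). It counts the rows with NaN in at least one grouping key and then runs the same groupby with the default dropna=True:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# Count rows that have NaN in at least one of the two grouping keys.
rows_with_nan_key = int(df[keys].isna().any(axis=1).sum())
print(rows_with_nan_key)  # 9, i.e. every row

# With the default dropna=True, rows with a NaN key are excluded,
# so no groups remain and the result is empty.
default_result = df.groupby(keys)["class"].unique()
print(len(default_result))  # 0
```

Because every row is missing one of the two registers, the default behaviour silently discards the whole dataframe.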

How can I select/access the single class label of each group in the dataframe?

The desired output is an ordered list holding the class of each group, from the first group to the last:

label_class = [1, 2, 0, 1]

Answer:

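Pass dropna=False (available since pandas 1.1). By default, groupby drops every row whose group keys contain NaN, and in this dataframe every row has NaN in either first_register or second_register, so no groups were left: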

group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)


first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object
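With unique(), each group's value is a one-element NumPy array rather than a scalar. A minimal sketch for turning that into the flat list from the question, assuming each group really holds exactly one distinct class (the check raises otherwise; keys and grouped are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]
unique_per_group = df.groupby(keys, dropna=False)["class"].unique()

# Assumption: every group contains exactly one distinct class.
assert all(len(arr) == 1 for arr in unique_per_group), "a group has several classes"

# Take the single element of each array and convert it to a plain int.
label_class = [int(arr[0]) for arr in unique_per_group]
print(label_class)  # [1, 2, 0, 1]
```

The explicit check is the main advantage over first(): if a group ever mixes classes, you find out instead of silently keeping one of them.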

Answer:

Use GroupBy.first, which returns the first non-null value of class in each group as a scalar instead of an array:

out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)

first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64


label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
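A different technique that skips groupby entirely is DataFrame.drop_duplicates on the two key columns, which keeps the first row of each (first_register, second_register) pair and treats NaN as equal to NaN. Note that it returns groups in order of first appearance rather than in sorted key order; the two orders happen to coincide for this data. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# Keep only the first row of every distinct key pair (NaN keys included).
first_rows = df.drop_duplicates(subset=keys)

label_class = first_rows["class"].tolist()
print(label_class)  # [1, 2, 0, 1]
```

Like first(), this assumes each key pair maps to a single class; it simply keeps whichever row appears first.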