I tried to select a single value of the class column from each group of my DataFrame after grouping it by the columns first_register and second_register, but it did not seem to work.
Suppose I have a dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
What I tried, which did not work at all:
group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)
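A quick way to see why this attempt returns nothing is to count the groups. The sketch below reuses the question's data (with np.nan). The names keys and the checks themselves are illustrative additions. It shows that every row has NaN in at least one key column, and that groupby's default dropna=True therefore leaves no groups.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# Every row has NaN in at least one of the two key columns
print(df[keys].isna().any(axis=1).all())  # True

# Default dropna=True excludes rows with NaN keys, so no groups remain
print(df.groupby(keys).ngroups)  # 0

# Keeping NaN keys (pandas >= 1.1) yields the four expected groups
print(df.groupby(keys, dropna=False).ngroups)  # 4
```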
How can I select/access the single class label of each group of the DataFrame?
The desired output could be an ordered list holding the class of each group, from the first group to the last:
label_class = [1, 2, 0, 1]
Answer:
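Use dropna=False. By default, groupby drops rows whose group keys contain NaN, and since every row here has NaN in either first_register or second_register, the grouping in the question ends up with no groups at all: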
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)
first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object
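Note that unique() gives one array per group rather than a flat list. A minimal sketch for flattening it, assuming each group contains exactly one distinct class (checked with nunique before flattening), might look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

group_by_df = df.groupby(["first_register", "second_register"], dropna=False)

# Assumption: each group holds a single distinct class; verify before flattening
if not (group_by_df["class"].nunique() == 1).all():
    raise ValueError("some group contains more than one class")

# unique() yields one NumPy array per group; take the only element of each
label_class = [int(arr[0]) for arr in group_by_df["class"].unique()]
print(label_class)  # [1, 2, 0, 1]
```

The int() conversion only turns NumPy integers into plain Python ints so the printed list looks like the desired output.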
Answer:
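Use GroupBy.first, which returns the first non-null value of class in each group (again with dropna=False so the groups with NaN keys are kept):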
out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)
first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64
label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
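A different technique that avoids groupby entirely (an alternative, not part of the answer above) is to keep the first row of each key pair with DataFrame.drop_duplicates, which treats NaN values as equal when comparing rows. A minimal sketch, reusing the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# NaN keys compare equal here, so each (first, second) pair keeps
# only its first row, in order of first appearance
first_rows = df.drop_duplicates(subset=keys)
label_class = first_rows["class"].tolist()
print(label_class)  # [1, 2, 0, 1]
```

Two differences worth knowing: GroupBy.first orders groups by sorted key (the default sort=True) and skips NaN in class, while drop_duplicates keeps the original row order and takes the row's value as-is. For this data both give the same list.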