Home > OS >  For each value in a column, how to count the number of unique values in its row?
For each value in a column, how to count the number of unique values in its row?

Time:03-09

Suppose I have a data frame:

ID       person_1     person_2
ID_001   Aaron        Ben
ID_003   Kate         Ben
ID_001   Aaron        Lou
ID_005   Lee          Ben
ID_006   Aaron        Cassie
ID_001   Tim          Ben
ID_003   Ben          Mal

For every ID in the column "ID", I want to count the number of unique names that were associated with the ID

My desired output:

ID       Count
ID_001   4
ID_003   3
ID_005   2
ID_006   2

CodePudding user response:

Flat your columns person1 and person2 then remove duplicated names and finally count unique value per ID:

out = df.melt('ID').drop_duplicates(['ID', 'value']) \
        .value_counts('ID').rename('Count').reset_index()
print(out)

# Output
       ID  Count
0  ID_001      4
1  ID_003      3
2  ID_005      2
3  ID_006      2

CodePudding user response:

Use df.melt('ID').groupby('ID')['value'].nunique().

>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001    4
ID_003    3
ID_005    2
ID_006    2
Name: value, dtype: int64

edit: df.set_index('ID').stack().groupby(level=0).nunique() works too.

CodePudding user response:

Melt your dataframe together, drop duplicates, then group by ID and aggregate over the count of the variables. At least, rename the column variable to Count.

df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
    ["ID"]).agg({"variable":"count"}).reset_index().rename(
    columns={"variable":"Count"})

   ID          Count
   ID_001      4
   ID_003      3
   ID_005      2
   ID_006      2
  • Related