Suppose I have a data frame:
ID person_1 person_2
ID_001 Aaron Ben
ID_003 Kate Ben
ID_001 Aaron Lou
ID_005 Lee Ben
ID_006 Aaron Cassie
ID_001 Tim Ben
ID_003 Ben Mal
For every ID in the column "ID", I want to count the number of unique names that were associated with the ID
My desired output:
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2
CodePudding user response:
Flat your columns person1
and person2
then remove duplicated names and finally count unique value per ID
:
out = df.melt('ID').drop_duplicates(['ID', 'value']) \
.value_counts('ID').rename('Count').reset_index()
print(out)
# Output
ID Count
0 ID_001 4
1 ID_003 3
2 ID_005 2
3 ID_006 2
CodePudding user response:
Use df.melt('ID').groupby('ID')['value'].nunique()
.
>>> df.melt('ID').groupby('ID')['value'].nunique()
ID
ID_001 4
ID_003 3
ID_005 2
ID_006 2
Name: value, dtype: int64
edit: df.set_index('ID').stack().groupby(level=0).nunique()
works too.
CodePudding user response:
Melt
your dataframe together, drop duplicates
, then group by ID
and aggregate
over the count
of the variables. At least, rename
the column variable to Count
.
df.melt(["ID"]).drop_duplicates(["ID","value"]).groupby(
["ID"]).agg({"variable":"count"}).reset_index().rename(
columns={"variable":"Count"})
ID Count
ID_001 4
ID_003 3
ID_005 2
ID_006 2