First, I have a DataFrame: when I group it by a column, will that remove duplicate values? Second, how can I tell which groups contain duplicate values? (I searched for how to find which columns of a DataFrame have duplicate values but found nothing; the results only explain how to flag whether each element is duplicated.)
For example, I have a grouped DataFrame like this:
   B  C
1  2  3
1  4  3
2  2  2
2  3  4
2  2  3
The result I want, showing which group and column contain duplicates, is:
       B      C
1  False   True
2   True  False
I tried to find a way along the lines of df.groupby(A).agg(find_duplicate), where A is the column to group by. Thanks for your help.
CodePudding user response:
You could use a lambda function inside GroupBy.agg that checks whether the number of unique values in a group differs from the number of values in that group. Series.nunique gives the number of unique values, and Series.size gives the number of values in the group.
df.groupby(level=0).agg(lambda x: x.size != x.nunique())
#        B      C
# 1  False   True
# 2   True  False
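Here is a self-contained sketch of this approach. It assumes the unlabeled first column in the question is the index, since that is what level=0 groups on, and it rebuilds the sample frame by hand. One caveat: Series.nunique skips NaN by default, so a group with missing values can be flagged even when nothing is actually repeated. Passing dropna=False is one possible way around that.

```python
import pandas as pd

# Sample data from the question; the unlabeled first column is assumed to be the index.
df = pd.DataFrame(
    {"B": [2, 4, 2, 3, 2], "C": [3, 3, 2, 4, 3]},
    index=[1, 1, 2, 2, 2],
)

# A group/column pair has duplicates when it holds more values than distinct values.
has_dupes = df.groupby(level=0).agg(lambda x: x.size != x.nunique())
print(has_dupes)

# Caveat: nunique() ignores NaN by default, which can produce a false positive.
s = pd.Series([1.0, None])
false_positive = s.size != s.nunique()               # NaN is not counted as a value
nan_aware = s.size != s.nunique(dropna=False)        # NaN counted as its own value
print(false_positive, nan_aware)
```

If the data may contain missing values, the dropna=False variant can be used inside the lambda in the same way.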
CodePudding user response:
Another option is to check whether Series.duplicated flags any value within each group:
out = df.groupby(level=0).agg(lambda x: x.duplicated().any())
#        B      C
# 1  False   True
# 2   True  False
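The same idea can be written in the df.groupby(A).agg(find_duplicate) form the question asked about. This is only a sketch: it assumes the group key is stored in a regular column named "A" rather than in the index, and find_duplicate is the question's hypothetical helper name, not a pandas function. It also shows that groupby by itself does not remove duplicates, since the group sizes still add up to the original row count.

```python
import pandas as pd

# Same sample data, but with the group key as a regular column named "A" (an assumption).
df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [2, 4, 2, 3, 2], "C": [3, 3, 2, 4, 3]}
)


def find_duplicate(col: pd.Series) -> bool:
    """Return True if any value in this group's column appears more than once."""
    return bool(col.duplicated().any())


# One row per group, one boolean per remaining column.
out = df.groupby("A").agg(find_duplicate)
print(out)

# groupby keeps every row: the group sizes sum to len(df).
group_sizes = df.groupby("A").size()
print(group_sizes)
```

To see the repeated rows themselves rather than a per-group flag, something like df[df.duplicated(subset=["A", "B"], keep=False)] lists every row whose (A, B) pair occurs more than once.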