I have this example dataset:
A B
11 A
11 V
11 C
12 A
12 A
12 A
12 A
13 A
13 A
13 B
13 B
I want it to return the grouped A values that have more than one distinct value in the B column. So in this example, the expected output is:
[11, 13]
I made an attempt at writing the code, and it works, but it is slow and unoptimized. I am looking for alternatives so I can iterate faster through my much bigger dataset. I would appreciate some help.
Here is my code:
user_mult_camps = []
for i in df['A'].unique():
    filt = (df['A'] == i)
    df2 = df.loc[filt]
    x = df2['B'].unique()
    if len(x) > 1:
        user_mult_camps.append(i)
        print(i)
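For completeness, here is a self-contained version of my attempt (the DataFrame below is just the sample data from above, reconstructed so the snippet runs on its own):

```python
import pandas as pd

# Reconstruction of the sample dataset shown above
df = pd.DataFrame({
    'A': [11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13],
    'B': ['A', 'V', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

user_mult_camps = []
for i in df['A'].unique():
    filt = (df['A'] == i)     # boolean mask selecting one A value
    df2 = df.loc[filt]        # subset of rows for that A
    x = df2['B'].unique()     # distinct B values within the subset
    if len(x) > 1:            # keep A if B is not constant
        user_mult_camps.append(i)

print(user_mult_camps)  # [11, 13]
```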
CodePudding user response:
You could groupby "A" and use nunique to count the number of unique "B"s per "A", then check whether the count is greater than 1 to filter the "A"s that have more than one corresponding "B":
msk = df.groupby('A')['B'].nunique()>1
out = msk.index[msk].tolist()
Output:
[11, 13]
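As a runnable sketch (rebuilding the question's sample data, since the original df isn't defined here):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13],
    'B': ['A', 'V', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

# nunique counts distinct B values per A group; the boolean mask
# keeps only the groups with more than one distinct B
msk = df.groupby('A')['B'].nunique() > 1
out = msk.index[msk].tolist()
print(out)  # [11, 13]
```

This avoids the Python-level loop entirely, so it scales much better to a large dataset than filtering the frame once per unique A.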
CodePudding user response:
Try this:
out = df.groupby('A')['B'].nunique().pipe(lambda x: x[x > 1].index.to_numpy())
Output:
>>> out
array([11, 13])
>>> out[0]
11
CodePudding user response:
IIUC, you want to ensure all values are different?
You could use:
s = df.groupby('A')['B'].apply(lambda s: ~s.duplicated().any())
s[s].index.tolist()
Output: [11]
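To show the difference between the two readings, here is a self-contained sketch (again reconstructing the sample data): this version keeps only groups where every B value is distinct, which is why 13 (whose B values repeat) drops out here but not in the nunique-based answers.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13],
    'B': ['A', 'V', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

# For each A group, True only if no B value is duplicated
s = df.groupby('A')['B'].apply(lambda s: ~s.duplicated().any())
strict = s[s].index.tolist()
print(strict)  # [11] -- 13 has repeated B values, so it is excluded
```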