Home > Back-end >  Groupby A column and bring up the A value, only if the B values differ from the other ones
Groupby A column and bring up the A value, only if the B values differ from the other ones

Time:02-10

I have this example, dataset:

 A   B 
11   A
11   V
11   C
12   A
12   A
12   A
12   A
13   A
13   A
13   B
13   B

And I want it to return, the grouped A values, that has different values on the B column. So in this example, the expected output is:

[11, 13]

I made an attempt at formulating the code, and I succeeded but it is terrible and unoptimized. And I was looking for alternatives so I could iterate faster through my much bigger dataset. Would appreciate some help.

Here is my code:

user_mult_camps = []

for i in  df['A'].unique():
    filt = (df['A'] == i)
    df2 = df.loc[filt]
    x=df2['B'].unique()
    if len(x) > 1:
        user_mult_camps.append(i)
        print(i)

CodePudding user response:

You could groupby "A" and use nunique to count the number of unique "B"s per "A". Then evaluate if it's greater than 1 to filter the "A"s that have more than one corresponding "B":

msk = df.groupby('A')['B'].nunique()>1
out = msk.index[msk].tolist()

Output:

[11]

CodePudding user response:

Try this:

out = df.groupby('A')['B'].nunique().pipe(lambda x: x[x > 1].index.to_numpy())

Output:

>>> out
array([11])

>>> out[0]
11

CodePudding user response:

IIUC, you want to ensure all values are different?

You could use:

s = df.groupby('A')['B'].apply(lambda s: ~s.duplicated().any())

s[s].index.tolist()

Output: [11]

  • Related