Let's say I have a dataframe as follows:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
B | X | Sand
B | X | Stone
B | Y | Stone
C | X | Sand
C | X | Stone
I want to find all Group
s where each group of Source
s share no common Name
s. Essentially in the example above, I want Group
A
as all groupings of Source
(X
and Y
) Name
s share no common values. For this example we can assume there will only be 2 Source
s (X
and Y
) and not all Group
s have more than 1 Source
. I am only interested in Group
s with both Source
s X
and Y
and no Name
intersection.
The resulting DataFrame should look like this:
Group | Source | Name
___________________________
A | X | Jolly
A | X | Stone
A | X | Jolly
A | Y | Sand
I have tried, doing a grouby
on Group
then supplied a function to the chained filter
method like so:
def find_no_intersection(df):
return (
len(df[df.Source == 'X'].Name.values) > 0 and
len(df[df.Source == 'Y'].Name.values) > 0 and
(
len(
set(df[df.Source == 'X'].Name.values) &
set(df[df.Source == 'Y'].Name.values)
) == 0
)
)
df.groupby(['Group']).filter(find_no_intersection)
Is this the right way? Is there a better way?
CodePudding user response:
Here is a way using nunique()
df.loc[df.groupby('Group')['Name'].transform(lambda x: x.size == x.nunique())]
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A Y Sand
Update to answer:
(df.loc[
df['Group'].map(
df.groupby(['Group','Source'])['Name']
.agg(set)
.groupby(level=0)
.agg(lambda x: len(set.intersection(*x))==0))
])
Output:
Group Source Name
0 A X Jolly
1 A X Stone
2 A X Jolly
3 A Y Sand
CodePudding user response:
If I understand correctly, you can do that with the following.
df[~df['Group'].isin(df[df[['Source','Name']].duplicated()]['Group'])]