I have a dataframe in Python which (simplified), looks similar to this:
Type | Market | Price
-------------------------
1 | A | 2
1 | B | 2
1 | B | 2
-------------------------
2 | A | 4
2 | C | 4
2 | C | 4
2 | B | 8
-------------------------
3 | A | 8
3 | B | 7
3 | B | 7
3 | C | 7
(for better clarity I divided up the dataframe based on the different Types).
What I would like to do is subset the dataframe such that, if Market for each type only has "A" and "B" (and not "C") then I want to keep it. So for example, from the dataframe above, since Type "1" only has "A" and "B" in Market, I want to keep it, but since Type "2" has "A" and "C" and then "B", then I don't want to keep it. On the other hand, since Type "3" has "A", then "B", then "C", then I want to keep it. So from this dataframe, I want to keep Type "1" and Type "3".
I'm having a bit of trouble implementing this, since it requires very specific conditions, and I'm not very good at programming unfortunately. What is a good way of doing this? Thanks in advance :)
CodePudding user response:
df.loc[df.groupby('Type').Market.transform(lambda x : set(x) == {'A', 'B'} or all(x==sorted(x)))]
Type Market Price
0 1 A 2
1 1 B 2
2 1 B 2
7 3 A 8
8 3 B 7
9 3 B 7
10 3 C 7
CodePudding user response:
try this:
df.groupby('Type').filter(lambda g: ''.join(g.Market.unique()[:2]) == 'AB')
>>>
Type Market Price
0 1 A 2
1 1 B 2
2 1 B 2
7 3 A 8
8 3 B 7
9 3 B 7
10 3 C 7
CodePudding user response:
If need keep only unique values A,B
or A,B,C
in order use:
s = df.drop_duplicates(['Type','Market']).groupby('Type')['Market'].agg(tuple)
df = df[df['Type'].isin(s.index[s.isin([('A','B'),('A','B','C')])])]
print (df)
Type Market Price
0 1 A 2
1 1 B 2
2 1 B 2
7 3 A 8
8 3 B 7
9 3 B 7
10 3 C 7
Another idea:
def f(x):
u = tuple(dict.fromkeys(x))
return (u == ('A','B')) | (u == ('A','B','C'))
df = df[df.groupby('Type').Market.transform(f)]