Home > Net >  Subset a Dataframe in Python based on specific conditions
Subset a Dataframe in Python based on specific conditions

Time:06-09

I have a dataframe in Python which (simplified), looks similar to this:

Type  | Market |  Price
-------------------------
 1    |   A    |    2
 1    |   B    |    2
 1    |   B    |    2
-------------------------
 2    |   A    |    4
 2    |   C    |    4
 2    |   C    |    4
 2    |   B    |    8
-------------------------
 3    |   A    |    8
 3    |   B    |    7
 3    |   B    |    7
 3    |   C    |    7

(for better clarity I divided up the dataframe based on the different Types).

What I would like to do is subset the dataframe such that, if Market for each type only has "A" and "B" (and not "C") then I want to keep it. So for example, from the dataframe above, since Type "1" only has "A" and "B" in Market, I want to keep it, but since Type "2" has "A" and "C" and then "B", then I don't want to keep it. On the other hand, since Type "3" has "A", then "B", then "C", then I want to keep it. So from this dataframe, I want to keep Type "1" and Type "3".

I'm having a bit of trouble implementing this, since it requires very specific conditions, and I'm not very good at programming unfortunately. What is a good way of doing this? Thanks in advance :)

CodePudding user response:

df.loc[df.groupby('Type').Market.transform(lambda x : set(x) == {'A', 'B'} or all(x==sorted(x)))]

    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7

CodePudding user response:

try this:

df.groupby('Type').filter(lambda g: ''.join(g.Market.unique()[:2]) == 'AB')
>>>

    Type    Market  Price
0   1       A       2
1   1       B       2
2   1       B       2
7   3       A       8
8   3       B       7
9   3       B       7
10  3       C       7

CodePudding user response:

If need keep only unique values A,B or A,B,C in order use:

s = df.drop_duplicates(['Type','Market']).groupby('Type')['Market'].agg(tuple)
df = df[df['Type'].isin(s.index[s.isin([('A','B'),('A','B','C')])])]
print (df)
    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7

Another idea:

def f(x):
    u = tuple(dict.fromkeys(x))
    return (u  == ('A','B')) | (u  == ('A','B','C'))

df = df[df.groupby('Type').Market.transform(f)]
  • Related