I need to subset rows of df based on two columns (c1 and c2 columns) which have strings. I need to be able to return rows where one value in c1 is associated with only 2 different values in c2. col4-6 are irrelevant for subsetting and just need to be returned.
Code to recreate df
df = pd.DataFrame({"": [0,1,2,3,4,5,6,7,8],
"c1": ["ABC", "ABC", "dfg", "dfg", "dfg","dfg","ghj","ghj","ghj"],
"c2": ["delta", "delta", "alpha", "bravo", "alpha","bravo","bravo","delta","alpha"],
"c3": [1, 2, 2, 3, 5,6,3,3,3],
"col4": [786, 787, 777, 775, 767,715,772,712,712],
"col5": [10, 11, 13, 12, 13,12,14,12,12],
"col6": [1,2,4, 3, 4,3, 5, 8,8]})
df
c1 c2 c3 col4 col5 col6
0 ABC delta 1 786 10 1
1 ABC delta 2 787 11 2
2 dfg alpha 2 777 13 4
3 dfg bravo 3 775 12 3
4 dfg alpha 5 767 13 4
5 dfg bravo 6 715 12 3
6 ghj bravo 3 772 14 5
7 ghj delta 3 712 12 8
8 ghj alpha 3 712 12 8
Answer df should be:
finaldf
c1 c2 c3 col4 col5 col6
2 dfg alpha 2 777 13 4
3 dfg bravo 3 775 12 3
4 dfg alpha 5 767 13 4
5 dfg bravo 6 715 12 3
What if the rows where one value in c1 is associated with 2 and 3 different values in c2 is of interest like in df below?
finaldf
c1 c2 c3 col4 col5 col6
2 dfg alpha 2 777 13 4
3 dfg bravo 3 775 12 3
4 dfg alpha 5 767 13 4
5 dfg bravo 6 715 12 3
6 ghj bravo 3 772 14 5
7 ghj delta 3 712 12 8
8 ghj alpha 3 712 12 8
I think some kind of groupby and transform operation could help achieve this.
CodePudding user response:
Get the index where your condition is true:
cond = (df.groupby("c1")["c2"]
.nunique().where(lambda x: x == 2)
.dropna().index)
and then use .loc
df.loc[df["c1"].isin(cond)]
CodePudding user response:
here is one way to do it
#using transform, return a boolean on unique values matching the count
# and filter the dataframe
df[df.groupby('c1')['c2'].transform(lambda x: x.nunique()==2)]
c1 c2 c3 col4 col5 col6
2 2 dfg alpha 2 777 13 4
3 3 dfg bravo 3 775 12 3
4 4 dfg alpha 5 767 13 4
5 5 dfg bravo 6 715 12 3
df[df.groupby('c1')['c2'].transform(lambda x: x.nunique()>=2)]
c1 c2 c3 col4 col5 col6
2 2 dfg alpha 2 777 13 4
3 3 dfg bravo 3 775 12 3
4 4 dfg alpha 5 767 13 4
5 5 dfg bravo 6 715 12 3
6 6 ghj bravo 3 772 14 5
7 7 ghj delta 3 712 12 8
8 8 ghj alpha 3 712 12 8