Home > database >  How to check pair of string values in a column, after grouping the dataframe using ID column?
How to check pair of string values in a column, after grouping the dataframe using ID column?

Time:09-02

My Doubt in a Table/Dataframe viewI have a dataframe containing 2 columns: ID and Code.

ID Code Flag
1   A   0
1   C   1
1   B   1
2   A   0
2   B   1
3   A   0
4   C   0

Within each ID, if Code 'A' exists with 'B' or 'C', then it should flag 1.

I tried Groupby('ID') with filter(). but it is not showing the perfect result. Could anyone please help ?

CodePudding user response:

You can do the following:

First use pd.groupby('ID') and concatenate the codes using 'sum' to create a new column. Then assing the value 1 if a row contains A or B as Code and when the new column contains an A:

df['s'] = df.groupby('ID').Code.transform('sum')
df['Flag'] = 0
df.loc[((df.Code == 'B') | (df.Code == 'C')) & df.s.str.contains('A'), 'Flag'] = 1
df = df.drop(columns = 's')

Output:

   ID Code  Flag
0   1    A     0
1   1    C     1
2   1    B     1
3   2    A     0
4   2    B     1
5   3    A     0
6   4    C     0

CodePudding user response:

You can use boolean masks, direct for B/C, per group for A, then combine them and convert to integer:

# is the Code a B or C?
m1 = df['Code'].isin(['B', 'C'])
# is there also a A in the same group?
m2 = df['Code'].eq('A').groupby(df['ID']).transform('any')

# if both are True, flag 1
df['Flag'] = (m1&m2).astype(int)

Output:

   ID Code  Flag
0   1    A     0
1   1    C     1
2   1    B     1
3   2    A     0
4   2    B     1
5   3    A     0
6   4    C     0
  • Related