conditional flagging in pandas

Time:01-18

I have a DataFrame df:

ID 1F_col 2F_col 3F_col 4G_col
1 0 1 1 1
2 0 1 0 0
3 1 1 0 1
4 0 0 0 1
5 0 0 0 0
6 1 1 1 1
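
For reproducibility, the sample table above could be built as follows. This is only a minimal sketch: the integer dtypes are an assumption, since the post does not state them.

```python
import pandas as pd

# Sample data copied from the table above (dtypes assumed to be int).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "1F_col": [0, 0, 1, 0, 0, 1],
    "2F_col": [1, 1, 1, 0, 0, 1],
    "3F_col": [1, 0, 0, 0, 0, 1],
    "4G_col": [1, 0, 1, 1, 0, 1],
})
```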

There are two kinds of column names: those containing F and those containing G.

F_Type and G_Type: if at least one of the F columns is 1, I want F_Type to be 1; likewise, if at least one of the G columns is 1, I want G_Type to be 1 (F_Type and G_Type are new column names).

Comm1:
If both F_Type and G_Type are 1, show the string Good.
If F_Type is 1 and G_Type is 0, show the string 4G.
If F_Type is 0 and G_Type is 1, show the string 1F.
If both F_Type and G_Type are 0, show the string 1F.

Comm2: if both F_Type and G_Type are 0, show the string Hard; otherwise show Good if Comm1 is Good, else Soft.

The expected output looks like this:

ID F_Type G_Type Comm1 Comm2
1 1 1 Good Good
2 1 0 4G Soft
3 1 1 Good Good
4 0 1 1F Soft
5 0 0 1F Hard
6 1 1 Good Good

I have a huge number of IDs (about 1 million). What would be the fastest way to achieve this?

CodePudding user response:

If you are working with a large dataset, one approach is to use NumPy's vectorized np.where:

import numpy as np

df['F_Type'] = df[['1F_col', '2F_col', '3F_col']].any(axis=1).astype(int)
df['G_Type'] = df[['4G_col']].any(axis=1).astype(int)

df['Comm1'] = np.where(df['F_Type'] & df['G_Type'], 'Good', 
                        np.where(df['F_Type'], '4G', '1F'))

df['Comm2'] = np.where(np.logical_and(df['F_Type'] == 0, df['G_Type'] == 0), 
                        'Hard', np.where(df['Comm1'] == 'Good', 'Good', 'Soft'))

print(df)

   ID  1F_col  2F_col  3F_col  4G_col  F_Type  G_Type Comm1 Comm2
0   1       0       1       1       1       1       1  Good  Good
1   2       0       1       0       0       1       0    4G  Soft
2   3       1       1       0       1       1       1  Good  Good
3   4       0       0       0       1       0       1    1F  Soft
4   5       0       0       0       0       0       0    1F  Hard
5   6       1       1       1       1       1       1  Good  Good

The original columns can then be dropped:

df = df.drop(columns=['1F_col', '2F_col', '3F_col', '4G_col'])
print(df)

   ID  F_Type  G_Type Comm1 Comm2
0   1       1       1  Good  Good
1   2       1       0    4G  Soft
2   3       1       1  Good  Good
3   4       0       1    1F  Soft
4   5       0       0    1F  Hard
5   6       1       1  Good  Good
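
As a possible variation on the nested np.where calls above, the same rules can be written with np.select, which lists each condition next to its label. This is a minimal sketch, not the answerer's code: the helper name add_flags and the regex-based column selection are assumptions made for illustration, and it presumes the columns follow the "<digits>F_..." / "<digits>G_..." naming from the question.

```python
import numpy as np
import pandas as pd

def add_flags(df):
    """Return ID, F_Type, G_Type, Comm1 and Comm2 (hypothetical helper)."""
    # Select columns by the letter that follows the leading digits, e.g. "1F_col".
    f = df.filter(regex=r"^\d+F_").eq(1).any(axis=1).to_numpy()
    g = df.filter(regex=r"^\d+G_").eq(1).any(axis=1).to_numpy()

    out = df[["ID"]].copy()
    out["F_Type"] = f.astype(int)
    out["G_Type"] = g.astype(int)
    # np.select uses the first condition that matches in each row.
    out["Comm1"] = np.select([f & g, f & ~g], ["Good", "4G"], default="1F")
    out["Comm2"] = np.select([~f & ~g, f & g], ["Hard", "Good"], default="Soft")
    return out
```

Calling add_flags(df) on the question's frame should give the same five columns as the output above, without having to list the F and G columns by hand.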

CodePudding user response:

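import pandas as pd

# Flag rows where any F (or G) column equals 1; columns are selected by regex.
# axis is passed by keyword because newer pandas versions may reject it positionally.
df_2 = pd.DataFrame(df.ID)
df_2["F_Type"] = (df.loc[:, df.columns.str.match(r"[0-9]*F_.*")] == 1).any(axis=1)
df_2["G_Type"] = (df.loc[:, df.columns.str.match(r"[0-9]*G_.*")] == 1).any(axis=1)

# Comm1
df_2.loc[df_2.F_Type & df_2.G_Type, "Comm1"] = "Good"
df_2.loc[df_2.F_Type & (~df_2.G_Type), "Comm1"] = "4G"
df_2.loc[~df_2.F_Type, "Comm1"] = "1F"

# Comm2: start from Comm1, then overwrite the "both 0" and "exactly one 1" rows
df_2["Comm2"] = df_2.Comm1
df_2.loc[(~df_2[["F_Type", "G_Type"]]).all(axis=1), "Comm2"] = "Hard"
df_2.loc[df_2[["F_Type", "G_Type"]].sum(axis=1) == 1, "Comm2"] = "Soft"

# Convert the boolean F_Type and G_Type flags to int
df_2[["F_Type", "G_Type"]] = df_2[["F_Type", "G_Type"]].astype(int)
df_2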

CodePudding user response:

Here is another way:

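import pandas as pd

# Map each (F, G) pair to its [Comm1, Comm2] labels.
d = {(1, 1): ['Good', 'Good'],
     (1, 0): ['4G', 'Soft'],
     (0, 1): ['1F', 'Soft'],
     (0, 0): ['1F', 'Hard']}

# Group the columns by the second character of their name ('F' or 'G').
# Transposing stands in for groupby(..., axis=1), which is deprecated in recent pandas.
df1 = df.set_index('ID').T.groupby(lambda x: x[1]).any().T.astype(int)

df2 = pd.DataFrame(pd.MultiIndex.from_frame(df1).map(d).tolist(), columns=['Comm1', 'Comm2'])

final_df = pd.concat([df1.reset_index(), df2], axis=1)

Output:

   ID  F  G Comm1 Comm2
0   1  1  1  Good  Good
1   2  1  0    4G  Soft
2   3  1  1  Good  Good
3   4  0  1    1F  Soft
4   5  0  0    1F  Hard
5   6  1  1  Good  Good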
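
For around a million rows, the dictionary lookup above could also be expressed as plain NumPy indexing, which avoids mapping Python tuples row by row. The following is a hedged sketch rather than a benchmarked solution: the names COMM1, COMM2 and label_by_code are hypothetical, and it assumes the flags are already 0/1 integer columns named F_Type and G_Type.

```python
import numpy as np
import pandas as pd

# Labels indexed by code = 2 * F_Type + G_Type:
# (0, 0) -> 0, (0, 1) -> 1, (1, 0) -> 2, (1, 1) -> 3.
COMM1 = np.array(["1F", "1F", "4G", "Good"])
COMM2 = np.array(["Hard", "Soft", "Soft", "Good"])

def label_by_code(flags):
    """Add Comm1/Comm2 to a frame with 0/1 columns F_Type and G_Type (assumed names)."""
    code = 2 * flags["F_Type"].to_numpy() + flags["G_Type"].to_numpy()
    # Fancy indexing picks the label for every row in one vectorized step.
    return flags.assign(Comm1=COMM1[code], Comm2=COMM2[code])
```

The design choice here is to encode the two flags as a single integer so that each label column becomes one array lookup; whether this is actually faster than the approaches above would need to be timed on the real data.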