I have a dataframe `df`:
ID | 1F_col | 2F_col | 3F_col | 4G_col |
---|---|---|---|---|
1 | 0 | 1 | 1 | 1 |
2 | 0 | 1 | 0 | 0 |
3 | 1 | 1 | 0 | 1 |
4 | 0 | 0 | 0 | 1 |
5 | 0 | 0 | 0 | 0 |
6 | 1 | 1 | 1 | 1 |
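For anyone trying the answers below, the sample frame can be rebuilt like this. The values are copied from the table above; nothing else is assumed.

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "ID":     [1, 2, 3, 4, 5, 6],
    "1F_col": [0, 0, 1, 0, 0, 1],
    "2F_col": [1, 1, 1, 0, 0, 1],
    "3F_col": [1, 0, 0, 0, 0, 1],
    "4G_col": [1, 0, 1, 1, 0, 1],
})
print(df)
```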
I have two types of column names: some contain F and the others contain G.
F_Type and G_Type: if at least one of the F columns is 1, I want F_Type to be 1, and likewise, if at least one of the G columns is 1, I want G_Type to be 1 (F_Type and G_Type are the new column names), as shown below.
Comm1:
If both F_Type and G_Type are 1, show the string Good
If F_Type is 1 and G_Type is 0, show the string 4G
If F_Type is 0 and G_Type is 1, show the string 1F
If both F_Type and G_Type are 0, show the string 1F
Comm2: if both F_Type and G_Type are 0, show the string Hard; else Good if Comm1 is Good; else Soft.
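Taken together, these rules form a four-row truth table over the two flags. A minimal sketch in plain Python that spells it out; the helper name `label` is hypothetical and not part of the question.

```python
# Hypothetical helper: map an (F_Type, G_Type) flag pair to (Comm1, Comm2)
def label(f_type: int, g_type: int) -> tuple[str, str]:
    if f_type and g_type:      # both groups present
        return "Good", "Good"
    if f_type:                 # F present, G missing
        return "4G", "Soft"
    if g_type:                 # G present, F missing
        return "1F", "Soft"
    return "1F", "Hard"        # neither group present

for f in (1, 0):
    for g in (1, 0):
        print(f, g, *label(f, g))
```

Note that Comm2 depends only on how many of the two flags are set: 0 gives Hard, 1 gives Soft, 2 gives Good.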
ID | F_Type | G_Type | Comm1 | Comm2 |
---|---|---|---|---|
1 | 1 | 1 | Good | Good |
2 | 1 | 0 | 4G | Soft |
3 | 1 | 1 | Good | Good |
4 | 0 | 1 | 1F | Soft |
5 | 0 | 0 | 1F | Hard |
6 | 1 | 1 | Good | Good |
I have a huge number of records (about 1 million IDs). What would be the fastest way to achieve this?
Answer:
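If you are working with a large dataset, one approach is to use NumPy's vectorized `np.where`:
```python
import numpy as np

# 1 if any of the F columns (resp. the G column) is 1 in that row, else 0
df['F_Type'] = df[['1F_col', '2F_col', '3F_col']].any(axis=1).astype(int)
df['G_Type'] = df[['4G_col']].any(axis=1).astype(int)
# Nested np.where: Good if both flags are set, 4G if only F, otherwise 1F
df['Comm1'] = np.where(df['F_Type'] & df['G_Type'], 'Good',
                       np.where(df['F_Type'], '4G', '1F'))
# Hard if neither flag is set, otherwise Good or Soft depending on Comm1
df['Comm2'] = np.where(np.logical_and(df['F_Type'] == 0, df['G_Type'] == 0),
                       'Hard', np.where(df['Comm1'] == 'Good', 'Good', 'Soft'))
print(df)
```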
```
   ID  1F_col  2F_col  3F_col  4G_col  F_Type  G_Type Comm1 Comm2
0   1       0       1       1       1       1       1  Good  Good
1   2       0       1       0       0       1       0    4G  Soft
2   3       1       1       0       1       1       1  Good  Good
3   4       0       0       0       1       0       1    1F  Soft
4   5       0       0       0       0       0       0    1F  Hard
5   6       1       1       1       1       1       1  Good  Good
```
To keep only the new columns, drop the original flag columns:
```python
df = df.drop(columns=['1F_col', '2F_col', '3F_col', '4G_col'])
print(df)
```
```
   ID  F_Type  G_Type Comm1 Comm2
0   1       1       1  Good  Good
1   2       1       0    4G  Soft
2   3       1       1  Good  Good
3   4       0       1    1F  Soft
4   5       0       0    1F  Hard
5   6       1       1  Good  Good
```
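A variant of this answer, sketched under the assumption that each flag column has its letter right before an underscore (as in `1F_col` and `4G_col`), selects the columns by pattern with `DataFrame.filter(regex=...)` instead of listing them, and replaces the nested `np.where` calls with `np.select`. The output frame name `out` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":     [1, 2, 3, 4, 5, 6],
    "1F_col": [0, 0, 1, 0, 0, 1],
    "2F_col": [1, 1, 1, 0, 0, 1],
    "3F_col": [1, 0, 0, 0, 0, 1],
    "4G_col": [1, 0, 1, 1, 0, 1],
})

# Assumption: F columns contain "F_" and G columns contain "G_" in their names
f_any = df.filter(regex="F_").to_numpy().any(axis=1)
g_any = df.filter(regex="G_").to_numpy().any(axis=1)

out = df[["ID"]].copy()
out["F_Type"] = f_any.astype(int)
out["G_Type"] = g_any.astype(int)
# np.select takes the choice of the first condition that holds, else the default
out["Comm1"] = np.select([f_any & g_any, f_any], ["Good", "4G"], default="1F")
out["Comm2"] = np.select([f_any & g_any, f_any | g_any], ["Good", "Soft"], default="Hard")
print(out)
```

Matching by pattern means new F or G columns are picked up without editing the code, at the cost of relying on the naming convention.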
Answer:
```python
import pandas as pd

df_2 = pd.DataFrame(df.ID)
# Boolean flags: True if any column whose name matches the pattern is 1
df_2["F_Type"] = (df.loc[:, df.columns.str.match("[0-9]*F_.*")] == 1).any(axis=1)
df_2["G_Type"] = (df.loc[:, df.columns.str.match("[0-9]*G_.*")] == 1).any(axis=1)
# Comm1
df_2.loc[df_2.F_Type & df_2.G_Type, "Comm1"] = "Good"
df_2.loc[df_2.F_Type & (~df_2.G_Type), "Comm1"] = "4G"
df_2.loc[~df_2.F_Type, "Comm1"] = "1F"
# Comm2: start from Comm1 (so both flags set stays "Good"), then overwrite
df_2["Comm2"] = df_2.Comm1
df_2.loc[(~df_2[["F_Type", "G_Type"]]).all(axis=1), "Comm2"] = "Hard"  # neither flag
df_2.loc[df_2[["F_Type", "G_Type"]].sum(axis=1) == 1, "Comm2"] = "Soft"  # exactly one flag
# Convert F_Type and G_Type to int
df_2[["F_Type", "G_Type"]] = df_2[["F_Type", "G_Type"]].astype(int)
print(df_2)
```
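Since the question is about roughly 1 million IDs, a rough way to check how a vectorized approach scales is to time it on synthetic data. This sketch assumes random 0/1 flags in four columns named like the sample (the generator, seed, and size are assumptions, not the asker's data); the printed time depends on the machine and library versions, so treat it as a local measurement only.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in: n IDs with random 0/1 flags (assumed data, not real records)
rng = np.random.default_rng(0)
n = 1_000_000
df = pd.DataFrame(rng.integers(0, 2, size=(n, 4), dtype=np.int8),
                  columns=["1F_col", "2F_col", "3F_col", "4G_col"])
df.insert(0, "ID", np.arange(1, n + 1))

start = time.perf_counter()
# Row-wise "any" over each column group, done on plain NumPy arrays
f_any = df.filter(regex="F_").to_numpy().any(axis=1)
g_any = df.filter(regex="G_").to_numpy().any(axis=1)
comm1 = np.where(f_any & g_any, "Good", np.where(f_any, "4G", "1F"))
comm2 = np.where(f_any & g_any, "Good", np.where(f_any | g_any, "Soft", "Hard"))
elapsed = time.perf_counter() - start
print(f"{n} rows labelled in {elapsed:.3f} s")
```

Swapping the body between the two `perf_counter` calls for another answer's code gives a like-for-like comparison on the same data.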
Answer:
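Here is another way, using a lookup dictionary keyed on the (F, G) flag pair:
```python
import pandas as pd

# (F, G) flag pair -> [Comm1, Comm2]
d = {(1, 1): ['Good', 'Good'],
     (1, 0): ['4G', 'Soft'],
     (0, 1): ['1F', 'Soft'],
     (0, 0): ['1F', 'Hard']}
# Group columns by the character at position 1 of the name ('1F_col'[1] == 'F').
# groupby(..., axis=1) is deprecated in recent pandas, so transpose instead.
df1 = df.set_index('ID').T.groupby(lambda x: x[1]).any().T.astype(int)
df2 = pd.DataFrame(pd.MultiIndex.from_frame(df1).map(d).tolist(),
                   columns=['Comm1', 'Comm2'])
final_df = pd.concat([df1.reset_index(), df2], axis=1)
print(final_df)
```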
Output:
```
   ID  F  G Comm1 Comm2
0   1  1  1  Good  Good
1   2  1  0    4G  Soft
2   3  1  1  Good  Good
3   4  0  1    1F  Soft
4   5  0  0    1F  Hard
5   6  1  1  Good  Good
```
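The dictionary approach looks up one (F, G) pair per row. A sketch of the same table lookup done with NumPy fancy indexing, assuming the flags are 0/1 integers and encoding each pair as `2*F + G` (the names `code`, `comm1_table`, and `comm2_table` are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID":     [1, 2, 3, 4, 5, 6],
    "1F_col": [0, 0, 1, 0, 0, 1],
    "2F_col": [1, 1, 1, 0, 0, 1],
    "3F_col": [1, 0, 0, 0, 0, 1],
    "4G_col": [1, 0, 1, 1, 0, 1],
})

# Row-wise "any" over each column group, as 0/1 integers
f = df.filter(regex="F_").to_numpy().any(axis=1).astype(np.int64)
g = df.filter(regex="G_").to_numpy().any(axis=1).astype(np.int64)

# Encode each (F, G) pair as one integer: 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1)
code = 2 * f + g
comm1_table = np.array(["1F", "1F", "4G", "Good"])
comm2_table = np.array(["Hard", "Soft", "Soft", "Good"])

out = pd.DataFrame({
    "ID": df["ID"].to_numpy(),
    "F_Type": f,
    "G_Type": g,
    "Comm1": comm1_table[code],  # fancy indexing replaces the per-row dict lookup
    "Comm2": comm2_table[code],
})
print(out)
```

The two lookup arrays hold the same four cases as the dictionary `d`, just indexed by the integer code instead of by a tuple.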