Creating multiple boolean columns in pandas based on two conditions

Time:12-21

I asked a previous question and got great help. I have a dataframe with multiple columns and 4 years of data, and I am interested in ranks 1 and 2 only.

Name Rank  Year
 Joe  1     2019
 Ben  2     2018
 Jo   3     2020
 Bo   1     2018
 Boo  1     2021

If a name had rank 1 or 2 in a specific year, I want to create a corresponding boolean column for that rank and year.

Expected output

 Name Rank  Year If_1st_2018 If_1st_2019 If_1st_2020 If_1st_2021 If_2nd_2018 If_2nd_2019 etc
 Joe  1     2019     0           1           0           0           0           0
 Ben  2     2018     0           0           0           0           1           0
 Jo   3     2020     0           0           0           0           0           0
 Bo   1     2018     1           0           0           0           0           0
 Boo  1     2021     0           0           0           1           0           0

CodePudding user response:

This time, I think a cool solution would be to combine the Rank and Year columns and then use pd.get_dummies:

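# Rank 3 maps to NaN and gets no dummy column; dtype=int gives 0/1 instead of the booleans recent pandas versions return
df = pd.concat([df, pd.get_dummies('If_' + df['Rank'].map({1: '1st', 2: '2nd'}) + '_' + df['Year'].astype(str), dtype=int)], axis=1)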

Output:

>>> df
  Name  Rank  Year  If_1st_2018  If_1st_2019  If_1st_2021  If_2nd_2018
0  Joe     1  2019            0            1            0            0
1  Ben     2  2018            0            0            0            1
2   Jo     3  2020            0            0            0            0
3   Bo     1  2018            1            0            0            0
4  Boo     1  2021            0            0            1            0
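
Note that the output above has no If_1st_2020 or If_2nd_2019 columns, because pd.get_dummies only creates columns for combinations that actually occur in the data. If every rank/year combination is wanted, as in the expected output, one option is to reindex the dummies against the full list of column names. The following is a minimal sketch under that assumption, using the sample data from the question; the names ranks, years, labels and all_cols are introduced here for illustration only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Name': ['Joe', 'Ben', 'Jo', 'Bo', 'Boo'],
    'Rank': [1, 2, 3, 1, 1],
    'Year': [2019, 2018, 2020, 2018, 2021],
})

ranks = {1: '1st', 2: '2nd'}          # ranks of interest and their labels (illustrative name)
years = sorted(df['Year'].unique())   # every year present in the data

# Combined label per row; ranks not in `ranks` become NaN and get no dummy column
labels = 'If_' + df['Rank'].map(ranks) + '_' + df['Year'].astype(str)

# Every rank/year column name, including combinations absent from the data
all_cols = [f'If_{r}_{y}' for r in ranks.values() for y in years]

# Missing combinations are added as all-zero columns
dummies = pd.get_dummies(labels, dtype=int).reindex(columns=all_cols, fill_value=0)
result = pd.concat([df, dummies], axis=1)
print(result)
```

With the sample data this should give eight indicator columns (two ranks times four years), with the row for Jo all zeros.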

CodePudding user response:

You can use:

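# Count each (Rank, Year) pair per Name; dropna=False keeps combinations with no entries
df_new = pd.crosstab(df['Name'], [df['Rank'], df['Year']], dropna=False)
# Keep only ranks 1 and 2 (the top level of the column MultiIndex)
df_new = df_new[[1,2]]
# Flatten the (rank, year) tuples into names such as '1_2018'
df_new.columns = ['_'.join(map(str, x)) for x in df_new.columns]
df_new.reset_index(inplace=True)
df = df.merge(df_new, how='left', on=['Name'])
print(df)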

OUTPUT

   Name  Rank  Year  1_2018  1_2019  1_2020  2_2018  2_2019  2_2020
0  Joe     1  2019       0       1       0       0       0       0
1  Ben     2  2018       0       0       0       1       0       0
2   Jo     3  2020       0       0       0       0       0       0
3   Bo     1  2018       1       0       0       0       0       0
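
Since pd.crosstab returns counts rather than booleans, and the merge is keyed on Name, a more literal alternative is to build each column by testing the two conditions directly. This is only a sketch, assuming the sample data from the question; rank_labels and years are illustrative names, not part of either answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Name': ['Joe', 'Ben', 'Jo', 'Bo', 'Boo'],
    'Rank': [1, 2, 3, 1, 1],
    'Year': [2019, 2018, 2020, 2018, 2021],
})

rank_labels = {1: '1st', 2: '2nd'}    # ranks of interest (illustrative name)
years = sorted(df['Year'].unique())   # years present in the data

for rank, label in rank_labels.items():
    for year in years:
        # 1 only where both conditions hold for the row, otherwise 0
        df[f'If_{label}_{year}'] = ((df['Rank'] == rank) & (df['Year'] == year)).astype(int)

print(df)
```

This is row-wise, so repeated names do not affect the result, at the cost of one pass over the data per rank/year pair.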