I asked this question and got great help. I have a dataframe with multiple columns and 4 years of data, and I am interested in ranks 1 and 2 only.
Name Rank Year
Joe 1 2019
Ben 2 2018
Jo 3 2020
Bo 1 2018
Boo 1 2021
If a name had rank 1 or 2 in a specific year, I want to create a corresponding boolean column.
Expected output:
Name Rank Year If_1st_2018 If_1st_2019 If_1st_2020 If_1st_2021 If_2nd_2018 If_2nd_2019 etc
Joe 1 2019 0 1 0 0 0 0
Ben 2 2018 0 0 0 0 1 0
Jo 3 2020 0 0 0 0 0 0
Bo 1 2018 1 0 0 0 0 0
Boo 1 2021 0 0 0 1 0 0
CodePudding user response:
This time, I think a cool solution would be to combine the Rank and Year columns and then use pd.get_dummies:
df = pd.concat([df, pd.get_dummies('If_' + df['Rank'].map({1: '1st', 2: '2nd'}) + '_' + df['Year'].astype(str))], axis=1)
Output:
>>> df
Name Rank Year If_1st_2018 If_1st_2019 If_1st_2021 If_2nd_2018
0 Joe 1 2019 0 1 0 0
1 Ben 2 2018 0 0 0 1
2 Jo 3 2020 0 0 0 0
3 Bo 1 2018 1 0 0 0
4 Boo 1 2021 0 0 1 0
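Note that the output above only contains columns for rank/year pairs that actually occur in the data, so If_1st_2020 and most of the If_2nd_* columns from the expected output are missing. Here is a minimal sketch of one way to fill them in, assuming the sample data from the question and assuming the full set of wanted columns is every combination of ranks 1 and 2 with the years present in df. The names labels, dummies, wanted and result are only illustrative, and dtype=int is passed because newer pandas versions return booleans from get_dummies by default.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

# Build labels such as "If_1st_2019". Ranks other than 1 or 2 map to NaN,
# so those rows get no label and end up with 0 in every flag column.
labels = "If_" + df["Rank"].map({1: "1st", 2: "2nd"}) + "_" + df["Year"].astype(str)

# dtype=int gives 0/1 flags instead of True/False
dummies = pd.get_dummies(labels, dtype=int)

# Assumption: we want one column per (rank, year) pair, even if the pair never occurs
wanted = [f"If_{r}_{y}" for r in ("1st", "2nd") for y in sorted(df["Year"].unique())]
dummies = dummies.reindex(columns=wanted, fill_value=0)

result = pd.concat([df, dummies], axis=1)
print(result)
```

The reindex step adds any missing combination as a column of zeros and also fixes the column order, which may be handy if the result is compared across datasets.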
CodePudding user response:
You can use:
df_new = pd.crosstab(df['Name'], [df['Rank'], df['Year']], dropna=False)
df_new = df_new[[1,2]]
df_new.columns = ['_'.join(map(str, x)) for x in df_new.columns]
df_new.reset_index(inplace=True)
df = df.merge(df_new, how='left', on=['Name'])
print(df)
Output:
Name Rank Year 1_2018 1_2019 1_2020 2_2018 2_2019 2_2020
0 Joe 1 2019 0 1 0 0 0 0
1 Ben 2 2018 0 0 0 1 0 0
2 Jo 3 2020 0 0 0 0 0 0
3 Bo 1 2018 1 0 0 0 0 0
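One thing to keep in mind with the crosstab approach is that the flags are computed per Name and then merged back, so if a name appears in several rows, every one of its rows receives the same flags. Below is a hedged variation of the same idea, assuming the sample data from the question, that filters to ranks 1 and 2 first and renames the columns to the If_1st_2018 style. The names top, flags, suffix and flag_cols are made up for this sketch.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Joe", "Ben", "Jo", "Bo", "Boo"],
    "Rank": [1, 2, 3, 1, 1],
    "Year": [2019, 2018, 2020, 2018, 2021],
})

# Keep only the ranks of interest before counting
top = df[df["Rank"].isin([1, 2])]

# Count occurrences per Name for each (Rank, Year) pair, then cap at 1
# so the values behave like 0/1 flags even if a pair repeats
flags = pd.crosstab(top["Name"], [top["Rank"], top["Year"]]).clip(upper=1)

# Flatten the (Rank, Year) column pairs into names like "If_1st_2018"
suffix = {1: "1st", 2: "2nd"}
flags.columns = [f"If_{suffix[r]}_{y}" for r, y in flags.columns]
flag_cols = list(flags.columns)

# Names without a rank 1 or 2 (here: Jo) get NaN from the left merge; turn that into 0
result = df.merge(flags.reset_index(), how="left", on="Name")
result[flag_cols] = result[flag_cols].fillna(0).astype(int)
print(result)
```

Unlike the dropna=False version, this only produces columns for pairs that occur in the data; a reindex over the wanted column names could be added if every combination is required.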