I have a dataframe with two columns: name and version.
I want to add an extra boolean column: True if the row has the highest version, otherwise False.
import pandas as pd
data = [['a', 1], ['b', 2], ['a', 2], ['a', 2], ['b', 4]]
df = pd.DataFrame(data, columns = ['name', 'version'])
df
Is groupby the best way to do this? I tried something like the following, but I do not know how to add the extra boolean column.
df.groupby(['name']).max()
CodePudding user response:
Use GroupBy.transform with 'max' to create a new Series, aligned with the original rows, that holds the maximal version of each group. You can then compare it with the original column:
df['bool_col'] = df['version'] == df.groupby('name')['version'].transform('max')
print(df)
  name  version  bool_col
0    a        1     False
1    b        2     False
2    a        2      True
3    a        2      True
4    b        4      True
Detail:
print(df.groupby('name')['version'].transform('max'))
0 2
1 4
2 2
3 2
4 4
Name: version, dtype: int64
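For reference, here is the same approach as a self-contained sketch. The intermediate name group_max is only illustrative, and eq() is used as the method form of ==. Note that ties are all marked True, so both rows with name 'a' and version 2 are flagged.

```python
import pandas as pd

data = [['a', 1], ['b', 2], ['a', 2], ['a', 2], ['b', 4]]
df = pd.DataFrame(data, columns=['name', 'version'])

# transform('max') returns a Series with the same index as df,
# where each row holds the maximum version of that row's name group
group_max = df.groupby('name')['version'].transform('max')

# Element-wise comparison against the per-group maximum;
# every row that ties for the maximum gets True
df['bool_col'] = df['version'].eq(group_max)
print(df)
```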
CodePudding user response:
You can assign your column directly by comparing against the overall maximum (across all names, not per name):
df['bool_col'] = df['version'] == df['version'].max()
Output:
  name  version  bool_col
0    a        1     False
1    b        2     False
2    a        2     False
3    a        2     False
4    b        4      True
Is this what you were looking for?
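If it helps to decide which one you need, here is a small sketch that puts the overall comparison and the per-name comparison side by side. The column names is_global_max and is_group_max are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 1], ['b', 2], ['a', 2], ['a', 2], ['b', 4]],
    columns=['name', 'version'],
)

# Overall comparison: True only where version equals the maximum of the whole column
df['is_global_max'] = df['version'] == df['version'].max()

# Per-name comparison: True where version equals the maximum within the row's own name
df['is_group_max'] = df['version'] == df.groupby('name')['version'].transform('max')

print(df)
```

The two columns differ only for the rows with name 'a' and version 2, which are the highest within 'a' but not overall.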