Mark highest documents with true

Time:04-25

I have a dataframe with two columns: name and version.

I want to add an extra boolean column: True if the row has the highest version, otherwise False.

import pandas as pd
 
data = [['a', 1], ['b', 2], ['a', 2], ['a', 2], ['b', 4]]
 
df = pd.DataFrame(data, columns = ['name', 'version'])
 
df

Is it best to use groupby for this? I have tried something like the following, but I do not know how to add the extra boolean column.

df.groupby(['name']).max()

CodePudding user response:

Use GroupBy.transform with 'max' to build a Series that holds each group's maximum and shares the original index, then compare it with the original column:

df['bool_col'] = df['version'] == df.groupby('name')['version'].transform('max')
print(df)
  name  version  bool_col
0    a        1     False
1    b        2     False
2    a        2      True
3    a        2      True
4    b        4      True

Detail:

print(df.groupby('name')['version'].transform('max'))
0    2
1    4
2    2
3    2
4    4
Name: version, dtype: int64
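
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name mark_highest and the column name is_highest are made up for illustration and are not part of pandas. It returns a copy, so the original dataframe is left unchanged, and tied rows are all marked True.

```python
import pandas as pd


def mark_highest(df, group_col, value_col, flag_col="is_highest"):
    """Return a copy of df with a boolean column marking rows whose
    value equals the maximum of their group (ties are all marked)."""
    # One maximum per group, broadcast back to the original row index.
    group_max = df.groupby(group_col)[value_col].transform("max")
    out = df.copy()
    out[flag_col] = out[value_col].eq(group_max)
    return out


data = [["a", 1], ["b", 2], ["a", 2], ["a", 2], ["b", 4]]
df = pd.DataFrame(data, columns=["name", "version"])

result = mark_highest(df, "name", "version")
print(result)
```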

CodePudding user response:

If you want the highest version across the whole dataframe rather than per name, you can assign your column directly:

df['bool_col'] = df['version'] == df['version'].max()

Output:

  name  version  bool_col
0    a        1     False
1    b        2     False
2    a        2     False
3    a        2     False
4    b        4      True

Is this what you were looking for?
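
In case it helps to decide which behaviour you need, here is a small side-by-side sketch on the question's data. The column names is_global_max and is_group_max are chosen only for this illustration.

```python
import pandas as pd

data = [["a", 1], ["b", 2], ["a", 2], ["a", 2], ["b", 4]]
df = pd.DataFrame(data, columns=["name", "version"])

# Highest version across the whole frame (a single maximum for all rows).
df["is_global_max"] = df["version"] == df["version"].max()

# Highest version within each name (one maximum per group).
df["is_group_max"] = df["version"] == df.groupby("name")["version"].transform("max")

print(df)
```

The two columns differ only for the rows of name a with version 2: they are the highest within their group but not in the whole frame.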
