pandas: how to keep the first or last value per column with drop_duplicates

Time:06-05

As shown below, I want to drop duplicate rows by name, keeping the first occurrence of name but the last value of team.

How can I accomplish this with .drop_duplicates() or otherwise?

   name  team ...
0  john  a    ...
1  mike  b    ...
2  john  c    ...

↓

   name  team ...
0  john  c    ...
1  mike  b    ...

-- Additional note in response to the comments --

df.groupby('name').agg({'team': 'last', 'country': 'first'})

The way it works now, if the first row of country is NaN, a value other than the first one is returned, as shown below.

Is this because NaN is ignored? Even when first is specified and the first value is NaN, the NaN should be retained.

   name  team  country ...
0  john  a     NaN     ...
1  mike  b     Brazil  ...
2  john  c     Canada  ...

↓

   name  team  country ...
0  john  c     Canada  ...
1  mike  b     Brazil  ...

CodePudding user response:

You can use the .groupby() method:

df.groupby('name').agg({'team': 'last'})
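For reference, here is a minimal runnable sketch of that call using the sample data from the question. The `sort=False` and `reset_index()` parts are additions that are not in the original answer; they are assumed here only to get `name` back as a regular column in order of first appearance, matching the desired output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
})

# One row per name; 'last' takes the final team value within each group.
# sort=False keeps groups in order of first appearance instead of sorting by name.
result = df.groupby("name", sort=False).agg({"team": "last"}).reset_index()
print(result)
```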

Be aware that the value returned per name depends on the row order of your DataFrame.
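Regarding the follow-up about NaN: the built-in 'first' aggregation skips missing values by default, which is why Canada is returned for john instead of NaN. One possible workaround, sketched below rather than offered as the definitive fix, is to take the first row of each group by position with `s.iloc[0]`, which keeps NaN. The sample data comes from the question; the variable names `skipping` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's follow-up
df = pd.DataFrame({
    "name": ["john", "mike", "john"],
    "team": ["a", "b", "c"],
    "country": [np.nan, "Brazil", "Canada"],
})

# Built-in 'first' skips NaN, so john gets "Canada" here
skipping = df.groupby("name", sort=False).agg({"team": "last", "country": "first"})

# Positional first: s.iloc[0] returns the first row's value even when it is NaN
result = (
    df.groupby("name", sort=False)
      .agg({"team": "last", "country": lambda s: s.iloc[0]})
      .reset_index()
)
print(result)
```

If "first" and "last" should follow some other ordering than the current row order, sort the frame first (for example with `df.sort_values(...)`) before grouping.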
