Home > Enterprise >  Add column to dataframe with the order parameter of a groupby
Add column to dataframe with the order parameter of a groupby

Time:11-10

I have a dataframe with the following structure:

import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame(
    [
        [datetime(2020, 1, 1), 3343],
        [datetime(2020, 1, 1), 3443],
        [datetime(2020, 2, 1), 5431],
        [datetime(2020, 2, 1), 4443],
        [datetime(2020, 2, 1), 7999],
    ],
    columns=["A", "B"],
)

enter image description here

I want to add a new column to the dataframe, that shows the order position for each column A, considering the ordering parameter the column B.
Thus, the result of the previous dataframe would be:

enter image description here

The goal is to do it in an efficient way, not using for loops.

CodePudding user response:

Use groupby rank.

df['new_column'] = df.groupby('A')['B'].rank(method='dense').astype(int)

Or as @АлексейР suggested, first sort the DataFrame by 'B' and then use groupby cumcount.

df = df.sort_values(['A', 'B'])
df['new_column'] = df.groupby('A').cumcount()   1
  • Related