Pandas: renumber a column from another column's datetime

Below is a set of reference numbers that are ... rather flawed.

     reference        date  subreg
8606   1985-12  1984-08-05      62
8607   1985-11  1984-08-02      62
8608   1985-10  1984-07-30      62
8609    1985-9  1984-05-12      93
8610    1985-8  1983-04-10      93
8611    1985-7  1983-03-02      57
8612    1985-1  1981-04-10      22
8613    1985-5  1980-02-13      51

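I'd like them to look like this, i.e. the year taken from date, followed by a counter that numbers the rows within each year from earliest to latest date: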

     reference        date  subreg
8606    1984-4  1984-08-05      62
8607    1984-3  1984-08-02      62
8608    1984-2  1984-07-30      62
8609    1984-1  1984-05-12      93
8610    1983-2  1983-04-10      93
8611    1983-1  1983-03-02      57
8612    1981-1  1981-04-10      22
8613    1980-1  1980-02-13      51

How do I do this efficiently with pandas? So far as I can tell there is no question that addresses this.

CodePudding user response：

Try this:

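import pandas as pd

# Parse the dates so the .dt accessor is available
df['date'] = pd.to_datetime(df['date'])

# Year prefix + '-' + 1-based position of each row within its year,
# counted in ascending date order; the result aligns back on the index
df['reference'] = (df['date'].dt.year.astype(str) +
                   '-' +
                   (df.sort_values('date')
                      .groupby(df['date'].dt.year)
                      .cumcount() + 1).astype(str))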

Output:

     reference       date  subreg
8606    1984-4 1984-08-05      62
8607    1984-3 1984-08-02      62
8608    1984-2 1984-07-30      62
8609    1984-1 1984-05-12      93
8610    1983-2 1983-04-10      93
8611    1983-1 1983-03-02      57
8612    1981-1 1981-04-10      22
8613    1980-1 1980-02-13      51
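
For anyone who wants to run the answer above end to end, here is a minimal self-contained sketch. It rebuilds the question's sample frame by hand and applies the same sort / groupby / cumcount steps. The intermediate names year and seq are introduced here only for readability and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question, including the original index labels
df = pd.DataFrame(
    {
        "reference": ["1985-12", "1985-11", "1985-10", "1985-9",
                      "1985-8", "1985-7", "1985-1", "1985-5"],
        "date": ["1984-08-05", "1984-08-02", "1984-07-30", "1984-05-12",
                 "1983-04-10", "1983-03-02", "1981-04-10", "1980-02-13"],
        "subreg": [62, 62, 62, 93, 93, 57, 22, 51],
    },
    index=range(8606, 8614),
)

df["date"] = pd.to_datetime(df["date"])

# year and seq are illustrative helper names (assumptions, not from the answer)
year = df["date"].dt.year
# Sort by date, then number the rows inside each year starting at 1
seq = df.sort_values("date").groupby(year).cumcount() + 1

# Both Series carry the original index, so the concatenation aligns row by row
df["reference"] = year.astype(str) + "-" + seq.astype(str)
print(df)
```

Because cumcount returns a Series indexed like the sorted frame, assigning it back does not require restoring the original row order first.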

CodePudding user response：

This will also work, as long as the date column still holds strings (x.split fails on Timestamps):

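# Take the year part of the 'YYYY-MM-DD' string
df['reference'] = df['date'].apply(lambda x : x.split('-')[0])
# Helper column: 1-based row number within each year, in date order
df['RN'] = df.sort_values('date').groupby('reference').cumcount() + 1
df['reference'] = df['reference'] + '-' + df['RN'].astype(str)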
df
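
One case the sample data does not exercise is two rows sharing the same date. As far as I know, sort_values defaults to quicksort, which is not guaranteed to be stable, so tied rows could be numbered in either order. The sketch below assumes you want tied rows numbered in their original order and asks for a stable sort explicitly. The four-row frame is made up for illustration, and it keeps date as strings as in the answer above.

```python
import pandas as pd

# Made-up data: rows 1 and 2 share a date
df = pd.DataFrame(
    {"date": ["1984-08-05", "1984-05-12", "1984-05-12", "1983-03-02"]}
)

# Year prefix taken from the 'YYYY-MM-DD' string
year = df["date"].str[:4]

# kind="mergesort" is a stable sort, so tied dates keep their original order
seq = df.sort_values("date", kind="mergesort").groupby(year).cumcount() + 1

df["reference"] = year + "-" + seq.astype(str)
print(df)
```

This also skips the RN helper column, since the counter is only needed while building the string.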