Below is a set of reference numbers that are ... rather flawed.
reference date subreg
8606 1985-12 1984-08-05 62
8607 1985-11 1984-08-02 62
8608 1985-10 1984-07-30 62
8609 1985-9 1984-05-12 93
8610 1985-8 1983-04-10 93
8611 1985-7 1983-03-02 57
8612 1985-1 1981-04-10 22
8613 1985-5 1980-02-13 51
I'd like them to look like this:
reference date subreg
8606 1984-4 1984-08-05 62
8607 1984-3 1984-08-02 62
8608 1984-2 1984-07-30 62
8609 1984-1 1984-05-12 93
8610 1983-2 1983-04-10 93
8611 1983-1 1983-03-02 57
8612 1981-1 1981-04-10 22
8613 1980-1 1980-02-13 51
How do I do this efficiently with pandas? So far as I can tell there is no question that addresses this.
CodePudding user response:
Try this:
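import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# Year of the date, plus a 1-based counter within each year (oldest date first)
df['reference'] = (df['date'].dt.year.astype(str)
                   + '-'
                   + (df.sort_values('date')
                        .groupby(df['date'].dt.year)
                        .cumcount() + 1).astype(str))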
Output:
reference date subreg
8606 1984-4 1984-08-05 62
8607 1984-3 1984-08-02 62
8608 1984-2 1984-07-30 62
8609 1984-1 1984-05-12 93
8610 1983-2 1983-04-10 93
8611 1983-1 1983-03-02 57
8612 1981-1 1981-04-10 22
8613 1980-1 1980-02-13 51
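For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample rows in the question (how the real data gets loaded is an assumption), and the variable names `year` and `seq` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the real data source is assumed.
df = pd.DataFrame(
    {
        "reference": ["1985-12", "1985-11", "1985-10", "1985-9",
                      "1985-8", "1985-7", "1985-1", "1985-5"],
        "date": ["1984-08-05", "1984-08-02", "1984-07-30", "1984-05-12",
                 "1983-04-10", "1983-03-02", "1981-04-10", "1980-02-13"],
        "subreg": [62, 62, 62, 93, 93, 57, 22, 51],
    },
    index=range(8606, 8614),
)

df["date"] = pd.to_datetime(df["date"])
year = df["date"].dt.year

# Number rows within each year, oldest date first, starting at 1.
# cumcount() keeps the original index labels, so the result aligns back to df.
seq = df.sort_values("date").groupby(year).cumcount() + 1

df["reference"] = year.astype(str) + "-" + seq.astype(str)
print(df)
```

The sort only decides the numbering order; because everything is aligned on the index, the rows of `df` themselves stay in their original order.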
CodePudding user response:
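This will work, as long as 'date' is still a string like '1984-08-05':
# Take the year part of the date string
df['reference'] = df['date'].apply(lambda x : x.split('-')[0])
# 1-based counter within each year, oldest date first
df['RN'] = df.sort_values('date').groupby('reference').cumcount() + 1
df['reference'] = df['reference'] + '-' + df['RN'].astype(str)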
df
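One caveat for either approach: if two rows in the same year share the same date, which of them gets the lower number depends on the sort, and the default sort kind in pandas is not guaranteed to be stable. A possible tweak, assuming you want ties to keep their original row order and that the index is unique, is to ask for a stable sort. The helper name `renumber_by_year` is made up for this sketch.

```python
import pandas as pd

def renumber_by_year(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: build 'YYYY-n' labels, n counting up by date within each year."""
    dates = pd.to_datetime(df["date"])
    year = dates.dt.year
    # A stable sort (mergesort) keeps rows with identical dates in their original order.
    order = dates.sort_values(kind="mergesort").index
    year_sorted = year.loc[order]
    seq = year_sorted.groupby(year_sorted).cumcount() + 1
    # Put the counter back in the frame's own row order before building the label.
    return year.astype(str) + "-" + seq.reindex(df.index).astype(str)

# Two rows share a date here, so the stable sort decides their numbering.
demo = pd.DataFrame(
    {"date": ["1984-05-12", "1984-05-12", "1983-03-02"]},
    index=[10, 11, 12],
)
demo["reference"] = renumber_by_year(demo)
print(demo)
```

This returns a new Series rather than adding a helper column, so there is no extra 'RN' column to drop afterwards.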