Pandas creating incremental values in new column based on certain conditions-CodePudding

I have dataframe in this form:

Name Rank  Months
A     'A3'  2
A     'A3'  2
A     'A2'  3
A     'A2'  3
A     'A2'  3
B     'A1'  4
B     'A1'  4
B     'A1'  4
B     'A1'  4
C     'A3'  2
C     'A3'  2
C     'A2'  1

What is the most effective way to create new column with incremental values based on number of months for certain Name and on condition of Rank? So basically the output is the following:

Name Rank  Months  NewIncremental
A     'A3'  2       'P4'
A     'A3'  2       'P5'
A     'A2'  3       'P1'
A     'A2'  3       'P2'
A     'A2'  3       'P3'
B     'A1'  4       'P1'
B     'A1'  4       'P2'
B     'A1'  4       'P3'
B     'A1'  4       'P4'
C     'A3'  2       'P2'
C     'A3'  2       'P3'
C     'A2'  1       'P1'

So the condition would be the rank order, which is A1->A2->A3. Meaning that if there is a name with A2 rank I assign lower incremental value. I guess sorting based on this can help?

EDIT: edited order so that I need to provide arbitrary order of the ranks

CodePudding user response：

Does this solve it for you?

df['NewIncrement'] = 'P'   df.sort_values(['Name', 'Rank']).groupby('Name').rank(method="first", ascending=True).astype(int).astype(str)

CodePudding user response：

IIUC you can simply use rank:

df["new"] = "P" df.groupby("Name")["Rank"].rank(method="first").astype(int).astype(str)
print (df)

   Name  Rank  Months new
0     A  'A1'       2  P1
1     A  'A1'       2  P2
2     A  'A2'       3  P3
3     A  'A2'       3  P4
4     A  'A2'       3  P5
5     B  'A1'       4  P1
6     B  'A1'       4  P2
7     B  'A1'       4  P3
8     B  'A1'       4  P4
9     C  'A3'       2  P2
10    C  'A3'       2  P3
11    C  'A2'       1  P1

CodePudding user response：

One approach:

ranks = df.sort_values(by=["Rank"],
                    key=lambda x: x.str.replace(r"\D ", "", regex=True).astype(int))\
        .groupby("Name").transform("cumcount")   1
ranks = ranks.apply("P{}".format)

df["NewIncremental"] = ranks
print(df)

Output

   Name Rank  Months NewIncremental
0     A   A1       2             P1
1     A   A1       2             P2
2     A   A2       3             P3
3     A   A2       3             P4
4     A   A2       3             P5
5     B   A1       4             P1
6     B   A1       4             P2
7     B   A1       4             P3
8     B   A1       4             P4
9     C   A3       2             P2
10    C   A3       2             P3
11    C   A2       1             P1

Step-by-step

# sort df by the given criteria, then group-by
sorted_by_rank = df.sort_values(by=["Rank"], key=lambda x: x.str.replace(r"\D ", "", regex=True).astype(int))

# get the ranks and apply the expected format
ranks = sorted_by_rank.groupby("Name").transform("cumcount")   1
ranks = ranks.apply("P{}".format)

# assign the new column
df["NewIncremental"] = ranks
print(df)