I have dataframe in this form:
Name Rank Months
A 'A3' 2
A 'A3' 2
A 'A2' 3
A 'A2' 3
A 'A2' 3
B 'A1' 4
B 'A1' 4
B 'A1' 4
B 'A1' 4
C 'A3' 2
C 'A3' 2
C 'A2' 1
What is the most effective way to create new column with incremental values based on number of months for certain Name and on condition of Rank? So basically the output is the following:
Name Rank Months NewIncremental
A 'A3' 2 'P4'
A 'A3' 2 'P5'
A 'A2' 3 'P1'
A 'A2' 3 'P2'
A 'A2' 3 'P3'
B 'A1' 4 'P1'
B 'A1' 4 'P2'
B 'A1' 4 'P3'
B 'A1' 4 'P4'
C 'A3' 2 'P2'
C 'A3' 2 'P3'
C 'A2' 1 'P1'
So the condition would be the rank order, which is A1->A2->A3. Meaning that if there is a name with A2 rank I assign lower incremental value. I guess sorting based on this can help?
EDIT: edited order so that I need to provide arbitrary order of the ranks
CodePudding user response:
Does this solve it for you?
df['NewIncrement'] = 'P' df.sort_values(['Name', 'Rank']).groupby('Name').rank(method="first", ascending=True).astype(int).astype(str)
CodePudding user response:
IIUC you can simply use rank
:
df["new"] = "P" df.groupby("Name")["Rank"].rank(method="first").astype(int).astype(str)
print (df)
Name Rank Months new
0 A 'A1' 2 P1
1 A 'A1' 2 P2
2 A 'A2' 3 P3
3 A 'A2' 3 P4
4 A 'A2' 3 P5
5 B 'A1' 4 P1
6 B 'A1' 4 P2
7 B 'A1' 4 P3
8 B 'A1' 4 P4
9 C 'A3' 2 P2
10 C 'A3' 2 P3
11 C 'A2' 1 P1
CodePudding user response:
One approach:
ranks = df.sort_values(by=["Rank"],
key=lambda x: x.str.replace(r"\D ", "", regex=True).astype(int))\
.groupby("Name").transform("cumcount") 1
ranks = ranks.apply("P{}".format)
df["NewIncremental"] = ranks
print(df)
Output
Name Rank Months NewIncremental
0 A A1 2 P1
1 A A1 2 P2
2 A A2 3 P3
3 A A2 3 P4
4 A A2 3 P5
5 B A1 4 P1
6 B A1 4 P2
7 B A1 4 P3
8 B A1 4 P4
9 C A3 2 P2
10 C A3 2 P3
11 C A2 1 P1
Step-by-step
# sort df by the given criteria, then group-by
sorted_by_rank = df.sort_values(by=["Rank"], key=lambda x: x.str.replace(r"\D ", "", regex=True).astype(int))
# get the ranks and apply the expected format
ranks = sorted_by_rank.groupby("Name").transform("cumcount") 1
ranks = ranks.apply("P{}".format)
# assign the new column
df["NewIncremental"] = ranks
print(df)