I have a dataframe with a column, 'col', that contains both positive and negative numbers. I would like to rank the positive and the negative numbers separately, with 0 excluded so that it does not distort the ranking. My issue is that the code below also updates the 'col' column. I must be keeping a reference to it somewhere, but I am not sure where.
import random
import numpy as np
import pandas as pd
data = {'col':[random.randint(-1000, 1000) for _ in range(100)]}
df = pd.DataFrame(data)
pos_idx = np.where(df.col > 0)[0]
neg_idx = np.where(df.col < 0)[0]
p = df[df.col > 0].col.values
n = df[df.col < 0].col.values
p_rank = np.round(p.argsort().argsort()/(len(p)-1)*100,1)
n_rank = np.round((n*-1).argsort().argsort()/(len(n)-1)*100,1)
pc = df.col.values
pc[pc > 0] = p_rank
pc[pc < 0] = n_rank
df['ranking'] = pc
CodePudding user response:
I was able to figure it out on my own.
I created a new column of zeros, then used .loc to update the values at their respective index locations.
df['ranking'] = 0.0
df.loc[df.col > 0, 'ranking'] = p_rank
df.loc[df.col < 0, 'ranking'] = n_rank
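Put together with the ranking code from the question, the .loc approach might look like the sketch below. The seeded generator, the mask names pos_mask and neg_mask, and the float fill value 0.0 are my own choices for illustration, not part of the original code. It also assumes each sign group has at least two values, since the formula divides by len - 1.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the example is reproducible (an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({"col": rng.integers(-1000, 1001, size=100)})
original = df["col"].copy()  # kept only to check that 'col' is not modified

pos_mask = df["col"] > 0
neg_mask = df["col"] < 0

# Boolean selection returns new data, so p and n do not alias df.
p = df.loc[pos_mask, "col"].to_numpy()
n = df.loc[neg_mask, "col"].to_numpy()

# Rank within each sign group, scaled to 0..100 as in the question.
p_rank = np.round(p.argsort().argsort() / (len(p) - 1) * 100, 1)
n_rank = np.round((-n).argsort().argsort() / (len(n) - 1) * 100, 1)

# Start from a float column so the rounded ranks are stored without truncation.
df["ranking"] = 0.0
df.loc[pos_mask, "ranking"] = p_rank
df.loc[neg_mask, "ranking"] = n_rank
```

Because the ranks are written into a separate column through .loc, 'col' is never assigned to, and rows where col is 0 keep a ranking of 0.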
CodePudding user response:
One way to do it is to avoid mutating the original dataframe by replacing this line in your code:
pc = df.col.values
with:
pc = df.copy().col.values
So that:
print(df)
# Output
    col  ranking
0  -492       49
1   884       93
2  -355       36
3   741       77
4  -210       24
..  ...      ...
95  564       57
96  683       63
97 -129       18
98 -413       44
99  810       81

[100 rows x 2 columns]
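As for why the original code changed 'col': as far as I understand, for a plain numeric column .values can hand back the array that backs the DataFrame rather than a copy, so in-place assignments such as pc[pc > 0] = ... write through to df (in pandas versions with Copy-on-Write enabled the array is read-only instead). A minimal sketch of taking an explicit copy follows; the small five-row frame and the name safe are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame, just to show the aliasing behaviour.
df = pd.DataFrame({"col": [-3, 5, 0, -1, 2]})

# to_numpy(copy=True) always returns an array independent of the DataFrame.
safe = df["col"].to_numpy(copy=True)

# Check that the copy does not share memory with the column's data.
shares_safe = np.shares_memory(safe, df["col"].values)

# In-place edits now touch only the copy, not df['col'].
safe[safe > 0] = 99
```

This has the same effect as df.copy().col.values but copies only the one column rather than the whole dataframe.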