Home > Blockchain >  Update values in dataframe column based on array with row numbers indicating the start of the range
Update values in dataframe column based on array with row numbers indicating the start of the range

Time:10-17

I have the following dataframe and two arrays fillrows and salescodes. Fillrows is an array that I generated by extracting the rows where a condition is met. In order to be able to identify the person who generate the sales, I need to fill an additional column with the SalesCode. In fillrows I have the starting row of every SalesCode that corresponds to the codes in array salescodes.

df = pd.DataFrame([['', 120], ['', 300], ['', 160],['', 220], ['', 400], ['', 560],['', 860],['',900]], columns=['SalesCode', 'Sales'])
fillrows=[0,3,5]
salescodes =['XA','AC','BC']

  SalesCode  Sales
0              120
1              300
2              160
3              220
4              400
5              560
6              860
7              900

In need to have this output

Output
   SalesCode  Sales
0        XA    120
1        XA    300
2        XA    160
3        AC    220
4        AC    400
5        BC    560
6        BC    860
7        BC    900

CodePudding user response:

You could join a crafted Series and ffill:

s = pd.Series(salescodes, index=fillrows[:-1], name='SalesCode')
df2 = df.drop('SalesCode', axis=1).join(s).ffill()

NB. the drop part is not needed if the empty "SalesCode" columns does not exist in df

output:

>>> df2
   Sales SalesCode
0    120        XA
1    300        XA
2    160        XA
3    220        AC
4    400        AC
5    560        BC
6    860        BC
7    900        BC

CodePudding user response:

You can simply iterate over the array of fillrows and add values for salescode. note: I changed the last index to be Len(df) since there's no 4th value in salescode that can start from idx 7; feel free to play around.

import pandas as pd
from itertools import repeat

df = pd.DataFrame([['', 120], ['', 300], ['', 160],['', 220], ['', 400], ['', 560],['', 860],['',900]], columns=['SalesCode', 'Sales'])
fillrows=[0,3,5, len(df)]
salescodes =['XA','AC','BC']

SalesCodes = []

for idx, _ in enumerate(fillrows[:-1]):
  SalesCodes.extend(repeat(salescodes[idx], fillrows[idx 1] - fillrows[idx]))

df = df.assign(SalesCode=SalesCodes)
> df
  SalesCode  Sales
0        XA    120
1        XA    300
2        XA    160
3        AC    220
4        AC    400
5        BC    560
6        BC    860
7        BC    900
  • Related