I have the following dataframe and two arrays fillrows and salescodes. Fillrows is an array that I generated by extracting the rows where a condition is met. In order to be able to identify the person who generate the sales, I need to fill an additional column with the SalesCode. In fillrows I have the starting row of every SalesCode that corresponds to the codes in array salescodes.
df = pd.DataFrame([['', 120], ['', 300], ['', 160],['', 220], ['', 400], ['', 560],['', 860],['',900]], columns=['SalesCode', 'Sales'])
fillrows=[0,3,5]
salescodes =['XA','AC','BC']
SalesCode Sales
0 120
1 300
2 160
3 220
4 400
5 560
6 860
7 900
In need to have this output
Output
SalesCode Sales
0 XA 120
1 XA 300
2 XA 160
3 AC 220
4 AC 400
5 BC 560
6 BC 860
7 BC 900
CodePudding user response:
You could join
a crafted Series and ffill
:
s = pd.Series(salescodes, index=fillrows[:-1], name='SalesCode')
df2 = df.drop('SalesCode', axis=1).join(s).ffill()
NB. the drop
part is not needed if the empty "SalesCode" columns does not exist in df
output:
>>> df2
Sales SalesCode
0 120 XA
1 300 XA
2 160 XA
3 220 AC
4 400 AC
5 560 BC
6 860 BC
7 900 BC
CodePudding user response:
You can simply iterate over the array of fillrows
and add values for salescode. note: I changed the last index to be Len(df) since there's no 4th value in salescode that can start from idx 7; feel free to play around.
import pandas as pd
from itertools import repeat
df = pd.DataFrame([['', 120], ['', 300], ['', 160],['', 220], ['', 400], ['', 560],['', 860],['',900]], columns=['SalesCode', 'Sales'])
fillrows=[0,3,5, len(df)]
salescodes =['XA','AC','BC']
SalesCodes = []
for idx, _ in enumerate(fillrows[:-1]):
SalesCodes.extend(repeat(salescodes[idx], fillrows[idx 1] - fillrows[idx]))
df = df.assign(SalesCode=SalesCodes)
> df
SalesCode Sales
0 XA 120
1 XA 300
2 XA 160
3 AC 220
4 AC 400
5 BC 560
6 BC 860
7 BC 900