I am trying to tag rows to a "root"
here is an example
import pandas as pd
collection = ({
'REZID':["E0912","E0912","E0912","E0912","E0912","E0912","E0912","E0912","E0913","E0913",],
'POS' :["01","0101","0102","0103","010301","010302","02","0201","01","0101"],
'KOMPID':['k01','k02','k03','k04','k05','k06','k07','k08','k09','k10'],
'WEIGHT':[1000,300,400,300,150,150,1400,1400,1200,500]
})
df = pd.DataFrame(collection, columns=['REZID','POS','KOMPID','WEIGHT'])
REZID POS KOMPID WEIGHT
0 E0912 01 k01 1000
1 E0912 0101 k02 300
2 E0912 0102 k03 400
3 E0912 0103 k04 300
4 E0912 010301 k05 150
5 E0912 010302 k06 150
6 E0912 02 k07 1400
7 E0912 0201 k08 1400
8 E0913 01 k09 1200
9 E0913 0101 k10 500
I want to tag the rows that have more than 2 dig on POS with their root which is 2 dig
root = df.loc[df['POS'].str.len() == 2]
REZID POS KOMPID WEIGHT
0 E0912 01 k01 1000
6 E0912 02 k07 1400
8 E0913 01 k09 1200
what I want
REZID POS KOMPID WEIGHT ROOT
0 E0912 01 k01 1000 k01
1 E0912 0101 k02 300 k01
2 E0912 0102 k03 400 k01
3 E0912 0102 k04 300 k01
4 E0912 010201 k05 150 k01
5 E0912 010202 k06 150 k01
6 E0912 02 k07 1400 k07
7 E0912 0201 k08 1400 k07
8 E0913 01 k09 1200 k09
9 E0913 0101 k10 500 k09
CodePudding user response:
(df['REZID'] == row['REZID'])
is to check if has the same REZID
value as the current row.
(df['POS'].str.startswith(root_pos))
is to check if has a POS
value that starts with the root position.
Use .iloc[0]
to return the first row.
def get_root(row):
root_pos = row['POS'][:2]
root_kompid = df[(df['REZID'] == row['REZID']) & (df['POS'].str.startswith(root_pos))]['KOMPID'].iloc[0]
return root_kompid
df['ROOT'] = df.apply(get_root, axis=1)
CodePudding user response:
You may simply create a new column called ROOT
and populate it with the KOMPID
for those rows where POS
has exactly 2 digits:
df['ROOT']=root['KOMPID']
This returns:
REZID POS KOMPID WEIGHT ROOT
0 E0912 01 k01 1000 k01
1 E0912 0101 k02 300 NaN
2 E0912 0102 k03 400 NaN
3 E0912 0103 k04 300 NaN
4 E0912 010301 k05 150 NaN
5 E0912 010302 k06 150 NaN
6 E0912 02 k07 1400 k07
7 E0912 0201 k08 1400 NaN
8 E0913 01 k09 1200 k09
9 E0913 0101 k10 500 NaN
Then use the fillna
method to "forward fill" the empty cells with the last available ROOT
value:
df = df.fillna(method = 'ffill')
This returns:
REZID POS KOMPID WEIGHT ROOT
0 E0912 01 k01 1000 k01
1 E0912 0101 k02 300 k01
2 E0912 0102 k03 400 k01
3 E0912 0103 k04 300 k01
4 E0912 010301 k05 150 k01
5 E0912 010302 k06 150 k01
6 E0912 02 k07 1400 k07
7 E0912 0201 k08 1400 k07
8 E0913 01 k09 1200 k09
9 E0913 0101 k10 500 k09