Home > Net >  select rows on 2 conditions and print values in new column for all rows that start with
select rows on 2 conditions and print values in new column for all rows that start with

Time:12-18

I am trying to tag rows to a "root"

here is an example

import pandas as pd

collection   = ({
    'REZID':["E0912","E0912","E0912","E0912","E0912","E0912","E0912","E0912","E0913","E0913",],
    'POS' :["01","0101","0102","0103","010301","010302","02","0201","01","0101"],
    'KOMPID':['k01','k02','k03','k04','k05','k06','k07','k08','k09','k10'],
    'WEIGHT':[1000,300,400,300,150,150,1400,1400,1200,500]
                })

df = pd.DataFrame(collection, columns=['REZID','POS','KOMPID','WEIGHT'])

   REZID     POS KOMPID  WEIGHT
0  E0912      01    k01    1000
1  E0912    0101    k02     300
2  E0912    0102    k03     400
3  E0912    0103    k04     300
4  E0912  010301    k05     150
5  E0912  010302    k06     150
6  E0912      02    k07    1400
7  E0912    0201    k08    1400
8  E0913      01    k09    1200
9  E0913    0101    k10     500

I want to tag the rows that have more than 2 dig on POS with their root which is 2 dig

root = df.loc[df['POS'].str.len() == 2]

 REZID  POS     KOMPID  WEIGHT
0   E0912   01  k01     1000
6   E0912   02  k07     1400
8   E0913   01  k09     1200

what I want

   REZID     POS KOMPID  WEIGHT ROOT
0  E0912      01    k01    1000  k01
1  E0912    0101    k02     300  k01
2  E0912    0102    k03     400  k01
3  E0912    0102    k04     300  k01
4  E0912  010201    k05     150  k01
5  E0912  010202    k06     150  k01
6  E0912      02    k07    1400  k07
7  E0912    0201    k08    1400  k07
8  E0913      01    k09    1200  k09
9  E0913    0101    k10     500  k09

CodePudding user response:

(df['REZID'] == row['REZID']) is to check if has the same REZID value as the current row.

(df['POS'].str.startswith(root_pos)) is to check if has a POS value that starts with the root position.

Use .iloc[0] to return the first row.

def get_root(row):
    root_pos = row['POS'][:2]
    root_kompid = df[(df['REZID'] == row['REZID']) & (df['POS'].str.startswith(root_pos))]['KOMPID'].iloc[0]
    return root_kompid

df['ROOT'] = df.apply(get_root, axis=1)

CodePudding user response:

You may simply create a new column called ROOT and populate it with the KOMPID for those rows where POS has exactly 2 digits:

df['ROOT']=root['KOMPID']

This returns:

    REZID   POS     KOMPID  WEIGHT  ROOT
0   E0912   01      k01     1000    k01
1   E0912   0101    k02     300     NaN
2   E0912   0102    k03     400     NaN
3   E0912   0103    k04     300     NaN
4   E0912   010301  k05     150     NaN
5   E0912   010302  k06     150     NaN
6   E0912   02      k07     1400    k07
7   E0912   0201    k08     1400    NaN
8   E0913   01      k09     1200    k09
9   E0913   0101    k10     500     NaN

Then use the fillna method to "forward fill" the empty cells with the last available ROOT value:

df = df.fillna(method = 'ffill')

This returns:

        REZID   POS     KOMPID  WEIGHT  ROOT
0       E0912   01      k01     1000    k01
1       E0912   0101    k02     300     k01
2       E0912   0102    k03     400     k01
3       E0912   0103    k04     300     k01
4       E0912   010301  k05     150     k01
5       E0912   010302  k06     150     k01
6       E0912   02      k07     1400    k07
7       E0912   0201    k08     1400    k07
8       E0913   01      k09     1200    k09
9       E0913   0101    k10     500     k09
  • Related