Home > Net >  Extracting 'n' characters from a column of a dataframe, based on 'find' results
Extracting 'n' characters from a column of a dataframe, based on 'find' results

Time:11-23

If column 'id' contains "(Slot" then slot no. should be extracted in 'Output' column from the 'id' column else if column 'id' contains "PR GRP" then 1st slot no. should be extracted in 'Output' column from the 'grp' column. Positions of Slot No. are fixed i.e in 1st case need to extract 7 characters from the location where '(Slot" found and then remove '(' to get the slot no. In 2nd case 1st Slot No. would always be found at 6th place and will have 6 characters as well.

id                            grp                              Output
Alpha-old-1_MOD (Slot01)      -                                Slot01
Alpha-old-1_PR GRP1           MOD (Slot07);MOD (Slot08)        Slot07
Beta-new-1_PR GRP2            MOD (Slot13);MOD (Slot14)        Slot13
Beta-old-1_MOD (Slot03)       -                                Slot03

Expected output:

Input & Output Table]

Input & Output Table for Query No. 2 & 3

CodePudding user response:

Try with str.extract:

df["output"] = df["id"].str.extract("(Slot\d )").fillna(df["grp"].str.extract("(Slot\d )"))

>>> df
                         id                        grp  output
0  Alpha-old-1_MOD (Slot01)                          -  Slot01
1       Alpha-old-1_PR GRP1  MOD (Slot07);MOD (Slot08)  Slot07
2        Beta-new-1_PR GRP2  MOD (Slot13);MOD (Slot14)  Slot13
3   Beta-old-1_MOD (Slot03)                          -  Slot03

CodePudding user response:

use a regular expression to look for the word Slot follow by 2 digits

import re

data="""id\tgrp
Alpha-old-1_MOD (Slot01)\t-
Alpha-old-1_PR GRP1\tMOD (Slot07);MOD (Slot08)
Beta-new-1_PR GRP2\tMOD (Slot13);MOD (Slot14)
Beta-old-1_MOD (Slot03)\t-
"""



df=pd.read_csv(io.StringIO(data),sep='\t')

pattern=r"[\s\w\(] (Slot{1}\d{2}) [\s\w\)] "

df['output']=''
for key,row in df.iterrows():
    if re.search("PR GRP",str(row['id'])):
        matches=re.findall(pattern,str(row['grp']))
    else:
        matches=re.findall(pattern,str(row['id']))
    if len(matches)>0:
        df.loc[key,'output']=matches[0]

print(df[['id','grp','output']])    

output:

                         id                        grp            output
0  Alpha-old-1_MOD (Slot01)                          -          [Slot01]
1       Alpha-old-1_PR GRP1  MOD (Slot07);MOD (Slot08)          [Slot07]
2        Beta-new-1_PR GRP2  MOD (Slot13);MOD (Slot14)          [Slot13]
3   Beta-old-1_MOD (Slot03)                          -          [Slot03]
  • Related