If column 'id' contains "(Slot" then slot no. should be extracted in 'Output' column from the 'id' column else if column 'id' contains "PR GRP" then 1st slot no. should be extracted in 'Output' column from the 'grp' column. Positions of Slot No. are fixed i.e in 1st case need to extract 7 characters from the location where '(Slot" found and then remove '(' to get the slot no. In 2nd case 1st Slot No. would always be found at 6th place and will have 6 characters as well.
id grp Output
Alpha-old-1_MOD (Slot01) - Slot01
Alpha-old-1_PR GRP1 MOD (Slot07);MOD (Slot08) Slot07
Beta-new-1_PR GRP2 MOD (Slot13);MOD (Slot14) Slot13
Beta-old-1_MOD (Slot03) - Slot03
Expected output:
]
Input & Output Table for Query No. 2 & 3
CodePudding user response:
Try with str.extract
:
df["output"] = df["id"].str.extract("(Slot\d )").fillna(df["grp"].str.extract("(Slot\d )"))
>>> df
id grp output
0 Alpha-old-1_MOD (Slot01) - Slot01
1 Alpha-old-1_PR GRP1 MOD (Slot07);MOD (Slot08) Slot07
2 Beta-new-1_PR GRP2 MOD (Slot13);MOD (Slot14) Slot13
3 Beta-old-1_MOD (Slot03) - Slot03
CodePudding user response:
use a regular expression to look for the word Slot follow by 2 digits
import re
data="""id\tgrp
Alpha-old-1_MOD (Slot01)\t-
Alpha-old-1_PR GRP1\tMOD (Slot07);MOD (Slot08)
Beta-new-1_PR GRP2\tMOD (Slot13);MOD (Slot14)
Beta-old-1_MOD (Slot03)\t-
"""
df=pd.read_csv(io.StringIO(data),sep='\t')
pattern=r"[\s\w\(] (Slot{1}\d{2}) [\s\w\)] "
df['output']=''
for key,row in df.iterrows():
if re.search("PR GRP",str(row['id'])):
matches=re.findall(pattern,str(row['grp']))
else:
matches=re.findall(pattern,str(row['id']))
if len(matches)>0:
df.loc[key,'output']=matches[0]
print(df[['id','grp','output']])
output:
id grp output
0 Alpha-old-1_MOD (Slot01) - [Slot01]
1 Alpha-old-1_PR GRP1 MOD (Slot07);MOD (Slot08) [Slot07]
2 Beta-new-1_PR GRP2 MOD (Slot13);MOD (Slot14) [Slot13]
3 Beta-old-1_MOD (Slot03) - [Slot03]