I have a DataFrame, for example:
Name Subset
Apple -AI-,-BI-A,-XC-,ZX-
Bat -po-,-IJ-,-IA-B
and want to convert it to:
Name Subset
Apple AI,BI,XC,ZX
Bat po,IJ,IA
That is, for each comma-separated item, I want to remove the first hyphen, and remove the second hyphen together with any characters that follow it up to the next comma.
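To make the rule concrete, here is a minimal plain-Python sketch that builds the sample frame and applies the rule to each comma-separated token. The helper name clean_subset is hypothetical and assumes every token has at most one leading hyphen.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Apple", "Bat"],
    "Subset": ["-AI-,-BI-A,-XC-,ZX-", "-po-,-IJ-,-IA-B"],
})

def clean_subset(value: str) -> str:
    """Hypothetical helper: apply the rule to every comma-separated token."""
    parts = []
    for token in value.split(","):
        # Drop the first hyphen if the token starts with one ("ZX-" does not)
        if token.startswith("-"):
            token = token[1:]
        # Keep only the text before the next hyphen, discarding the rest
        parts.append(token.split("-", 1)[0])
    return ",".join(parts)

df["Subset"] = df["Subset"].map(clean_subset)
print(df)
```

A row-wise map like this is easy to read but runs in Python for every cell; the vectorized string methods in the answers below avoid the explicit loop.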
Answer:
IIUC, use extractall with the -([^-]+)- regex to get only the values between dashes, then GroupBy.agg with ','.join to concatenate the values of each row:
df['Subset'] = (df['Subset'].str.extractall(r'-([^-]+)-')[0]
                .groupby(level=0).agg(','.join)
               )
output:
Name Subset
0 Apple AI,BI,XC
1 Bat po,IJ,IA
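To see why the groupby step is needed, this self-contained sketch prints the intermediate result of extractall: one row per match, indexed by the original row label plus a "match" counter. Grouping on level 0 of that MultiIndex folds the matches back into one string per original row.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "Bat"],
    "Subset": ["-AI-,-BI-A,-XC-,ZX-", "-po-,-IJ-,-IA-B"],
})

# One row per match; index levels are (original row label, "match")
matches = df["Subset"].str.extractall(r"-([^-]+)-")
print(matches)

# Column 0 is the (unnamed) capture group; join the matches of each original row
result = matches[0].groupby(level=0).agg(",".join)
print(result)
```

ZX never matches here because nothing before it is a dash, which is why it is missing from the output above.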
To handle the case where the left dash can be missing (as with ZX), use [-,]+([^-,]+)-:
df['Subset'] = (df['Subset'].str.extractall(r'[-,]+([^-,]+)-')[0]
                .groupby(level=0).agg(','.join))
output:
Name Subset
0 Apple AI,BI,XC,ZX
1 Bat po,IJ,IA
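One behavior worth knowing: extractall only yields entries for rows with at least one match, so assigning the grouped result back leaves NaN for rows with no match. The sketch below compares it with Series.str.findall followed by Series.str.join, which keeps one (possibly empty) list per row. The third row, "Cat" with the value "none", is a hypothetical addition used only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "Bat", "Cat"],  # "Cat" is a hypothetical row with no matches
    "Subset": ["-AI-,-BI-A,-XC-,ZX-", "-po-,-IJ-,-IA-B", "none"],
})

pattern = r"[-,]+([^-,]+)-"

# extractall + groupby: no entry is produced for "Cat", so alignment gives NaN
via_extractall = df["Subset"].str.extractall(pattern)[0].groupby(level=0).agg(",".join)

# findall returns a list per row (empty for "Cat"); str.join turns each list into a string
via_findall = df["Subset"].str.findall(pattern).str.join(",")

print(df.assign(extractall=via_extractall, findall=via_findall))
```

For the question's two rows both variants give the same strings; which one to prefer depends on whether a non-matching row should end up as NaN or as an empty string.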
Answer:
Use Series.str.split, Series.explode and Series.groupby.agg:
In [2193]: df['Subset'] = df.Subset.str.split(',').explode().str.split('-').str[:2].apply(''.join).groupby(level=0).agg(','.join)
In [2194]: df
Out[2194]:
Name Subset
0 Apple AI,BI,XC,ZX
1 Bat po,IJ,IA
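The one-liner above can be unpacked step by step. In this sketch, explode produces one row per token while repeating the original index label, which is what groupby(level=0) later uses to rebuild each row. Splitting a token on "-" and keeping the first two pieces works because the only thing that can precede the wanted text is an empty piece from a leading hyphen.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Apple", "Bat"],
    "Subset": ["-AI-,-BI-A,-XC-,ZX-", "-po-,-IJ-,-IA-B"],
})

# One row per comma-separated token; the original index label is repeated
tokens = df["Subset"].str.split(",").explode()

# "-AI-" -> ["", "AI"], "ZX-" -> ["ZX", ""], "-IA-B" -> ["", "IA"]; then concatenate
cleaned = tokens.str.split("-").str[:2].apply("".join)

# Rebuild one comma-joined string per original row
df["Subset"] = cleaned.groupby(level=0).agg(",".join)
print(df)
```

This relies on the data having the shape shown in the question. A hypothetical token such as "A-B-C" (no leading hyphen, two inner hyphens) would become "AB" rather than "A".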