I would like to get only codes between # and concat it to a new column.
What I have
id | code |
---|---|
0 | (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0) |
1 | (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0)) |
2 | (#M05Q01600R00100# = 1) |
3 | (#M05Q01125R00200# = 1) |
4 | (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0) |
5 | (#M05Q01130R00400# = 1) |
6 | (#M05Q01132R00100# = 1) |
7 | (#M05Q01132R00400# = 1) |
8 | (#M05Q01132R00400# = 1) |
9 | (#M05Q01140R00200# = 1) |
What I would like to get
id | code | concat |
---|---|---|
0 | (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0) | M05Q01900R00100, M05Q01950R00200 |
1 | (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0)) | M05Q01900R00100, M05Q01950R00100, M05Q01950R00200 |
2 | (#M05Q01600R00100# = 1) | M05Q01600R00100 |
3 | (#M05Q01125R00200# = 1) | M05Q01125R00200 |
4 | (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0) | M05Q01129R00100, M05Q01130R00300 |
5 | (#M05Q01130R00400# = 1) | M05Q01130R00400 |
6 | (#M05Q01132R00100# = 1) | M05Q01132R00100 |
7 | (#M05Q01132R00400# = 1) | M05Q01132R00400 |
8 | (#M05Q01132R00400# = 1) | M05Q01132R00400 |
9 | (#M05Q01140R00200# = 1) | M05Q01140R00200 |
CodePudding user response:
Use Series.str.findall
with regex for values between #
and then Series.str.join
:
df['concat'] = df['code'].str.findall(r'#(.*?)#').str.join(', ')
print (df)
id code \
0 0 (#M05Q01900R00100# = 1) AND (#M05Q01950R00200#...
1 1 (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100...
2 2 (#M05Q01600R00100# = 1)
3 3 (#M05Q01125R00200# = 1)
4 4 (#M05Q01129R00100# = 1) AND (#M05Q01130R00300#...
5 5 (#M05Q01130R00400# = 1)
6 6 (#M05Q01132R00100# = 1)
7 7 (#M05Q01132R00400# = 1)
8 8 (#M05Q01132R00400# = 1)
9 9 (#M05Q01140R00200# = 1)
concat
0 M05Q01900R00100, M05Q01950R00200
1 M05Q01900R00100, M05Q01950R00100, M05Q01950R00200
2 M05Q01600R00100
3 M05Q01125R00200
4 M05Q01129R00100, M05Q01130R00300
5 M05Q01130R00400
6 M05Q01132R00100
7 M05Q01132R00400
8 M05Q01132R00400
9 M05Q01140R00200