How to get the strings between # delimiters with Pandas?

Time:11-10

I would like to extract only the codes enclosed between # characters and join them into a new column.

What I have

id code
0 (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0)
1 (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0))
2 (#M05Q01600R00100# = 1)
3 (#M05Q01125R00200# = 1)
4 (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0)
5 (#M05Q01130R00400# = 1)
6 (#M05Q01132R00100# = 1)
7 (#M05Q01132R00400# = 1)
8 (#M05Q01132R00400# = 1)
9 (#M05Q01140R00200# = 1)

What I would like to get

id code concat
0 (#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0) M05Q01900R00100, M05Q01950R00200
1 (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0)) M05Q01900R00100, M05Q01950R00100, M05Q01950R00200
2 (#M05Q01600R00100# = 1) M05Q01600R00100
3 (#M05Q01125R00200# = 1) M05Q01125R00200
4 (#M05Q01129R00100# = 1) AND (#M05Q01130R00300# = 0) M05Q01129R00100, M05Q01130R00300
5 (#M05Q01130R00400# = 1) M05Q01130R00400
6 (#M05Q01132R00100# = 1) M05Q01132R00100
7 (#M05Q01132R00400# = 1) M05Q01132R00400
8 (#M05Q01132R00400# = 1) M05Q01132R00400
9 (#M05Q01140R00200# = 1) M05Q01140R00200

CodePudding user response:

Use Series.str.findall with a non-greedy regex to capture every value between a pair of # characters, then combine the matches for each row with Series.str.join:

# Capture each code between a pair of '#' and join the matches per row
df['concat'] = df['code'].str.findall(r'#(.*?)#').str.join(', ')
print(df)
   id                                                code  \
0    0  (#M05Q01900R00100# = 1) AND (#M05Q01950R00200#...   
1    1  (#M05Q01900R00100# = 1) AND ((#M05Q01950R00100...   
2    2                            (#M05Q01600R00100# = 1)   
3    3                            (#M05Q01125R00200# = 1)   
4    4  (#M05Q01129R00100# = 1) AND (#M05Q01130R00300#...   
5    5                            (#M05Q01130R00400# = 1)   
6    6                            (#M05Q01132R00100# = 1)   
7    7                            (#M05Q01132R00400# = 1)   
8    8                            (#M05Q01132R00400# = 1)   
9    9                            (#M05Q01140R00200# = 1)   

                                              concat  
0                   M05Q01900R00100, M05Q01950R00200  
1  M05Q01900R00100, M05Q01950R00100, M05Q01950R00200  
2                                    M05Q01600R00100  
3                                    M05Q01125R00200  
4                   M05Q01129R00100, M05Q01130R00300  
5                                    M05Q01130R00400  
6                                    M05Q01132R00100  
7                                    M05Q01132R00400  
8                                    M05Q01132R00400  
9                                    M05Q01140R00200  
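
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The three-row frame is a hypothetical sample copied from the first rows of the question, and the `greedy` variable is only there to illustrate why the non-greedy `.*?` matters; it is not part of the answer above.

```python
import pandas as pd

# Hypothetical sample frame mirroring the first three rows of the question.
df = pd.DataFrame({
    "id": [0, 1, 2],
    "code": [
        "(#M05Q01900R00100# = 1) AND (#M05Q01950R00200# = 0)",
        "(#M05Q01900R00100# = 1) AND ((#M05Q01950R00100# = 0) OR (#M05Q01950R00200# = 0))",
        "(#M05Q01600R00100# = 1)",
    ],
})

# Non-greedy (.*?) stops at the nearest closing '#',
# so every code in a row is captured as its own match.
df["concat"] = df["code"].str.findall(r"#(.*?)#").str.join(", ")

# For comparison only: a greedy (.*) runs to the last '#' in the row,
# which swallows everything between the first and last code.
greedy = df["code"].str.findall(r"#(.*)#")

print(df["concat"].tolist())
print(greedy[0])
```

If the column may contain missing values, `str.findall` leaves them as NaN, so the joined column would also hold NaN for those rows.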