I have DataFrame in Python Pandas like below ("col1" is as string data type):
col1
-------
ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K
ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C
ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2
...
And I need to create new column "col2" where will be only value between "GROUP:" and "|" from values in "col1", so as a result I need something like below:
col1 | col2
--------------------------------------------|-------
ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K | gr1
ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C | gr2
ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2 | gpk22
... | ...
How can I do that in Python Pandas ?
CodePudding user response:
You could use regular expressions to extract it. Something like capture everything after GROUP: up until the |
df['col2'] = df.col1.str.extract('GROUP:(.*)\|')
Another less elegant, but perhaps more intuitive option if you aren't familiar w/regex.
Split on the pipe, take the second from last item, split on : and take the last element from that. This of course works only if the structure of your col1 values always follows the same format.
df.col1.str.split('|').str[-2].str.split(':').str[-1]
Output
col1 col2
0 ACC_P:Indiv|ACC_K:3886|GROUP:gr1|COK:P1K gr1
1 ACC_P:Group|ACC_K:42|GROUP:gr2|COK:P1C gr2
2 ACC_P:Indiv|ACC_K:455|GROUP:gpk22|COK:AD2 gpk22
CodePudding user response:
The more elegant way to do it would be through regex but here is my two cents:
d.col1.str.split('GROUP:').apply(lambda x: x[1].split('|')[0])