I am trying to extract 4 characters after First,second,third and so on occurance of '/' from Column of Dataframe
Can someone suggest possible code let me know what is error in my code
CodePudding user response:
Try with str.findall
:
>>> df["NE Name"].str.findall(r"/([^/]{4})")
0 [01HJ]
1 [01HL, 02HL, 03HL, 10HL]
2 [01HL, 02HL, 03HL, 10HL]
3 [01HL, 02HL, 03HL, 10HL]
4 [01HL, 02HL, 03HL, 10HL]
Name: NE Name, dtype: object
Input DataFrame:
>>> df
NE Name Subrack ID pattern
0 10100000/01HJ 0 01HJ
1 10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL 1 01HJ
2 10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL 0 01HJ
3 10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL 2 01HJ
4 10100000/01HL&10100000/02HL&10100000/03HL&10100000/10HL 3 01HJ
CodePudding user response:
Here is a basic approach without using regular expressions:
- Use the
str.split('/')
method on the column to return a series of lists, each list containing the substrings in between the slashes. apply
a function to that series which returns the first four characters of each list element, except the first one. You can use thelambda
keyword to concisely define such a function within theapply
call.
import pandas as pd
df = pd.DataFrame({'col': ['101000000/01HJ',
'1010000/01HL&101/02HL&1010/03HL&04/10HL']})
df['col'].str.split('/').apply(lambda seq: [x[:4] for x in seq[1:]])
0 [01HJ]
1 [01HL, 02HL, 03HL, 10HL]
Name: col, dtype: object