I have some data as below,
AST_NAME |
---|
2F |
3F |
4F |
5F |
2-F-C |
3-F-A |
4-F-C |
4-F-D |
5-F-E |
5-F-F |
SwB |
6-F-G |
SwB |
7-F-A |
I want to extract the number and letter F only from those values like 2-F-C or 3-F-D. My desired output is as below.
AST_NAME |
---|
2F |
3F |
4F |
5F |
2F |
3F |
4F |
4F |
5F |
5F |
SwB |
6F |
SwB |
7F |
I tried using lstrip() and rstrip() but couldn't get a simple solution. I even tried extracting 1st 3 letters and then removing '-' but I failed to achieve the desired output. Kindly suggest. Thank you.
CodePudding user response:
You can use a regex and str.replace
:
df['AST_NAME_clean'] = df['AST_NAME'].str.replace(r'^(\d )-?(F).*',
r'\1\2', regex=True)
Output:
AST_NAME AST_NAME_clean
0 2F 2F
1 3F 3F
2 4F 4F
3 5F 5F
4 2-F-C 2F
5 3-F-A 3F
6 4-F-C 4F
7 4-F-D 4F
8 5-F-E 5F
9 5-F-F 5F
10 SwB SwB
11 6-F-G 6F
12 SwB SwB
13 7-F-A 7F
CodePudding user response:
try this:
tmp = (df.AST_NAME
.str.extract(r'(\d -?[A-Z])', expand=False)
.dropna()
.str.replace('-', ''))
res = tmp.combine_first(df.AST_NAME).to_frame()
print(res)
>>>
AST_NAME
0 2F
1 3F
2 4F
3 5F
4 2F
5 3F
6 4F
7 4F
8 5F
9 5F
10 SwB
11 6F
12 SwB
13 7F