Extracting only a number and letter F from the column of a pandas dataframe-CodePudding

I have some data as below,

AST_NAME
2F
3F
4F
5F
2-F-C
3-F-A
4-F-C
4-F-D
5-F-E
5-F-F
SwB
6-F-G
SwB
7-F-A

I want to extract the number and letter F only from those values like 2-F-C or 3-F-D. My desired output is as below.

AST_NAME
2F
3F
4F
5F
2F
3F
4F
4F
5F
5F
SwB
6F
SwB
7F

I tried using lstrip() and rstrip() but couldn't get a simple solution. I even tried extracting 1st 3 letters and then removing '-' but I failed to achieve the desired output. Kindly suggest. Thank you.

CodePudding user response：

You can use a regex and str.replace:

df['AST_NAME_clean'] = df['AST_NAME'].str.replace(r'^(\d )-?(F).*',
                                                  r'\1\2', regex=True)

Output:

   AST_NAME AST_NAME_clean
0        2F             2F
1        3F             3F
2        4F             4F
3        5F             5F
4     2-F-C             2F
5     3-F-A             3F
6     4-F-C             4F
7     4-F-D             4F
8     5-F-E             5F
9     5-F-F             5F
10      SwB            SwB
11    6-F-G             6F
12      SwB            SwB
13    7-F-A             7F

regex demo

CodePudding user response：

try this:

tmp = (df.AST_NAME
       .str.extract(r'(\d -?[A-Z])', expand=False)
       .dropna()
       .str.replace('-', ''))
res = tmp.combine_first(df.AST_NAME).to_frame()
print(res)
>>>
    AST_NAME
0   2F
1   3F
2   4F
3   5F
4   2F
5   3F
6   4F
7   4F
8   5F
9   5F
10  SwB
11  6F
12  SwB
13  7F