Using a regular expression to create a new DataFrame column

I have the following pandas DataFrame:

| ColumnA | File            |
| -------- | -------------- |
| First    | aasdkh.xls     |
| Second   | sadkhZ.xls     |
| Third    | asdasdPH.xls   |
| Fourth   | adsjklahsd.xls |

and so on.

I'm trying to get the following DataFrame:

| ColumnA | File              | Category |
| -------- | ---------------- | ------- |
| First    | aasdkh.xls       | N       |
| Second   | sadkhZ.xls       | Z       |
| Third    | asdasdPH.xls     | PH      |
| Fourth   | adsjklahsdPH.xls | PH      |

I'm trying to use regular expressions, but I'm not sure how to apply them. I need a new column that "extracts" the category of each file: N if it is a "normal" file (no category), Z if the file name contains a "Z" just before the extension, and PH if the file name contains a "PH" just before the extension.

I defined the following regular expressions that I think I could use, but I don't know how to apply them to the DataFrame:

    import re
    regex_Z = re.compile(r'Z\.xls$')
    regex_PH = re.compile(r'PH\.xls$')

P.S.: Could you recommend a website for learning how to use regular expressions?

CodePudding user response：

Try `str.extract` with a capture group, then fill the rows that do not match with `N`:

    df['Category'] = df['File'].str.extract(r'(Z|PH)\.xls$').fillna('N')

    print(df)

      ColumnA            File Category
    0   First      aasdkh.xls        N
    1  Second      sadkhZ.xls        Z
    2   Third    asdasdPH.xls       PH
    3  Fourth  adsjklahsd.xls        N
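
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the sample DataFrame from the question and applies the same capture-group idea. It also shows one possible way to reuse two compiled patterns like the ones in the question. The helper `categorize` and the column `Category_alt` are hypothetical names introduced only for this illustration, and the dot is escaped as `\.` so that it matches a literal period rather than any character.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ColumnA": ["First", "Second", "Third", "Fourth"],
    "File": ["aasdkh.xls", "sadkhZ.xls", "asdasdPH.xls", "adsjklahsd.xls"],
})

# Approach 1: a single pattern with one capture group.
# (Z|PH) captures the category, \. matches a literal dot,
# and $ anchors the match at the end of the file name.
# expand=False returns a Series; rows with no match are NaN,
# which fillna then replaces with "N".
df["Category"] = (
    df["File"].str.extract(r"(Z|PH)\.xls$", expand=False).fillna("N")
)

# Approach 2: two compiled patterns applied one file name at a time.
regex_Z = re.compile(r"Z\.xls$")
regex_PH = re.compile(r"PH\.xls$")


def categorize(filename: str) -> str:
    """Hypothetical helper: return 'PH', 'Z', or 'N' for one file name."""
    if regex_PH.search(filename):
        return "PH"
    if regex_Z.search(filename):
        return "Z"
    return "N"


# "Category_alt" is an illustrative column name, not part of the question.
df["Category_alt"] = df["File"].map(categorize)

print(df)
```

Both columns should agree on the sample data. The `str.extract` version is usually the shorter choice, while the helper function may be easier to extend if more categories with different rules are added later.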