I have the following pandas DataFrame:
| ColumnA | File |
| -------- | -------------- |
| First | aasdkh.xls |
| Second | sadkhZ.xls |
| Third | asdasdPH.xls |
| Fourth | adsjklahsd.xls |
and so on.
I'm trying to get the following DataFrame:
| ColumnA | File | Category |
| -------- | ---------------- | ------- |
| First | aasdkh.xls | N |
| Second | sadkhZ.xls | Z |
| Third | asdasdPH.xls | PH |
| Fourth | adsjklahsd.xls | N |
I'm trying to use regular expressions, but I'm not sure how to apply them here. I need a new column that extracts the category of each file: N if it is a "normal" file (no category), Z if the file name has a "Z" just before the extension, and PH if it has "PH" just before the extension.
I defined the following regular expressions that I think I could use, but I don't know how to apply them to the column:
regex_Z = re.compile(r'Z\.xls$')  # escape the dot so it matches a literal "."
regex_PH = re.compile(r'PH\.xls$')
P.S. Could you recommend a website for learning how to use regular expressions?
CodePudding user response:
Let's try `str.extract` with a capture group, filling the rows that don't match with 'N':
df['Category'] = df['File'].str.extract(r'(Z|PH)\.xls$', expand=False).fillna('N')
print(df)
  ColumnA            File Category
0   First      aasdkh.xls        N
1  Second      sadkhZ.xls        Z
2   Third    asdasdPH.xls       PH
3  Fourth  adsjklahsd.xls        N
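For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, applies the `str.extract` one-liner, and also shows one possible way to use the two compiled patterns from the question. The helper `categorize` and the column `Category_alt` are names made up for this illustration, not part of the original post.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ColumnA": ["First", "Second", "Third", "Fourth"],
    "File": ["aasdkh.xls", "sadkhZ.xls", "asdasdPH.xls", "adsjklahsd.xls"],
})

# Approach 1: vectorized extraction.
# (Z|PH) captures the suffix, \. matches a literal dot, $ anchors at the end.
# Rows without a match come back as NaN and are filled with "N".
df["Category"] = (
    df["File"].str.extract(r"(Z|PH)\.xls$", expand=False).fillna("N")
)

# Approach 2: reuse the compiled patterns from the question, one row at a time.
REGEX_Z = re.compile(r"Z\.xls$")
REGEX_PH = re.compile(r"PH\.xls$")


def categorize(filename: str) -> str:
    """Return 'PH', 'Z', or 'N' depending on what precedes '.xls'."""
    if REGEX_PH.search(filename):
        return "PH"
    if REGEX_Z.search(filename):
        return "Z"
    return "N"


# Category_alt is a hypothetical column name used only for comparison.
df["Category_alt"] = df["File"].map(categorize)

print(df)
```

Both columns should agree on this data. The `str.extract` version is usually the better fit for large frames because it avoids a Python-level function call per row, while the `categorize` version may be easier to extend if the rules grow beyond a single pattern.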