Extract value from column with pandas lib (data frame)

original data frame:

Date	Detail
31/03/22	I watch Netflix at home with my family 4 hours
01/04/22	I walk to the market for 3km and I spent 11.54 dollar
02/04/22	my dog bite me, I go to hospital, spend 29.99 dollar
03/04/22	I bought a game on steam 7 games spen 19.23 dollar

result data frame:

Date	Detail	Cost
31/03/22	I watch Netflix at home with my family 4 hours	0
01/04/22	I walk to the market for 3km and I spent 11.54 dollar	11.54
02/04/22	my dog bite me, I go to hospital, spend 29.99 dollar	29.99
03/04/22	I bought a game on steam 7 games spen 19.23 dollar	19.23

My question:

If the Detail column does not contain a phrase that begins with "sp.." and ends with "dollar", the value in the Cost column should be zero.
If the Detail column does contain a phrase that begins with "sp.." and ends with "dollar", the value in the Cost column should be the number in the middle of that phrase.

I tried to use a regex, but it picks up the first integer that appears in the column, for example:

| 01/04/22 | I walk to the market for 3km and I spent 11.54 dollar| 3 |

CodePudding user response：

You should be able to use a regex pattern of a form such as:

df['Cost'] = df['Detail'].str.extract(r'sp\D*([\d\.]*)\D*dollar')

This will look for the literal string sp and then any non-digit characters after it. The capture group (denoted by the ()) looks for any digits or period characters, representing the dollar amount. This is what is returned to the Cost column. The final part of the pattern allows any number of non-digit characters after the dollar amount, followed by the literal string dollar.
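
To see what the capture group picks up, the pattern can be tried on the sample sentences with Python's built-in re module, which accepts the same pattern syntax. This is only a minimal sketch: the helper name extract_cost is made up for illustration and is not part of pandas.

```python
import re

# Same pattern as above: "sp", any non-digits, the captured amount,
# any non-digits, then the literal "dollar".
PATTERN = re.compile(r'sp\D*([\d\.]*)\D*dollar')

def extract_cost(detail):
    """Hypothetical helper: return the captured amount as a string, or None if there is no match."""
    match = PATTERN.search(detail)
    return match.group(1) if match else None

samples = [
    "I watch Netflix at home with my family 4 hours",
    "I walk to the market for 3km and I spent 11.54 dollar",
    "my dog bite me, I go to hospital, spend 29.99 dollar",
    "I bought a game on steam 7 games spen 19.23 dollar",
]

for text in samples:
    print(extract_cost(text))
# None
# 11.54
# 29.99
# 19.23
```

Note that in the third sentence the match actually starts at the "sp" inside "hospital"; the result is still 29.99 because \D* skips every non-digit character up to the amount. The captured values are strings, not numbers.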

Rows which don't have a cost come back as missing values, which can then be replaced with 0:

df['Cost'] = df['Cost'].fillna(0)
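
Putting both steps together on the sample data, a minimal sketch might look like the following. It assumes the Cost column should end up numeric, so it adds a pd.to_numeric conversion that is not part of the answer above; expand=False is also an addition here, used so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["31/03/22", "01/04/22", "02/04/22", "03/04/22"],
    "Detail": [
        "I watch Netflix at home with my family 4 hours",
        "I walk to the market for 3km and I spent 11.54 dollar",
        "my dog bite me, I go to hospital, spend 29.99 dollar",
        "I bought a game on steam 7 games spen 19.23 dollar",
    ],
})

# Extract the amount as text; rows without a match become missing values.
cost = df["Detail"].str.extract(r"sp\D*([\d\.]*)\D*dollar", expand=False)

# Assumption: Cost should be a float column. Convert the captured text,
# then use 0 wherever nothing was captured.
df["Cost"] = pd.to_numeric(cost, errors="coerce").fillna(0)

print(df[["Date", "Cost"]])
```

Using errors="coerce" means an empty or malformed capture also ends up as 0 instead of raising an error.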

If you want to make any enhancements, I used this site to test the regex: https://regexr.com/6ir6o