I want to select and store string values(regex) into variable. Here I want to extract USD, 1000.00, 21/01/2022 from Name and store into separate variables for amount = 1000.0, date = 21/01/2022 and currency = USD respectively.
import pandas as pd
data_dict = {
'Name': ['I paid USD 1000.00 on 21/01/2022 for car service']
}
df = pd.DataFrame(data_dict)
CodePudding user response:
If the order is consistent and the currency always right before the amount, a simple regex with str.extract
should work:
df['Name'].str.extract(r'(?P<currency>[A-Z]{3})\s*(?P<amount>\d (?:\.\d )).*(?P<date>\d\d/\d\d/\d{4})')
output:
currency amount date
0 USD 1000.00 21/01/2022
Notes
- This won't validate the date (99/00/9876 would match)
- This assumes the currency is 3 uppercase letters (you can easily adapt)
- The type of all columns is string, if you plan to use the amount or date for operations you must convert them
variant: numbers with decimal separator
out = df['Name'].str.extract(r'(?P<currency>[A-Z]{3})\s*(?P<amount>\d (?:,\d{3})*(?:\.\d )).*(?P<date>\d\d/\d\d/\d{4})')
example:
currency amount date
0 USD 1000.00 21/01/2022
1 USD 100,000.00 21/01/2022