Python: How to extract numbers and certain uppercase letters after a keyword

Time:10-09

I'm trying to extract the digits after the word 'Amount' and the currency code after the digits into two separate columns using Python. Any help would be appreciated.

Successful refund. IBE payment ID 79104467 | Transaction-ref: 73462794 | Amount: 50.00 EUR

Successful refund by Hyperwallet. Transaction-ref: 48886217 | Amount: 214.64 USD | Hyperwallet payout id: 581082-2

CodePudding user response:

I would use a regex for that:

import re
def listAmounts(s):
    # findall returns one tuple per match: (whole match, optional decimal part)
    return [a for a, b in re.findall(r'(\d+(\.\d+)?\s[A-Z]+)', s)]

(This returns every substring made of some digits, an optional dot followed by more digits, a space, and some uppercase letters. You can of course adapt the pattern: allow more spaces, or none, before the currency; fix the number of digits after the dot; allow a sign; and so on.)
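As a rough sketch of the variants mentioned above, the pattern can be anchored on the keyword 'Amount:' so that other numbers in the message are ignored. The name `extract_amounts`, the optional sign, the flexible whitespace, and the assumption that the currency code is exactly three uppercase letters are illustrative choices, not part of the original answer.

```python
import re

# Assumed format: "Amount:", optional whitespace, an optionally signed number
# with optional decimals, optional whitespace, then a three-letter uppercase code.
AMOUNT_RE = re.compile(r"Amount:\s*([+-]?\d+(?:\.\d+)?)\s*([A-Z]{3})\b")

def extract_amounts(text):
    """Return a list of (amount, currency) tuples found in text."""
    return [(float(num), cur) for num, cur in AMOUNT_RE.findall(text)]

line = "Successful refund. IBE payment ID 79104467 | Transaction-ref: 73462794 | Amount: 50.00 EUR"
print(extract_amounts(line))  # [(50.0, 'EUR')]
```

Because the groups are captured separately, the number and the currency code come back as two values without any further splitting.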

CodePudding user response:

Not the best solution, but it should work:

to_filter = 'Successful refund. IBE payment ID 79104467 | Transaction-ref: 73462794 | Amount: 50.00 EUR'
to_filter = to_filter.split(' ')
# The amount and the currency are the two tokens right after 'Amount:'
idx = to_filter.index('Amount:')
amount = [float(to_filter[idx + 1]), to_filter[idx + 2]]
print(amount)

CodePudding user response:

To construct a DataFrame from the given string, try:

import re
import pandas as pd

s = """\
Successful refund. IBE payment ID 79104467 | Transaction-ref: 73462794 | Amount: 50.00 EUR
Successful refund by Hyperwallet. Transaction-ref: 48886217 | Amount: 214.64 USD | Hyperwallet payout id: 581082-2"""

df = pd.DataFrame(
    re.findall(r"Amount:\s*([\d.]+)\s*([^\s]+)", s),
    columns=["Amount", "Currency"],
)
print(df)

Prints:

   Amount Currency
0   50.00      EUR
1  214.64      USD
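If the messages already sit in a DataFrame column rather than in one string, the same idea can be sketched with `Series.str.extract`, which turns named groups into columns. The column name `message` is hypothetical, and the pattern assumes a plain unsigned decimal number followed by an uppercase currency code.

```python
import pandas as pd

# Hypothetical input: one refund message per row in a column named "message".
df = pd.DataFrame({"message": [
    "Successful refund. IBE payment ID 79104467 | Transaction-ref: 73462794 | Amount: 50.00 EUR",
    "Successful refund by Hyperwallet. Transaction-ref: 48886217 | Amount: 214.64 USD | Hyperwallet payout id: 581082-2",
]})

# Named groups become the column names of the extracted frame.
pattern = r"Amount:\s*(?P<Amount>\d+(?:\.\d+)?)\s*(?P<Currency>[A-Z]+)"
extracted = df["message"].str.extract(pattern)

# Store the amount as a number instead of a string; rows without a match become NaN.
df["Amount"] = extracted["Amount"].astype(float)
df["Currency"] = extracted["Currency"]
print(df[["Amount", "Currency"]])
```

Converting the amount to `float` makes the column usable for sums and comparisons. If exact decimal arithmetic matters, keeping the strings or using `decimal.Decimal` may be the better choice.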