How to extract all text between certain characters with Python re

Time:09-21

I'm trying to extract all text between certain characters but my current code simply returns an empty list. Each row has a long text string that looks like this:

"[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', 'value': Decimal('381094.000000000')}\n {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2, 'script_asm': '3045022100a', 'value': Decimal('3790496.000000000')}]"

I just need the values for "spent_transaction_hash". For example, I'd like to create a new column that has a list of ['4b3e9741022d4', '0cfbd8591a3423']. I'm trying to extract the values between 'spent_transaction_hash': and the comma. Here's my current code:

my_list = []

for row in df['column']:
    value = re.findall(r'''spent_transaction_hash'\: \(\[\'(.*?)\'\]''', row)
    my_list.append(value)

This code simply returns a blank list. Could anyone please tell me which part of my code is wrong?

CodePudding user response:

Is this what you're looking for? 'spent_transaction_hash'\: '([a-z0-9]+)'

Test: https://regex101.com/r/cnviyS/1
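For reference, here is a minimal sketch of that pattern used with re.findall. The original pattern returns nothing because it requires a literal `(['` right after the colon, and that sequence never appears in the data. The sample rows are shortened versions of the string in the question, and `HASH_PATTERN` is just a name picked for this example. It also assumes the hashes only contain lowercase letters and digits.

```python
import re

# Sample rows shaped like the strings in the question (shortened to the relevant keys).
rows = [
    "[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68}\n"
    " {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2}]",
    "[{'index': 0, 'spent_transaction_hash': 'abc123', 'spent_output_index': 5}]",
]

# Capture the quoted value that follows the key.
# Assumption: hashes consist only of lowercase letters and digits.
HASH_PATTERN = re.compile(r"'spent_transaction_hash': '([a-z0-9]+)'")

# findall returns the captured group for every match, so each row yields a list.
my_list = [HASH_PATTERN.findall(row) for row in rows]

print(my_list)
# [['4b3e9741022d4', '0cfbd8591a3423'], ['abc123']]
```

With the DataFrame from the question, the same comprehension should work over df['column'] instead of rows.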

CodePudding user response:

Since it looks like you already have a list of Python dict objects in string form, why not just eval it and grab the desired keys? Of course, with that approach you don't need regex matching, which is what the question asked about.

from decimal import Decimal

v = """\
[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'spent_output_index': 68, 'script_asm': '3045022100e9e2280f5e6d965ced44', 'value': Decimal('381094.000000000')}\n {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'spent_output_index': 2, 'script_asm': '3045022100a', 'value': Decimal('3790496.000000000')}]\
"""

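# The dicts are separated by newlines instead of commas, so swap those in before evaluating.
# Note: eval runs arbitrary code, so only use it on data you trust.
L = eval(v.replace('\n', ','))
hashes = [e['spent_transaction_hash'] for e in L]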

print(hashes)
# ['4b3e9741022d4', '0cfbd8591a3423']
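To get one list per row, as the question asks, the eval approach could be wrapped in a small helper along these lines. This is only a sketch: `extract_hashes` is a made-up name, the sample rows are shortened versions of the string in the question, and passing a namespace that contains only Decimal is an extra precaution added here. It narrows what eval can reach but does not make eval safe for untrusted input.

```python
from decimal import Decimal


def extract_hashes(row_text):
    """Return the spent_transaction_hash values from one stringified row."""
    # The dicts are separated by newlines rather than commas, so patch that first.
    # eval still executes code; the restricted namespace only exposes Decimal.
    records = eval(
        row_text.replace('\n', ','),
        {"__builtins__": {}, "Decimal": Decimal},
    )
    return [record['spent_transaction_hash'] for record in records]


# Sample rows shaped like the strings in the question (shortened).
rows = [
    "[{'index': 0, 'spent_transaction_hash': '4b3e9741022d4', 'value': Decimal('381094.000000000')}\n"
    " {'index': 1, 'spent_transaction_hash': '0cfbd8591a3423', 'value': Decimal('3790496.000000000')}]",
    "[{'index': 0, 'spent_transaction_hash': 'abc123', 'value': Decimal('1.5')}]",
]

all_hashes = [extract_hashes(row) for row in rows]

print(all_hashes)
# [['4b3e9741022d4', '0cfbd8591a3423'], ['abc123']]
```

With pandas, something like df['column'].apply(extract_hashes) should give the new column directly.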