Extract numbers only from the strings in which a keyword is mentioned

Time:06-18

I have 4 strings stored in an array (a Python list), and I want to get the integers only from the strings in which the keyword Approved is mentioned.

my_array = ['STK72184 4/28/2022 50 from Exchange Balance, 50 from Earning Balance & 10 from Bonus 25000 Regular 10/20/2023 Approved 4/28/2022',
            'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 5000 Regular 10/19/2023 Closed 4/27/2022',
            'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 15000 Regular 10/19/2023 Closed 4/27/2022',
            'STK722222 4/26/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 10000 Regular 10/18/2023 Approved 4/26/2022']

So far I can only get the integers from every string in the list, by doing this:

import re

# Extract the first number ending in 000 from each string
nums = [int(re.search(r'\d+000', s)[0]) for s in my_array]
print(nums)

# Printed output:
# [25000, 5000, 15000, 10000]

Expected output is:

[25000,10000]

Answer:

my_array = ['STK72184 4/28/2022 50 from Exchange Balance, 50 from Earning Balance & 10 from Bonus 25000 Regular 10/20/2023 Approved 4/28/2022',
            'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 5000 Regular 10/19/2023 Closed 4/27/2022',
            'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 15000 Regular 10/19/2023 Closed 4/27/2022',
            'STK722222 4/26/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 10000 Regular 10/18/2023 Approved 4/26/2022']

import re

# Keep only the strings that mention 'Approved', then extract the amount
nums = [int(re.search(r'\d+000', s)[0]) for s in my_array if 'Approved' in s]
print(nums) # [25000, 10000]
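One caveat with the comprehension above: re.search returns None when a string has no number ending in 000, and None[0] raises TypeError. A minimal sketch of a more defensive version, assuming the same \d+000 pattern as the question; the helper name approved_amounts and the shortened sample rows are hypothetical:

```python
import re

# Assumption carried over from the question: the amount is the first run of
# digits that ends in "000". Compiled once and reused for every row.
AMOUNT_RE = re.compile(r'\d+000')

def approved_amounts(rows, keyword='Approved'):
    """Hypothetical helper: return the amount from each row that mentions `keyword`.

    Rows that mention the keyword but contain no matching number are
    skipped instead of raising TypeError.
    """
    amounts = []
    for s in rows:
        if keyword not in s:
            continue
        m = AMOUNT_RE.search(s)
        if m is not None:
            amounts.append(int(m.group()))
    return amounts

# Shortened, made-up rows in the question's format; the last one has no amount.
rows = [
    'STK72184 4/28/2022 25000 Regular 10/20/2023 Approved 4/28/2022',
    'STK725721 4/27/2022 5000 Regular 10/19/2023 Closed 4/27/2022',
    'STK722222 Approved 4/26/2022',
]
print(approved_amounts(rows))  # [25000]
```

The keyword is a parameter, so the same helper can pull out the Closed amounts with approved_amounts(rows, keyword='Closed').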

Answer:

You can filter for the desired numbers without re: simply use the next() function to find the first word ending in "000" in each element. This assumes that every element you keep contains an integer ending in "000" (your original code makes the same assumption, so I assume it's OK).

# For each element of my_array that contains "Approved", take the first word ending in '000'
[int(next(i for i in s.split() if i[-3:]=='000')) for s in my_array if 'Approved' in s]
# [25000, 10000]
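Two details of the next() approach are worth knowing. Without a default, next() raises StopIteration when no word ends in "000", and the i[-3:]=='000' test would also accept a date such as 4/28/2000 (after which int() fails). A minimal variation, still without re, passes None as the default and requires the word to consist only of digits; amount_or_none and the sample rows are hypothetical:

```python
def amount_or_none(s):
    # First whitespace-separated word made only of digits and ending in "000";
    # next() returns the default None instead of raising StopIteration.
    return next((int(w) for w in s.split() if w.isdigit() and w.endswith('000')), None)

# Made-up rows: the first has a date ending in 2000, the last has no amount.
rows = [
    'STK72184 4/28/2000 25000 Regular Approved',
    'STK725721 4/27/2022 5000 Regular Closed',
    'STK722222 4/26/2022 Approved',
]
amounts = [amount_or_none(s) for s in rows if 'Approved' in s]
print(amounts)  # [25000, None]
```

Returning None keeps the failure visible in the result; filter it out afterwards if missing amounts should simply be dropped.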

Answer:

Use a list comprehension with re.search in the if clause. The second example shows that a regex-based search can pull out exactly the patterns you want, which is why I almost always prefer it to exact string matching (except when performance is critical). Also, I renamed my_array to my_lst: this data structure is called a list in Python, while "array" is the term used in some other languages.

import re

my_lst = ['STK72184 4/28/2022 50 from Exchange Balance, 50 from Earning Balance & 10 from Bonus 25000 Regular 10/20/2023 Approved 4/28/2022',
          'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 5000 Regular 10/19/2023 Closed 4/27/2022',
          'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 15000 Regular 10/19/2023 Closed 4/27/2022',
          'STK722222 4/26/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 10000 Regular 10/18/2023 Approved 4/26/2022']

nums = [int(re.search(r'\d+000', s)[0]) for s in my_lst if re.search(r'Approved', s)]
print(nums)
# [25000, 10000]

nums = [int(re.search(r'\d+000', s)[0]) for s in my_lst if re.search(r'4/2[67]', s)]
print(nums)
# [5000, 15000, 10000]
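As a small illustration of the point about regex precision: a plain substring test also matches the keyword inside a longer word, while \b word boundaries in the pattern do not. The sample strings below are made up for the comparison:

```python
import re

samples = ['Status Approved 10000', 'Status NotApproved 20000']

# Substring test: true for both strings, since 'NotApproved' contains 'Approved'.
by_substring = [s for s in samples if 'Approved' in s]

# \b requires a word boundary on each side, so 'NotApproved' is rejected.
by_regex = [s for s in samples if re.search(r'\bApproved\b', s)]

print(len(by_substring), len(by_regex))  # 2 1
```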