Searching a list of long strings for a substring and then printing out the next 4 characters

Time:11-09

Thanks for reading my question. I'll keep working through this in the meantime and will update the question if I find a solution. I fear it may be a bit too advanced for my skill set, though, so I'd appreciate any help!

I have a list of strings, each displaying an error message.

'Error: Customer (ABC 111) has an activation error'

'Error: Customer (ABC 112) has an activation error'

For each string in this list, I'd like to find the substring 'ABC' and then print the following four characters, which correspond to the ID number.

OUT: ' 111', ' 112'

Now I know how to find a substring in a list of strings, but printing the following characters is confusing me.

I will update as I work through the code or until some coding legend helps me out!

Thanks!!

EDIT: Adding MRE and final code below:

Essentially, the data was initially provided in an Excel file with two column headers, which was converted to a pandas DataFrame.

CONT_ID ERROR_DESC
123 Error: Customer (ABC 111) has an activation error
124 Error: Customer (ABC 112) has an activation error

etc.

I needed to iterate over the ERROR_DESC column to extract the customer ID from each row. The real data was a bit more complex, with different codes preceding the ID, and I also needed another substring from the strings. But for the MRE I will use ABC as the constant.

My final MRE code is below.


cust_id = []
for index, row in df.iterrows():
    desc = row['ERROR_DESC']

    # 'ABC ' is 4 characters long, so the 3-digit ID spans i + 4 to i + 7
    i = desc.index('ABC')
    id_num = desc[i + 4:i + 7]
    cust_id.append(id_num)

CodePudding user response:

You can get the index of ABC and slice the string from there:

a = 'Error: Customer (ABC 111) has an activation error'
i = a.index("ABC")
num = a[i + 4:i + 7]  # -> '111'
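
Note that `str.index` raises a ValueError if 'ABC' is missing from a string. A minimal sketch of a more forgiving version, assuming the ID is always the three characters after 'ABC ' (the helper name `extract_id` and the third test message are made up for illustration), uses `str.find` instead:

```python
def extract_id(message, marker="ABC", id_length=3):
    """Return the characters following 'marker ' in message, or None if absent.

    Hypothetical helper: assumes one space separates the marker from the ID.
    """
    start = message.find(marker)  # find() returns -1 instead of raising
    if start == -1:
        return None
    begin = start + len(marker) + 1  # skip the marker and the space after it
    return message[begin:begin + id_length]


messages = [
    'Error: Customer (ABC 111) has an activation error',
    'Error: Customer (ABC 112) has an activation error',
    'Error: no customer code here',  # invented message without the marker
]
ids = [extract_id(m) for m in messages]
print(ids)  # ['111', '112', None]
```

Returning None for non-matching rows keeps the output the same length as the input, which is convenient if the result is going back into a DataFrame column.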

CodePudding user response:

Without an MRE to work from, I'll ask you to modify this appropriately to fit your use case:

import re

#setup
list_of_strings = ['Error: Customer (ABC 111) has an activation error',
                   'Error: Customer (ABC 112) has an activation error',
                  ]
pattern = r'(?<=ABC )(\d{3})'

#the thing you want
customer_ids = [int(cust_id.group(0)) for long_string in list_of_strings
                if (cust_id := re.search(pattern, long_string))]

#produces
print(customer_ids)

[Out]: [111, 112]
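
The question mentions that the real data had different codes before the ID. Here is a sketch of how the regex approach might be generalised, assuming (the question does not confirm this) that each code is a run of uppercase letters followed by a space and a run of digits, all inside parentheses. The code 'XYZ' and the third message are invented for illustration:

```python
import re

# Assumed format: "(<uppercase code> <digits>)", e.g. "(ABC 111)"
pattern = re.compile(r'\((?P<code>[A-Z]+) (?P<id>\d+)\)')

list_of_strings = [
    'Error: Customer (ABC 111) has an activation error',
    'Error: Customer (XYZ 2045) has an activation error',  # invented example
    'Error: unrelated message',
]

# Keep only the strings that match; named groups give both code and ID
results = [
    (m.group('code'), int(m.group('id')))
    for s in list_of_strings
    if (m := pattern.search(s))
]
print(results)  # [('ABC', 111), ('XYZ', 2045)]
```

Like the answer above, this relies on the walrus operator, so it needs Python 3.8 or later.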