Extracting Strings From a List

Hi, I'm fairly new to Python and need help with extracting strings from a list. I am using Python in Visual Studio.

I have hundreds of similar lists and I need to extract specific information so I can add it to a table in columns. The aim is to automate this task using Python. I would like to extract the data between the headers 'Names', 'Ages' and 'Jobs'. The issue I am facing is that the number of names, ages and jobs varies a lot from list to list, so I would like to write a single piece of code that works for all the lists.

list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']

I am struggling to extract

['Ashley', 'Lee', 'Poonam']

I have tried the following:

for x in list_x:
      if x == 'Names':
           for y in list_x:
                 if y == 'Ages':
                      print(list_x[x:y])

This, however, raises the following error: "Exception has occurred: TypeError: slice indices must be integers or None or have an __index__ method"

Is there a way of doing this without specifying exact indices?

CodePudding user response：

As the comment suggested, editing the data is the easiest way to go, but if you have to work with the list as it is:

newList = oldList[oldList.index('Names') + 1:oldList.index('Ages')]

It just finds the indices of "Names" and "Ages" in the list, and extracts the bit between.

Lots can (and will) go wrong with this method though: for example, if one of the names is itself "Names", or if a header is misspelt or missing.
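
To make the failure modes above a little less painful, here is a minimal sketch of the same index-based idea wrapped in a function. The name `extract_between` is made up for illustration, and returning an empty list when a header is missing is just one possible choice; raising an error may suit you better.

```python
def extract_between(items, start_header, end_header):
    """Return the items strictly between two headers, or [] if either is missing."""
    try:
        start = items.index(start_header) + 1
        # Look for the end header only after the start header.
        end = items.index(end_header, start)
    except ValueError:
        # list.index raises ValueError when the value is not in the list.
        return []
    return items[start:end]


list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
          'Jobs', 'Doctor', 'Teacher', 'Nurse']

names = extract_between(list_x, 'Names', 'Ages')
print(names)  # ['Ashley', 'Lee', 'Poonam']
```

This still cannot tell a header apart from a person who happens to be called "Names", so it only softens the problem rather than solving it.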

CodePudding user response：

For completeness' sake, it might not be a bad idea to use an approach like the one below.

First, build a list of indices of each of the desired headers:

list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']
headers = ('Names', 'Ages', 'Jobs')

header_indices = [list_x.index(header) for header in headers]
print('indices:', header_indices)  # [0, 4, 8]

Then, create a list of values for each header, which we can infer from the positions where each header shows up in the list:

values = {}
for i in range(len(header_indices)):
    header = headers[i]
    start = header_indices[i] + 1
    try:
        values[header] = list_x[start:header_indices[i + 1]]
    except IndexError:
        values[header] = list_x[start:]

And finally, we can display it for debugging purposes:

print('values:', values)
# {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']}

assert values['Names'] == ['Ashley', 'Lee', 'Poonam']
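
Since the question mentions putting the result into a table, here is one possible way to turn the per-header lists into rows. This is only a sketch: the `rows` name and the choice of an empty string for missing entries are assumptions, and `values` is written out by hand so the snippet runs on its own.

```python
from itertools import zip_longest

values = {
    'Names': ['Ashley', 'Lee', 'Poonam'],
    'Ages': ['25', '35', '42'],
    'Jobs': ['Doctor', 'Teacher', 'Nurse'],
}

# Each row takes one entry from every column, in the dict's insertion order.
# If a column is shorter than the others, the gap is filled with ''.
rows = list(zip_longest(*values.values(), fillvalue=''))

print(rows[0])  # ('Ashley', '25', 'Doctor')
```

Plain `zip` would also work when every header is guaranteed to have the same number of entries; `zip_longest` just avoids silently dropping data when it does not.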

For better time complexity, O(N), we can alternatively loop over the list only once and build a dict of the values as we go:

from collections import defaultdict

values = defaultdict(list)
header_idx = -1

for x in list_x:
    if x in headers:
        header_idx += 1
    else:
        values[headers[header_idx]].append(x)

print('values:', values)
# defaultdict(<class 'list'>, {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']})
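
One thing to watch with the single loop above: it assumes the headers appear in the list in the same order as in `headers`, and any item that comes before the first header would end up under the last header, because the index starts at -1. A possible variation, sketched below, tracks the header string itself instead of a counter. The function name `group_by_header` is hypothetical, and skipping items that precede the first header is an assumption about what you would want.

```python
def group_by_header(items, headers):
    """Group items under the most recently seen header, in a single pass."""
    header_set = set(headers)  # set membership is O(1) on average
    values = {}
    current = None             # no header seen yet
    for item in items:
        if item in header_set:
            current = item
            values.setdefault(current, [])
        elif current is not None:
            values[current].append(item)
        # Items that appear before any header are skipped.
    return values


list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42',
          'Jobs', 'Doctor', 'Teacher', 'Nurse']

values = group_by_header(list_x, ('Names', 'Ages', 'Jobs'))
print(values['Jobs'])  # ['Doctor', 'Teacher', 'Nurse']
```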