Hi I'm fairly new to Python and needed help with extracting strings from a list. I am using Python on Visual Studios.
I have hundreds of similar strings and I need to extract specific information so I can add it to a table in columns - the aim is to automate this task using python. I would like to extract the data between the headers 'Names', 'Ages' and 'Jobs'. The issue I am facing is that the number of entries of names, ages and jobs varies a lot within all the lists and so I would like to write unique code which could apply to all the lists.
list_x = ['Names','Ashley','Lee','Poonam','Ages', '25', '35', '42' 'Jobs', 'Doctor', 'Teacher', 'Nurse']
I am struggling to extract
['Ashley', 'Lee', 'Poonam']
I have tried the following:
for x in list_x:
if x == 'Names':
for y in list_x:
if y == 'Ages':
print(list_x[x:y])
This however comes up with the following error: "Exception has occurred: typeError X
slice indices must be integers or None or have an index method"
Is there a way of doing this without specifying exact indices?
CodePudding user response:
As the comment suggested editing the data is the easiest way to go, but if you have to...
newList = oldList[oldList.index('Names') 1:oldList.index("Ages")]
It just finds the indices of "Names"
and "Ages"
in the list, and extracts the bit between.
Lots can (and will) go wrong with this method though - if there's a name which is "Names", or if they are misspelt, etc.
CodePudding user response:
For completeness sake, it might be not a bad idea to use an approach similar to the below.
First, build a list of indices of each of the desired headers:
list_x = ['Names', 'Ashley', 'Lee', 'Poonam', 'Ages', '25', '35', '42', 'Jobs', 'Doctor', 'Teacher', 'Nurse']
headers = ('Names', 'Ages', 'Jobs')
header_indices = [list_x.index(header) for header in headers]
print('indices:', header_indices) # [0, 4, 8]
Then, create a list of values for each header, which we can infer from the positions where each header shows up in the list:
values = {}
for i in range(len(header_indices)):
header = headers[i]
start = header_indices[i] 1
try:
values[header] = list_x[start:header_indices[i 1]]
except IndexError:
values[header] = list_x[start:]
And finally, we can display it for debugging purposes:
print('values:', values)
# {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']}
assert values['Names'] == ['Ashley', 'Lee', 'Poonam']
For better time complexity O(N)
, we can alternatively use an approach like below so that we only have one for
loop over the list to build a dict
object with the values:
from collections import defaultdict
values = defaultdict(list)
header_idx = -1
for x in list_x:
if x in headers:
header_idx = 1
else:
values[headers[header_idx]].append(x)
print('values:', values)
# defaultdict(<class 'list'>, {'Names': ['Ashley', 'Lee', 'Poonam'], 'Ages': ['25', '35', '42'], 'Jobs': ['Doctor', 'Teacher', 'Nurse']})