How to iterate through sequential string values and append to a nested list

Time:06-28

I have a list containing filenames of a dataset, in the form of a number followed by some descriptive text (which is different for each file):

a = ['001_sometext', '002_sometext', ..., '162_sometext', '001_sometext', ..., '162_sometext']

The list cycles from '001' to '162' multiple times, but it doesn't follow a perfect sequence: some numbers are missing.

My intention is to read all files containing '001' and append them to another list, and then do the same for '002' and so on, such that I end up with a nested list containing a separate list for each number in the sequence.

My current attempt:

phrases = []
xi = []
for digits in range(0, 162):
    for x in a:
        if str(digits) in x:
            xi.append(x)
    phrases.append(xi)

However, this gives me a nested list of the entire list over and over again, rather than a list for each number.

Edit:

The loop above is reading all files containing just a '0', which brings back hundreds of files and adds them to a list. A minor fix is that I've made a loop for each order of magnitude:

phrases = []
for digits in range(1, 10):
    xi = []
    for x in a:
        if '00' + str(digits) in x:
            xi.append(x)
        else: None
    phrases.append(xi)

and

phrases = []
for digits in range(10, 100):
    xi = []
    for x in a:
        if '0' + str(digits) in x:
            xi.append(x)
        else: None
    phrases.append(xi)

and

phrases = []
for digits in range(100, 162):
    xi = []
    for x in a:
        if str(digits) in x:
            xi.append(x)
        else: None
    phrases.append(xi)

CodePudding user response:

Your code has a few issues. First, you need to reset xi on each iteration of the outer loop; otherwise every entry in phrases refers to the same ever-growing list. Second, you need to iterate over range(1, 163) (i.e. 1 to 162 inclusive). Finally, you can't use str(digits) in x because, for example, str(1) would match 001, 015, 102 and so on.
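To make the last point concrete, here is a small illustration of how the substring test over-matches while a zero-padded prefix test does not. The list `names` is made-up sample data, not taken from the question:

```python
# Hypothetical sample filenames, for illustration only
names = ['001_a', '015_b', '102_c', '020_d']

# Substring test: '1' appears anywhere in three of these names
loose = [n for n in names if str(1) in n]

# Prefix test: only names that begin with the zero-padded number '001'
strict = [n for n in names if n.startswith(f'{1:03d}')]

print(loose)   # ['001_a', '015_b', '102_c']
print(strict)  # ['001_a']
```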

Something like this should work:

phrases = []
for digits in range(1, 163):
    xi = []  # start a fresh list for each number
    srch = f'{digits:03d}'  # zero-padded prefix, e.g. 1 -> '001'
    for x in a:
        if x.startswith(srch):
            xi.append(x)
    phrases.append(xi)

Alternatively you could use a nested list comprehension:

phrases = [ [f for f in a if f.startswith(f'{n:03d}')] for n in range(1, 163)]

If

a = ['001_sometext', '002_sometext', '162_sometext', '001_someothertext', '162_someothertext']

both of these give a result of:

[['001_sometext', '001_someothertext'], ['002_sometext'], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], ['162_sometext', '162_someothertext']]
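Both versions above scan the whole of a once per number. If the list is large, one possible alternative is to group the filenames in a single pass and then build the nested list from the groups. This is only a sketch: the helper name `group_by_prefix` and its `first`/`last` parameters are made up for illustration, and it assumes every filename begins with a three-digit number, as in the question.

```python
from collections import defaultdict

def group_by_prefix(filenames, first=1, last=162):
    """Group filenames by their three-digit prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for name in filenames:
        # Assumption: the first three characters are the zero-padded number
        groups[name[:3]].append(name)
    # One sub-list per number in the range; missing numbers yield an empty list
    return [groups.get(f'{n:03d}', []) for n in range(first, last + 1)]

a = ['001_sometext', '002_sometext', '162_sometext',
     '001_someothertext', '162_someothertext']
phrases = group_by_prefix(a)
```

With the sample list above, this should produce the same nested list as the two approaches shown earlier, including the empty lists for numbers that have no files.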