List slicing vs indexing in Python?


I was trying to process a file in Python. Long story short, here are two versions of the code I wrote:

for line in file:
    if line[0:2] == ".I":
        #do something
    elif line[0:2] == ".T":
        #do something else
    elif line[0:2] == ".A":
        ...

There were some 21,000 lines in the file. However, when I altered my code to just this:

for line in file:
    if line[0] == ".":
        if line[1] == "I":
            #do something
        elif line[1] == "T":
            #do something
        elif line[1] == "A":
            ...

the runtime changed dramatically: from about 40 minutes down to 30 seconds. I know string slicing is O(k) in the length of the slice, but here we were only slicing the first two characters of each line. So what caused such a dramatic change?

CodePudding user response:

Indexing is about twice as fast as slicing, but these are comparisons of very small numbers. Run a million times, the difference is only about 0.04 seconds. That's not the difference you are seeing in your code.

>>> timeit("s[0:2]=='aa'", setup="s = '12345'")
0.08988943499571178
>>> timeit("s[0]=='a'", setup="s = '12345'")
0.05322081400663592
>>> timeit("val=='aa'", setup="val='aa'")
0.03722755100170616

You could speed up both versions a little by assigning the sliced or indexed value to a variable once and using that variable for the comparisons. Doing the work inside a function also helps slightly, since local-variable lookups are faster than global ones.
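
For example, a minimal sketch of both ideas (the function name and the placeholder bodies are illustrative, not your actual handlers):

def process_lines(file):
    # Slice once per line instead of once per comparison; names local
    # to a function are also resolved faster than globals.
    for line in file:
        prefix = line[0:2]
        if prefix == ".I":
            pass  # do something
        elif prefix == ".T":
            pass  # do something else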

Now to the bigger problem. Let's say you have 10,000 lines, and 1,000 of them start with ".", evenly distributed between ".A" and ".Z". On average you will make roughly 23 comparisons per line before the if/elif chain resolves. In the first version every line pays that price: 10,000 * 23, or 230,000 total checks. In the second version a single check on line[0] eliminates most candidates, and only the remaining lines go through the average 23 checks: 9,000 + (1,000 * 23), or 32,000 total checks. That's an 86% reduction in conditions checked.
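
A quick back-of-the-envelope check of those figures:

first_version = 10_000 * 23                # every line walks the full if/elif chain
second_version = 9_000 * 1 + 1_000 * 23    # one check rejects the non-"." lines
print(first_version, second_version)       # 230000 32000
print(1 - second_version / first_version)  # ~0.86, an 86% reduction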

Let's go further. Suppose the file also contains ".whatever" prefixes that you aren't interested in. Each of those lines has to go through all 26 checks before you realize it's a dud. If that's the case, you can group all of your wanted prefixes into a set and check that first:

wanted = {".A", ".B", etc...)
for line in file:
    check = line[:2]
    if check in wanted:
        val = check[1]
        if ...
        
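
Membership tests on a set are O(1) on average, so an unwanted ".whatever" line is now rejected by a single hash lookup instead of 26 failed string comparisons.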

You can go even further if you write your "do something" code as functions:

def do_thing_A():
    pass
    
def do_thing_B():
    pass
    
def do_nothing():
    pass
    
do_all_the_things = {".A": do_thing_A, ".B": do_thing_B}  # map prefix -> handler

for line in file:
    do_all_the_things.get(line[:2], do_nothing)()
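
This replaces the entire if/elif chain with a single dictionary lookup per line, and dict.get falls back to do_nothing for any prefix that isn't in the table.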

CodePudding user response:

I'm still looking into the specifics of what's going on under the hood, but according to the Python Wiki, indexing has constant time complexity, O(1), whereas slicing is O(k), where k is the size of the slice. In CPython, slicing also builds a new string object each time, while indexing can return a cached one-character string, so even a two-character slice carries an allocation cost that indexing avoids.
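
A quick way to see that O(k) cost (exact timings will vary by machine, but the larger slice is consistently slower because more characters are copied into the new string):

from timeit import timeit

print(timeit("s[:2]", setup="s = 'x' * 10000"))     # small slice
print(timeit("s[:5000]", setup="s = 'x' * 10000"))  # large slice: slower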
