Is there a maximum limit to a python array? How to handle large data?

Time:02-12

I'm using a simple python array to store words fetched from a file.

words=[]
words.append(new_word)

This code snippet works perfectly for files with small word counts. However, when I run the script on larger files, it hangs after some time (when the array length is around 111166 and the total letter count inside the array is high).

Is there a maximum limit for a Python array? Is there a workaround for this?

Thanks in advance.

CodePudding user response:

You may consider using a database if your data gets too big. A viable option is SQLite, a simple file-based database that Python supports through the built-in sqlite3 module.

First, create a table for your words:

import sqlite3

connection = None  # keeps the finally block safe if connect() fails
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    # IF NOT EXISTS lets the script be re-run without raising an error
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS "words" (
            "id"    INTEGER,
            "word"  TEXT NOT NULL UNIQUE,
            PRIMARY KEY("id" AUTOINCREMENT)
        );
    ''')
except sqlite3.Error as error:
    print("Failed to execute the above query", error)
finally:
    if connection:
        connection.close()

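Now you can start adding words to the table:

my_word = "cat"

connection = None  # keeps the finally block safe if connect() fails
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    cursor.execute("INSERT INTO words(word) VALUES(?)", [my_word])
    # Without commit() the inserted row is discarded when the connection closes
    connection.commit()
except sqlite3.Error as error:
    print("Failed to execute the above query", error)
finally:
    if connection:
        connection.close()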

Now, to look a word up in the table:

search_word = "cat"

connection = None  # keeps the finally block safe if connect() fails
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM words WHERE word=?", [search_word])
    print(cursor.fetchall())
except sqlite3.Error as error:
    print("Failed to execute the above query", error)
finally:
    if connection:
        connection.close()
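
For a large file, opening a connection for every single word would be slow. Here is a minimal sketch of inserting a whole batch in one transaction with executemany, assuming the same words table as above. The helper name store_words and the in-memory database are only assumptions for illustration, not part of the original code.

```python
import sqlite3

def store_words(connection, words):
    """Insert many words in one transaction (hypothetical helper)."""
    # "with connection" commits on success and rolls back on an exception
    with connection:
        connection.executemany(
            # OR IGNORE skips words that would violate the UNIQUE constraint
            "INSERT OR IGNORE INTO words(word) VALUES(?)",
            ((word,) for word in words),
        )

# In-memory database used here only so the sketch leaves no file behind
connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE IF NOT EXISTS "words" ('
    '"id" INTEGER PRIMARY KEY AUTOINCREMENT, '
    '"word" TEXT NOT NULL UNIQUE)'
)

store_words(connection, ["cat", "dog", "cat"])
count = connection.execute("SELECT COUNT(*) FROM words").fetchone()[0]
print(count)
connection.close()
```

With the UNIQUE constraint on word, the repeated "cat" is stored only once. In a real script you would pass the words read from your file and connect to "database.db" instead.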

CodePudding user response:

sys.maxsize is the upper bound on the length of a list, and therefore on its indices:

An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.

But this is clearly not your problem: sys.maxsize is much bigger than 111166. Something else is going on in your code.

.append() is also amortized O(1), so it does not slow your code down. When a list outgrows the memory allocated for it, a new, larger block is allocated and the elements are moved there, but this happens rarely.
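
Both points above can be checked with a short sketch. It assumes CPython, where sys.getsizeof on a list reflects the memory currently allocated for it, so a change in that value signals a reallocation. The variable names are only illustrative.

```python
import sys

# Upper bound on list length for this platform
print(sys.maxsize)

words = []
resizes = 0
last_size = sys.getsizeof(words)

for _ in range(111166):
    words.append("word")
    size = sys.getsizeof(words)
    if size != last_size:
        # The allocated block changed, so the list was reallocated
        resizes += 1
        last_size = size

# Number of elements versus number of reallocations along the way
print(len(words), resizes)
```

The number of reallocations should be tiny compared with the number of appends, which is why appending 111166 words should not be what makes the script hang.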
