Reduce the number of list comprehensions and find a better way to open a list of files from a text f

Time:07-29

I am trying to read multiple sql files in a repository. I have created the code for it, but it contains multiple list comprehensions. I need to know if there is a better (more pythonic) way of writing this code.

path = '/home/jupyter/SQL_scripts/'
file_list = open("/home/jupyter/a.txt", "r")
d= file_list.read()
file_list.close()

dl = d.split("\n")

def gsql(fpath):
    with open(fpath, 'r') as file:
        return file.read()

dl = [path + x for x in dl if isinstance(x, str)]
ss = [gsql(s) for s in dl]
[mod1.e_sql(x) for x in ss]

Here path is the directory in which the sql files are, and file_list is the text file which has just the file-names of the sql files.

Merging path and file_list gives the whole path to the file, and mod1.e_sql() is the custom module to execute the sql.

The code is required to do this:

  • execute all the sql files named in the txt file
  • the txt file contains only file names, not whole paths

Currently I am doing this:

  1. open txt file
  2. read all filenames as list elements
  3. join path + each sql filename
  4. open that (path + each sql filename)
  5. read the file content(sql query)
  6. execute that sql query string

I need to eliminate some of the steps, but I am not able to write anything better.

CodePudding user response:

Ignoring the irrelevant bits, we have:

dl = d.split("\n")

dl = [path + x for x in dl if isinstance(x, str)]
ss = [gsql(s) for s in dl]
df = [intf.execute_sql(f) for f in dl]

one confusing thing is that dl is assigned twice

so we could fold the split into the comprehension:

dl = [path + x for x in d.split("\n") if isinstance(x, str)]
ss = [gsql(s) for s in dl]
df = [intf.execute_sql(f) for f in dl]

now it's clearer that we create dl and then iterate over it twice

list comprehensions are nice, but it would be more efficient to iterate over dl just once

so we can replace the last two comprehensions with a more verbose for loop:

dl = [path + x for x in d.split("\n") if isinstance(x, str)]

ss = []
df = []
for item in dl:
    ss.append(gsql(item))
    df.append(intf.execute_sql(item))

we could also re-arrange a bit more and get rid of the first comprehension (which is also kind of iterating over the same stuff an extra time)

dl = []
ss = []
df = []
for item in d.split("\n"):
    if isinstance(item, str):
        dl_val = path + item
        dl.append(dl_val)
        ss.append(gsql(dl_val))
        df.append(intf.execute_sql(dl_val))

CodePudding user response:

There are a lot of ways to simplify this. Since I don't know whether all these intermediate variables have a purpose, I may be eliminating ones that you need, but here's a take on it:

path = '/home/jupyter/SQL_scripts/'

def gsql(fpath):
    with open(fpath, 'r') as file:
        return file.read()

with open("/home/jupyter/a.txt", "r") as file_list:
    filenames = file_list.read().split("\n")

# Eliminated your check for string type since I don't think it will ever be False.
for filename in filenames:
    mod1.e_sql(gsql(path + filename))
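
One caveat with splitting on "\n": if a.txt ends with a newline or contains a blank line, the list picks up empty strings, and the loop would then call open() on the bare directory path. A small sketch of the difference, using a made-up file body (the names first.sql and second.sql are only placeholders):

```python
# Hypothetical contents of a.txt: two names, a blank line, a trailing newline.
text = "first.sql\nsecond.sql\n\n"

# split("\n") keeps an empty string for the blank line and another for
# the trailing newline.
naive = text.split("\n")

# splitlines() does not produce the extra trailing entry, and the filter
# drops lines that are empty or whitespace-only.
filenames = [line.strip() for line in text.splitlines() if line.strip()]

print(naive)      # ['first.sql', 'second.sql', '', '']
print(filenames)  # ['first.sql', 'second.sql']
```

The isinstance(x, str) check in the question never filters these out, since an empty string is still a str.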

CodePudding user response:

Your code can be reduced to a single loop by iterating line by line over the file object returned by open. Using with statements will ensure the files are automatically closed after use. During iteration, the lines should be stripped to remove line endings, and blank lines should be skipped. It's best to use os.path.join when concatenating paths, as (amongst other things) it will automatically insert the right separators where necessary.
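
To illustrate the point about separators, here is a short sketch. It uses posixpath, which is the module behind os.path on Linux and macOS, so the output is the same on any platform; the file name a.sql is just a placeholder:

```python
import posixpath  # os.path resolves to this module on POSIX systems

# A separator is inserted only when the directory does not already end with one.
without_slash = posixpath.join("/home/jupyter/SQL_scripts", "a.sql")
with_slash = posixpath.join("/home/jupyter/SQL_scripts/", "a.sql")
print(without_slash)  # /home/jupyter/SQL_scripts/a.sql
print(with_slash)     # /home/jupyter/SQL_scripts/a.sql

# Caveat: an absolute second argument discards everything before it.
absolute = posixpath.join("/home/jupyter/SQL_scripts", "/tmp/a.sql")
print(absolute)       # /tmp/a.sql
```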

There's no real need to use list-comprehensions in your code, and the way you're using them is unnecessarily inefficient since it iterates over the same data multiple times and needlessly creates intermediate lists.

In addition, I would say the re-write of your code given below is also more readable - which I suppose is always the most important aspect of whatever is considered to be "pythonic":

import os

dirpath = '/home/jupyter/SQL_scripts'

with open("/home/jupyter/a.txt") as file_list:
    for line in file_list:
        filename = line.strip()
        if filename:
            filepath = os.path.join(dirpath, filename)
            with open(filepath) as sql_file:
                mod1.e_sql(sql_file.read())
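
If this needs to be reused or tested, one possible variation is to wrap the same loop in a function and pass the executor in as an argument. This is only a sketch: run_sql_files and its parameters are names invented here, and pathlib is swapped in for os.path.join.

```python
from pathlib import Path
from typing import Callable, List


def run_sql_files(list_file: str, sql_dir: str,
                  execute: Callable[[str], object]) -> List[str]:
    """Run every SQL file named in list_file; return the names that were run.

    Hypothetical helper: `execute` stands in for mod1.e_sql from the question.
    """
    done = []
    with open(list_file) as names:
        for line in names:
            name = line.strip()
            if not name:  # skip blank lines
                continue
            # Read the whole SQL file and hand its text to the executor.
            execute((Path(sql_dir) / name).read_text())
            done.append(name)
    return done

# Intended use with the question's paths and module (not run here):
# run_sql_files("/home/jupyter/a.txt", "/home/jupyter/SQL_scripts", mod1.e_sql)
```

Passing the executor in means the loop can be exercised with a plain list's append method instead of a real database call.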
    