I am iterating through the files in a directory, but I only need the files with a .csv extension. I then need the paths to those files so I can use them later in the code. To determine whether a file is a .csv, I use this:
for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root, subdir)):
        if file.endswith(ext):
            print(file)
This gives me all the files with .csv extension. Now I want to create a string with the path to these files so I use this:
for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root, subdir)):
        if file.endswith(ext):
            datoteka = root + '\\' + subdir + '\\' + file
The path to my files is now stored in the string datoteka, and I want to use it inside this for loop, the one that also contains the if statement. But I get an error that datoteka is not defined. After some quick research I found out that I cannot use variables that were defined inside an if statement outside of that if statement. Is there a way to pull the variable out?
I need to perform some data analysis on the files (datoteka contains the path):
for subdir in os.listdir(root):
    for file in os.listdir(os.path.join(root, subdir)):
        if file.endswith(ext):
            datoteka = root + '\\' + subdir + '\\' + file
        df = pd.read_csv(datoteka, encoding='cp1252')
This gives the following error:
Is there another way I could get my paths without defining datoteka inside that if statement?
CodePudding user response:
Use glob and build a list of files like this:
from os.path import join
from glob import glob
from pandas import read_csv
ROOT = 'root' # root directory
SUBDIR = 'sub' # sub directory
list_of_csvs = glob(join(ROOT, SUBDIR, '*.csv'))  # glob already returns a list
# now iterate over the list
for file in list_of_csvs:
    df = read_csv(file)
Or, if you don't need to keep the list of files and want a recursive search then it's just:
from os.path import join
from glob import glob
from pandas import read_csv
ROOT = 'root' # root directory
for file in glob(join(ROOT, '**', '*.csv'), recursive=True):
    df = read_csv(file)
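Note that a loop like this rebinds df on every pass, so only the last file's data is left once it finishes. If every file is needed, one option is to collect the paths first and decide afterwards what to do with each. A minimal sketch, assuming a recursive search is what you want (the helper name csv_paths is made up for this example):

```python
from glob import glob
from os.path import join


def csv_paths(root):
    """Return every .csv path under root (at any depth), sorted for a stable order."""
    # '**' together with recursive=True matches zero or more nested directories
    return sorted(glob(join(root, "**", "*.csv"), recursive=True))
```

With pandas, something like `frames = {path: read_csv(path) for path in csv_paths(ROOT)}` would then keep one DataFrame per file instead of overwriting df.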
CodePudding user response:
The variable "datoteka" is only assigned inside the if block, so it does not exist until that block has run at least once. Using it outside the block before then raises a NameError. To avoid this, define the variable before the inner for loop.
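Strictly speaking, an if block in Python does not create its own scope: a name assigned inside it stays visible afterwards, but only if the branch actually ran. The sketch below tries to show both cases. The function names and the sample file names are invented for the illustration and are not from the question.

```python
def last_csv(names, ext=".csv"):
    """Return the last name ending in ext, or None if there is none."""
    found = None  # bound up front, so it exists even if the if never runs
    for name in names:
        if name.endswith(ext):
            found = name  # assigned inside the if, still visible after it
    return found


def last_csv_unsafe(names, ext=".csv"):
    """Same loop, but without the initial assignment."""
    for name in names:
        if name.endswith(ext):
            found = name
    return found  # fails if nothing matched, because found was never bound


print(last_csv(["a.txt", "b.csv"]))
print(last_csv(["a.txt"]))
try:
    last_csv_unsafe(["a.txt"])
except NameError as err:  # UnboundLocalError is a subclass of NameError
    print("error:", err)
```

The second function works fine as long as at least one name matches, which is why this kind of error can appear to come and go depending on the directory contents.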
Example:
for subdir in os.listdir(root):
    datoteka = None
    for file in os.listdir(os.path.join(root, subdir)):
        if file.endswith(ext):
            datoteka = root + '\\' + subdir + '\\' + file
        if datoteka is not None:
            df = pd.read_csv(datoteka, encoding='cp1252')
            datoteka = None  # reset so a later non-CSV file does not reuse this path
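A simpler alternative, offered as a sketch and not as the only way to do it: use the path inside the if itself, so it is only ever touched right after it has been set. The version below also builds the path with os.path.join instead of hard-coded backslashes. The helper name csv_paths_two_levels is hypothetical, and it assumes the same layout as in the question (CSV files sitting exactly one directory below root).

```python
import os


def csv_paths_two_levels(root, ext=".csv"):
    """Collect paths of files ending in ext that sit one directory below root."""
    paths = []
    for subdir in os.listdir(root):
        subdir_path = os.path.join(root, subdir)
        if not os.path.isdir(subdir_path):
            continue  # skip plain files sitting directly in root
        for file in os.listdir(subdir_path):
            if file.endswith(ext):
                # os.path.join picks the right separator for the platform
                paths.append(os.path.join(subdir_path, file))
    return paths
```

Each returned path can then be passed to pd.read_csv(path, encoding='cp1252') in an ordinary for loop.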