I'm working on a directory with 27 folders and various XML files inside each folder.
was able to:
- parse one XML file and write to a CSV file
- traverse one folder and read and parse all XML files in it
challenges:
- having a problem trying to go through and parse all the XML files from the root folder to its subfolders
Send help and thank you. Code snippets below
# working in one folder only
import csv
import xml.etree.ElementTree as ET
import os
## directory
path = "/Users.../y"
filenames = []
## Count the number of xml files of each folder
files = os.listdir(path)
print("\n")
xml_data_to_csv = open('/Users.../xml_extract.csv', 'w')
list_head = []
csvwriter = csv.writer(xml_data_to_csv)
# Read XML files in a folder
for filename in os.listdir(path):
if not filename.endswith('.xml'):
continue
fullname = os.path.join(path,filename)
print("\n", fullname)
filenames.append(fullname)
# parse elements in each XML file
for filename in filenames:
tree = ET.parse(filename)
root = tree.getroot()
extract_xml=[]
## extract child elements per xml file
print("\n")
for x in root.iter('Info'):
for element in x:
print(element.tag,element.text)
extract_xml.append(element.text)
## Write list nodes to csv
csvwriter.writerow(extract_xml)
## Close CSV file
xml_data_to_csv.close()
CodePudding user response:
You can use os.walk:
import os
for dir_name, dirs, files in os.walk('<root_dir>'):
# parse files
CodePudding user response:
You can get the list of all the XML files in a given path with
import os
path = "main/root"
filelist = []
for root, dirs, files in os.walk(path):
for file in files:
if not file.endswith('.xml'):
continue
filelist.append(os.path.join(root, file))
for file in filelist:
print(file)
# or in your case parse the XML 'file'
If for instance:
$ tree /main/root
/main/root
├── a
│ ├── a.xml
│ ├── b.xml
│ └── c.xml
├── b
│ ├── d.xml
│ ├── e.xml
│ └── x.txt
└── c
├── f.xml
└── g.xml
we get:
/main/root/c/g.xml
/main/root/c/f.xml
/main/root/b/e.xml
/main/root/b/d.xml
/main/root/a/c.xml
/main/root/a/b.xml
/main/root/a/a.xml
If you want to sort directories and files:
for root, dirs, files in os.walk(path):
dirs.sort()
for file in sorted(files):
if not file.endswith('.xml'):
continue
filelist.append(os.path.join(root, file))
CodePudding user response:
You can use the pathlib
module to "glob" the XML files. It will search all subdirectories for the pattern you supply and return Path
objects that already include the path to the file. Cleaning up your script a bit, you would have
import csv
import xml.etree.ElementTree as ET
from pathlib import Path
## directory
path = Path("/Users.../y")
with open('/Users.../xml_extract.csv', 'w') as xml_data_to_csv:
csvwriter = csv.writer(xml_data_to_csv)
# Read XML files in a folder
for filepath in path.glob("**/*.xml"):
tree = ET.parse(filename)
root = tree.getroot()
extract_xml=[]
## extract child elements per xml file
print("\n")
for x in root.iter('Info'):
for element in x:
print(element.tag,element.text)
extract_xml.append(element.text)
## Write list nodes to csv
csvwriter.writerow(extract_xml)