Pandas | read_csv | to_xml | ValueError: Invalid tag name 'foo bar'

Time:10-14

Hi, my fellow friends from Stack Overflow,

I want to use Python to convert a CSV file to XML, and I heard that pandas makes this task very simple.

Well, it turns out it's not so easy.

What my code looks like:

import pandas as pd
import chardet
from pandas.core.frame import DataFrame

csvFile = '172431-82056.csv'
xmlFile = 'mySecondData.xml'  

def check_encoding(filename):
    """
    input: filename = "filename.csv"
    output: Dictionary = {'encoding': 'UTF-16', 'confidence': 1.0, 'language': ''}
    """
    result= {}
    with open(filename, 'rb') as rawdata:
        result = chardet.detect(rawdata.read(10000))
    return result

def import_csv(filename):
    """
    input: filename = "filename.csv"
    output: pandas DataFrame with one column per CSV header
    """
    encoding = check_encoding(filename)['encoding']
    csv_data = pd.read_csv(filename, engine ='python', encoding=encoding, sep = None)
    #print(csv_data)
    return csv_data

#print(import_csv(csvFile))

def convert_to_xml(input_file, output_file):
    csv_data = import_csv(input_file)
    csv_data.to_xml(path_or_buffer=output_file, index = True, root_name='products',row_name='item', elem_cols=['post_title','regular_price'], prefix = 'g:', pretty_print=True)

convert_to_xml(csvFile, xmlFile)

What my output looks like:

Traceback (most recent call last):
  File "c:\Users\PavelH\Documents\Git\CSV Converter\csv_converter.py", line 53, in <module>
    convert_to_xml(csvFile, xmlFile)
  File "c:\Users\PavelH\Documents\Git\CSV Converter\csv_converter.py", line 51, in convert_to_xml
    df.to_xml(path_or_buffer=output_file, index = True, root_name='products',row_name='item', prefix = 'g:', pretty_print=True)
  File "C:\Users\PavelH\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 2986, in to_xml
    return xml_formatter.write_output()
  File "C:\Users\PavelH\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\formats\xml.py", line 265, in write_output
    xml_doc = self.build_tree()
  File "C:\Users\PavelH\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\formats\xml.py", line 485, in build_tree
    self.build_elems()
  File "C:\Users\PavelH\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\formats\xml.py", line 575, in build_elems
    SubElement(self.elem_row, elem_name).text = val
  File "src\lxml\etree.pyx", line 3136, in lxml.etree.SubElement
  File "src\lxml\apihelpers.pxi", line 179, in lxml.etree._makeSubElement
  File "src\lxml\apihelpers.pxi", line 1734, in lxml.etree._tagValidOrRaise
ValueError: Invalid tag name 'foo bar'

Are tag names that contain spaces invalid?

CodePudding user response:

I think your pandas is outdated. to_xml was introduced in version 1.3.0. You can check your version with

# in python shell
import pandas
print(pandas.__version__)

If it is older than 1.3.0, you should upgrade pandas with

# in bash shell
pip install --upgrade pandas

CodePudding user response:

It turns out that running

pip install --upgrade pandas

solved that issue XD. Once again, I am grateful for your help :)

I still have some other issues in the code :)

For example: ValueError: Invalid tag name 'Lieferzeit/Verfügbarkeit'

I needed to write an extra function that cleans all the tag names:

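def clean_tags(dataframe):
    """
    Cleans column names so they can be used as XML tag names:
    spaces and slashes become underscores, parentheses are removed.
    """
    # regex=False treats each pattern as a literal string, so '(' and ')'
    # are never interpreted as regular-expression syntax.
    dataframe.columns = dataframe.columns.str.replace(' ', '_', regex=False)
    dataframe.columns = dataframe.columns.str.replace('(', '', regex=False)
    dataframe.columns = dataframe.columns.str.replace(')', '', regex=False)
    dataframe.columns = dataframe.columns.str.replace('/', '_', regex=False)
    return dataframe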
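
Listing every bad character by hand only covers the cases seen so far. A more general variant could replace anything that is not a word character, dot, or hyphen in one pass. This is only a rough sketch: sanitize_tag is a made-up helper name, and the regex is an approximation of the XML naming rules, not a full implementation of them.

```python
import re

def sanitize_tag(name):
    """Return an approximation of `name` that is usable as an XML tag name."""
    # Replace every character that is not a word character (Unicode letters,
    # digits, underscore), a dot, or a hyphen with an underscore.
    cleaned = re.sub(r"[^\w.\-]", "_", str(name).strip())
    # XML names may not start with a digit, dot, or hyphen, so prefix
    # an underscore in that case (and for empty names).
    if not cleaned or not (cleaned[0].isalpha() or cleaned[0] == "_"):
        cleaned = "_" + cleaned
    return cleaned

# Hypothetical column names, including the two from the error messages above.
columns = ["foo bar", "Lieferzeit/Verfügbarkeit", "price (EUR)", "2nd item"]
print([sanitize_tag(c) for c in columns])
```

On a DataFrame it could be applied with df.columns = [sanitize_tag(c) for c in df.columns] before calling to_xml. Note that two different column names can end up with the same cleaned name, so it may be worth checking for duplicates afterwards.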