Escaping XML Characters using Python Polars

Time:11-19

I'm using Polars to build XML from a table, and I want to escape the XML special characters in the values. However, I'm running into issues when I try to do this. My first attempt was the following:

import polars as pl
from xml.sax.saxutils import escape

table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()

table = table_raw.select([
    pl.concat_str([
    pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),

    pl
    .when(pl.col('value') != None).then(pl.format('''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''', escape(pl.col('value'))))
    .otherwise('')
    .alias('value'),

    pl.lit('''</wd:Overall_XML_Tag>''') 
])
])

However, this fails at the escape call with the error "'Expr' object has no attribute 'replace'".
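The error comes from how escape is written: it expects a Python string and immediately calls .replace() on its argument, whereas pl.col('value') is an expression object that holds no string data yet. A minimal sketch that reproduces the same kind of failure without Polars (FakeExpr is a hypothetical stand-in, not a Polars class):

```python
from xml.sax.saxutils import escape


class FakeExpr:
    """Hypothetical stand-in for a lazy expression: a recipe, not string data."""


# On a real string, escape() works as expected.
escaped_text = escape("a < b & c")

# On an object that is not a string, the first .replace() call inside
# escape() raises AttributeError.
try:
    escape(FakeExpr())
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(escaped_text)
print(error_message)
```

So escape has to run on the actual values, either per element or through the expression's own string methods.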

I was able to get the following working by chaining string replacements for the reserved characters, but it is messy and cumbersome, so I'm hoping there is a better way to handle this.

import polars as pl
from xml.sax.saxutils import escape

table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()

table = table_raw.select([
    pl.concat_str([
    pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),

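    pl
    # is_not_null() is the idiomatic null check
    .when(pl.col('value').is_not_null())
    .then(pl.format('''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''',
        # replace_all replaces every occurrence (str.replace stops after the first match);
        # '&' goes first so the entities added afterwards are not escaped again
        pl.col('value')
        .str.replace_all('&', '&amp;', literal=True)
        .str.replace_all('<', '&lt;', literal=True)
        .str.replace_all('>', '&gt;', literal=True)
        .str.replace_all('"', '&quot;', literal=True)
        .str.replace_all("'", '&apos;', literal=True)))
    # pl.lit('') keeps the fallback a literal empty string rather than a column name
    .otherwise(pl.lit(''))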
    .alias('value'),

    pl.lit('''</wd:Overall_XML_Tag>''') 
])
])

Anyone have a better way to handle this?

Answer:

I figured out a way to handle this: apply a Python function to each value, like the following:

import polars as pl
from xml.sax.saxutils import escape

table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()

table = table_raw.select([
    pl.concat_str([
    pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),

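    pl
    # is_not_null() is the idiomatic null check
    .when(pl.col('value').is_not_null())
    # escape() is called once per value in Python; newer Polars releases name this method map_elements
    .then(pl.format('''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''', pl.col('value').apply(escape)))
    # pl.lit('') keeps the fallback a literal empty string rather than a column name
    .otherwise(pl.lit(''))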
    .alias('value'),

    pl.lit('''</wd:Overall_XML_Tag>''') 
])
])
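One difference from the manual replacement version is worth noting: by default, escape only handles &, < and >. To also escape quotes, it accepts a dictionary of extra entities as its second argument. A minimal sketch (escape_xml_text and QUOTE_ENTITIES are hypothetical names for illustration):

```python
from typing import Optional
from xml.sax.saxutils import escape

# Extra replacements applied after the default &, < and > handling.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_text(value: Optional[str]) -> Optional[str]:
    """Escape all five XML reserved characters, passing None through unchanged."""
    if value is None:
        return None
    return escape(value, QUOTE_ENTITIES)


print(escape("it's <b>"))                # default: quotes are left alone
print(escape_xml_text('Tom & "Jerry"'))  # quotes are escaped as well
```

A function like this could be passed to the per-element call in place of escape if the quote characters also need escaping, for example when the values end up inside XML attributes.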