I'm using Polars to build XML from a table, and I want to escape the XML reserved characters in the values. However, I'm running into issues when I try to do this. My first attempt was the following:
import polars as pl
from xml.sax.saxutils import escape
table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()
table = table_raw.select([
    pl.concat_str([
        pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),
        pl
        .when(pl.col('value').is_not_null())
        # The escape() call on the next line is the one that raises the error.
        .then(pl.format('''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''', escape(pl.col('value'))))
        .otherwise(pl.lit(''))
        .alias('value'),
        pl.lit('''</wd:Overall_XML_Tag>''')
    ])
])
However, this fails at the escape call with the error "'Expr' object has no attribute 'replace'".
I was able to get the following working by chaining string replacements for the reserved characters, but it is messy and cumbersome, so I'm hoping there is a better way to handle this.
import polars as pl
from xml.sax.saxutils import escape
table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()
table = table_raw.select([
    pl.concat_str([
        pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),
        pl
        .when(pl.col('value').is_not_null())
        .then(pl.format(
            '''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''',
            # Replace '&' first so the entities added afterwards are not escaped again.
            pl.col('value')
            .str.replace_all('&', '&amp;', literal=True)
            .str.replace_all('<', '&lt;', literal=True)
            .str.replace_all('>', '&gt;', literal=True)
            .str.replace_all('"', '&quot;', literal=True)
            .str.replace_all("'", '&apos;', literal=True)
        ))
        .otherwise(pl.lit(''))
        .alias('value'),
        pl.lit('''</wd:Overall_XML_Tag>''')
    ])
])
Does anyone have a better way to handle this?
CodePudding user response:
I figured out a way to handle this. You can run a Python function on each value with .apply(), like the following:
import polars as pl
from xml.sax.saxutils import escape
table_raw = pl.read_sql("""SELECT * FROM mytable""", engine).lazy()
table = table_raw.select([
    pl.concat_str([
        pl.lit('''<wd:Overall_XML_Tag>''').alias('Overall_XML_header'),
        pl
        .when(pl.col('value').is_not_null())
        # escape() needs a plain Python string, so it is called once per value via apply().
        # Newer Polars releases name this method map_elements().
        .then(pl.format('''<wd:Value_XML_Tag>{}</wd:Value_XML_Tag>''', pl.col('value').apply(lambda x: escape(x))))
        .otherwise(pl.lit(''))
        .alias('value'),
        pl.lit('''</wd:Overall_XML_Tag>''')
    ])
])
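For anyone wondering why the first attempt failed: escape is a plain Python function that calls .replace() on whatever it is given, so it needs an actual string, not an expression object. Also note that escape only handles &, < and > by default, so it does not quite match the five replacements in the question unless you pass the quote entities yourself. Below is a minimal, standard-library-only sketch of both points. The names escape_xml, QUOTE_ENTITIES and NotAString are made up for illustration and are not part of Polars or xml.sax.

```python
from typing import Optional
from xml.sax.saxutils import escape

# Extra entities so quotes are escaped too; escape() alone only handles &, < and >.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: escape all five XML reserved characters, pass None through."""
    if value is None:
        return None
    return escape(value, QUOTE_ENTITIES)


class NotAString:
    """Hypothetical stand-in for an object that has no .replace() method."""


# escape() works on a real string ...
print(escape_xml("Tom & \"Jerry\" <cat's>"))
# prints: Tom &amp; &quot;Jerry&quot; &lt;cat&apos;s&gt;

# ... but fails on anything without .replace(), which is the error from the question.
try:
    escape(NotAString())
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Assuming the same pipeline as above, a helper like this could be passed in place of the lambda, for example pl.col('value').apply(escape_xml), if you also need the quotes escaped.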