Hi all, I'm new to Python and am using beautifulsoup4. My XML is:
<?xml version="1.0" encoding="utf-8"?>
<database name="test_testdatabase">
<table name="products">
<column name="product_id"> x1x </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Some name 1 </column >
<column name="product_s_desc"> Some short description 1 </column >
</table>
<table name="products">
<column name="product_id"> 2xx </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Second product name 2 </column >
<column name="product_s_desc"> Second short description 2 </column >
</table>
</database>
The same pattern repeats for more than 5000 products in the XML.
I would like to append the tag with name="product_id"
to the table with name="products_en_gb",
while keeping the existing pattern:
the first id goes to the first table, the second id to the second table, and so on.
I have tried many ways to do it. I had the most success with this code:
#test.py
product_id = soup.findAll(attrs={"name": ["product_id"]})
for products_en_gb in soup.findAll(attrs={"name": ["products_en_gb"]}):
    products_en_gb.contents.append(product_id[0])
The problem is that if I use product_id[0],
exactly one tag is appended, but it is the same first tag in the sequence for every table, and if I use product_id,
all tags are appended to all tables. My desired result is the following:
<?xml version="1.0" encoding="utf-8"?>
<database name="test_testdatabase">
<table name="products">
<column name="product_id"> x1x </column>
</table>
<table name="products_en_gb">
<column name="product_id"> x1x </column>
<column name="product_name"> Some name 1 </column >
<column name="product_s_desc"> Some short description 1 </column >
</table>
<table name="products">
<column name="product_id"> 2xx </column>
</table>
<table name="products_en_gb">
<column name="product_id"> 2xx </column>
<column name="product_name"> Second product name 2 </column >
<column name="product_s_desc"> Second short description 2 </column >
</table>
</database>
I hope someone could help.
Thank you.
Answer:
Try:
from bs4 import BeautifulSoup
xml_doc = """\
<?xml version="1.0" encoding="utf-8"?>
<database name="test_testdatabase">
<table name="products">
<column name="product_id"> x1x </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Some name 1 </column >
<column name="product_s_desc"> Some short description 1 </column >
</table>
<table name="products">
<column name="product_id"> 2xx </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Second product name 2 </column >
<column name="product_s_desc"> Second short description 2 </column >
</table>
</database>"""
soup = BeautifulSoup(xml_doc, "xml")
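for table in soup.select('table[name="products_en_gb"]'):
    # The matching id lives in the closest preceding "products" table.
    prev_products = table.find_previous("table", attrs={"name": "products"})
    content = "\n".join(map(str, prev_products.contents)).strip()
    # Re-parse the serialized columns so the originals stay where they are.
    table.insert(0, BeautifulSoup("\n" + content, "html.parser"))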
print(soup)
Prints:
<?xml version="1.0" encoding="utf-8"?>
<database name="test_testdatabase">
<table name="products">
<column name="product_id"> x1x </column>
</table>
<table name="products_en_gb">
<column name="product_id"> x1x </column>
<column name="product_name"> Some name 1 </column>
<column name="product_s_desc"> Some short description 1 </column>
</table>
<table name="products">
<column name="product_id"> 2xx </column>
</table>
<table name="products_en_gb">
<column name="product_id"> 2xx </column>
<column name="product_name"> Second product name 2 </column>
<column name="product_s_desc"> Second short description 2 </column>
</table>
</database>
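If the ids and the products_en_gb tables always come in matching order, as in the question, another option is to pair them positionally with zip() and insert a copy of each id tag. This is only a sketch under that assumption, not a drop-in solution: the shortened sample XML and the names ids and targets are made up for illustration, and "html.parser" is used here only so the example runs without lxml (the "xml" parser from the code above should work the same way). Each tag is copied because inserting the original would move it out of its "products" table.

```python
import copy

from bs4 import BeautifulSoup

# Shortened sample in the same shape as the question's XML (illustrative only).
xml_doc = """<database name="test_testdatabase">
<table name="products">
<column name="product_id"> x1x </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Some name 1 </column>
</table>
<table name="products">
<column name="product_id"> 2xx </column>
</table>
<table name="products_en_gb">
<column name="product_name"> Second product name 2 </column>
</table>
</database>"""

soup = BeautifulSoup(xml_doc, "html.parser")

# Collect both lists before modifying the tree, so the copies inserted
# below are not picked up as extra ids.
ids = soup.find_all("column", attrs={"name": "product_id"})
targets = soup.find_all("table", attrs={"name": "products_en_gb"})

# Positional pairing only makes sense if the counts line up.
if len(ids) != len(targets):
    raise ValueError(f"{len(ids)} ids but {len(targets)} target tables")

for id_tag, target in zip(ids, targets):
    # copy.copy() gives a detached duplicate; the original tag stays put.
    target.insert(0, copy.copy(id_tag))
    # Put a newline in front so the copy sits on its own line.
    target.insert(0, "\n")

print(soup)
```

Unlike find_previous(), this does not look at where the tags sit relative to each other, so it silently mispairs if a product is missing its products_en_gb table somewhere in the middle; the length check only catches a mismatch in the totals.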