Parse simple XML to pandas dataframe

Time:11-22

I hope you are well. I am looking to convert the XML returned by the following URL into a pandas dataframe.

You can view the XML here: https://clients2.google.com/complete/search?hl=en&output=toolbar&q=how garage doors

Here is my Python 3 code, which currently returns an empty dataframe.

from bs4 import BeautifulSoup
import requests
import pandas as pd

response = requests.get('https://clients2.google.com/complete/search?hl=en&output=toolbar&q=how garage doors')

bs = BeautifulSoup(response.text, ['xml'])
print(bs)


obs = bs.find_all("CompleteSuggestion")

print(obs)

df = pd.DataFrame(columns=['suggestion data','Keyword'])

for node in obs:
    df = df.append({'suggestion data': node.get("suggestion data")}, ignore_index=True)
    
df.head()

Any suggestions would be welcome. I am open to doing it with other modules if there are better alternatives.

Also the expected output would be a dataframe containing a list of autosuggest search terms related to "garage doors".

I could not get Python ElementTree XML conversion to work.

CodePudding user response:

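The text you want is stored in the data attribute of each suggestion tag nested inside CompleteSuggestion, not in the tag's text/string and not in an attribute of CompleteSuggestion itself. Try this: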

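# DataFrame.append was removed in pandas 2.0, so collect the rows in a list first
rows = []

for node in obs:
    # find_all skips any whitespace text nodes between the child tags
    for suggestion in node.find_all("suggestion"):
        rows.append({'suggestion data': suggestion.attrs['data']})

df = pd.DataFrame(rows, columns=['suggestion data','Keyword'])
df.head()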

CodePudding user response:

I always use ElementTree to parse XML. This should work for you.

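import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('YOUR_DATA.xml')  # path to a local copy of the XML
root = tree.getroot()

# Collect each grandchild's attributes as one row.
# DataFrame.append was removed in pandas 2.0, so build the frame once at the end.
rows = []
for child in root:
    for child2 in child:
        rows.append(child2.attrib)

df = pd.DataFrame(rows)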
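
The ElementTree answer reads from a local file, while the question already has the XML in memory as response.text. Here is a minimal sketch of the same idea using ET.fromstring. The sample string is made up and only mirrors the shape I assume the feed has (a root element wrapping CompleteSuggestion elements, each with a suggestion child carrying a data attribute), and suggestion_rows is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real feed is assumed to have this shape.
SAMPLE_XML = (
    "<toplevel>"
    '<CompleteSuggestion><suggestion data="how garage doors work"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="how garage doors open"/></CompleteSuggestion>'
    "</toplevel>"
)


def suggestion_rows(xml_text):
    """Return one dict per <suggestion> element, keyed like the question's column."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth of <suggestion> does not matter.
    return [{"suggestion data": el.get("data")} for el in root.iter("suggestion")]


rows = suggestion_rows(SAMPLE_XML)
print(rows)
```

With the live feed, calling suggestion_rows(response.text) and passing the result to pd.DataFrame(rows) should give the one-column dataframe the question asks for, provided the feed really has the assumed shape.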
   