I hope you are well. I am looking to convert the following XML URL into a pandas dataframe.
You can view the XML here: https://clients2.google.com/complete/search?hl=en&output=toolbar&q=how garage doors
Here is my Python 3 code, which currently returns an empty dataframe:
from bs4 import BeautifulSoup
import requests
import pandas as pd
response = requests.get('https://clients2.google.com/complete/search?hl=en&output=toolbar&q=how garage doors')
bs = BeautifulSoup(response.text, ['xml'])
print(bs)
obs = bs.find_all("CompleteSuggestion")
print(obs)
df = pd.DataFrame(columns=['suggestion data','Keyword'])
for node in obs:
    df = df.append({'suggestion data': node.get("suggestion data")}, ignore_index=True)
df.head()
Any suggestions would be welcome. I am open to doing it with other modules if there are better alternatives.
Also the expected output would be a dataframe containing a list of autosuggest search terms related to "garage doors".
I could not get Python ElementTree XML conversion to work.
CodePudding user response:
You need to get the data attribute of the suggestion tag, not the text/string inside the tag. Try this:
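rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows first
for node in obs:
    # read the "data" attribute of each <suggestion> child tag
    for suggestion in node.find_all('suggestion'):
        rows.append({'suggestion data': suggestion.attrs['data']})
df = pd.DataFrame(rows, columns=['suggestion data','Keyword'])
df.head()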
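To see the difference between the two lookups without hitting the network, here is a minimal sketch. SAMPLE_XML is a made-up string that only mimics the shape I assume the response has (a CompleteSuggestion tag wrapping a suggestion tag with a data attribute); the real feed may differ. It also assumes lxml is installed, since the "xml" parser in BeautifulSoup depends on it.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample, assumed to resemble the real response; not fetched from the URL.
SAMPLE_XML = (
    '<?xml version="1.0"?>'
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="how garage doors work"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="how garage doors open"/></CompleteSuggestion>'
    '</toplevel>'
)

bs = BeautifulSoup(SAMPLE_XML, "xml")
first = bs.find("CompleteSuggestion")

# The parent tag has no attribute called "suggestion data", so .get() returns None.
missing = first.get("suggestion data")

# The value sits in the "data" attribute of the child <suggestion> tag.
rows = [{"suggestion data": tag["data"]} for tag in bs.find_all("suggestion")]
df = pd.DataFrame(rows)

print(missing)
print(df)
```

With the real request, the same find_all("suggestion") loop should apply to the soup built from response.text, provided the response has the shape assumed here.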
CodePudding user response:
I always use ElementTree to parse XML. This should work for you:
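import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse('YOUR_DATA.xml')  # path to a saved copy of the XML
root = tree.getroot()
rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows first
for child in root:  # each <CompleteSuggestion> element
    for child2 in child:  # each <suggestion> element
        rows.append(child2.attrib)  # attributes as a dict, e.g. {'data': ...}
df = pd.DataFrame(rows)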
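If the XML comes from a URL rather than a file, ET.fromstring can parse it directly from memory. The sketch below is only an illustration: suggestions_to_frame is a helper name I made up, and SAMPLE_XML is a hypothetical stand-in for the response body, assumed to have the CompleteSuggestion/suggestion layout discussed above.

```python
import xml.etree.ElementTree as ET
import pandas as pd


def suggestions_to_frame(xml_data):
    """Build a DataFrame from XML passed as str or bytes (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    # iter("suggestion") walks the whole tree and yields every <suggestion> element;
    # each element's attributes become one row.
    rows = [el.attrib for el in root.iter("suggestion")]
    return pd.DataFrame(rows)


# Hypothetical sample, assumed to resemble the real response; not fetched from the URL.
SAMPLE_XML = (
    b'<?xml version="1.0"?>'
    b'<toplevel>'
    b'<CompleteSuggestion><suggestion data="how garage doors work"/></CompleteSuggestion>'
    b'<CompleteSuggestion><suggestion data="how garage doors open"/></CompleteSuggestion>'
    b'</toplevel>'
)

df = suggestions_to_frame(SAMPLE_XML)
print(df)
```

With the request from the question, something like suggestions_to_frame(response.content) should give one row per suggestion in a column named data, assuming the live response matches this layout.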