I scraped data with the help of BeautifulSoup, and I ended up with two lists. One of them holds the headers and has length 16; the other holds the values and has length 564.
How can I create a DataFrame from these lists?
```python
from bs4 import BeautifulSoup
import requests

r = requests.get("http://tikili.az/")
soup = BeautifulSoup(r.content, "lxml")
houses = soup.find_all("li", {"class": "position-relativec"})

headers = []
values = []
for house in houses:
    house_link = house.find_all("a")
    link_head = "http://tikili.az/"
    for link in house_link:
        link_all = link_head + link.get("href")
        # print(link_all)
        detail = requests.get(link_all)
        # print(detail.status_code)
        detail_soup = BeautifulSoup(detail.content, "lxml")
        parameters = detail_soup.find_all("table", {"class": "elan-params-1"})
        for detail in parameters:
            tr = detail.find_all("tr")
            # the first <td> of each row is the parameter name
            for i in tr:
                headers.append(i.find_all("td")[0].text)
            # the second <td> of each row is the parameter value
            for k in tr:
                # print(k)
                values.append(k.find_all("td")[1].text)

# drop duplicate header names while keeping their first-seen order
headers = list(dict.fromkeys(headers))
print(headers)
print("---------------")
print(values)
```
Answer:
```python
import pandas as pd

# builds a two-column table: one (header, value) pair per row
df = pd.DataFrame({'headers': headers, 'values': values})
```
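This call needs `headers` and `values` to have the same length, otherwise pandas raises a `ValueError`. In the question's loop the two lists are appended in lockstep (one name and one value per `<tr>`), so they only differ in length after `headers = list(dict.fromkeys(headers))` removes the duplicates. A minimal sketch, assuming the DataFrame is built before that de-duplication line; the data below is hypothetical and stands in for the scraped lists:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped lists BEFORE de-duplication:
# every <tr> appended one name to headers and one value to values,
# so both lists have the same length.
headers = ["Rooms", "Area", "Rooms", "Area"]
values = ["3", "85", "2", "60"]

# long format: one (header, value) pair per DataFrame row
df = pd.DataFrame({'headers': headers, 'values': values})
print(df.to_string())
```

Note that the result is a "long" table with one header/value pair per row, not one row per listing.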
Answer:
You can pass the two lists (header and values) to pandas directly, provided they have the right shape. The header can be a plain 1D list, but the values need to be a list of rows, each with as many items as the header. Therefore, I would split your flat list of values into consecutive rows of length `len(header)`, for example like this:
```python
import pandas as pd

header = ['a', 'b', 'c', 'd']
values = ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2']

# cut the flat list into consecutive chunks of len(header) items, one chunk per row
rows = [values[i:i + len(header)] for i in range(0, len(values), len(header))]
df = pd.DataFrame(data=rows, columns=header)
print(df.to_string())
```
Output:

```
    a   b   c   d
0  a1  b1  c1  d1
1  a2  b2  c2  d2
```
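Splitting the flat list into chunks assumes that every listing contributes exactly one value for each header, in the same order. With 16 headers and 564 values that cannot hold for this data (564 / 16 = 35.25), so the listings do not all share the same set of parameters and the rows would end up shifted. A more robust alternative (not part of the answer above) is to collect one dict per listing, mapping each parameter name to its value, and let pandas align the columns. In the question's loop that would mean creating a dict for each `elan-params-1` table and filling it from the two `<td>` cells of each `<tr>`. A minimal sketch, using hypothetical listings in place of real scraped data:

```python
import pandas as pd

# Hypothetical scraped data: one list of (parameter name, value) pairs per
# listing, standing in for the <td> pairs of each "elan-params-1" table.
# The second listing has no "Floor" row, so a flat values list would not
# split evenly into rows.
listings = [
    [("Rooms", "3"), ("Area", "85"), ("Floor", "5")],
    [("Rooms", "2"), ("Area", "60")],
]

rows = []
for pairs in listings:
    # one dict per listing: parameter name -> value
    rows.append(dict(pairs))

# pandas takes the union of the dict keys as columns (in first-seen order);
# a parameter missing from a listing becomes NaN
df = pd.DataFrame(rows)
print(df.to_string())
```

Because the columns come from the dict keys, you no longer need the separate de-duplicated `headers` list.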
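If the list comprehension seems too complicated, you can also use NumPy to split the values into rows:

```python
import numpy as np

# use integer division: the number of sections should be a whole number.
# Note: if len(values) is not an exact multiple of len(header),
# np.array_split silently returns chunks of unequal size.
rows = np.array_split(values, len(values) // len(header))
```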
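If you would rather get an error than uneven rows when the lengths do not line up, `reshape` is a stricter NumPy option. A minimal sketch, reusing the example lists from the answer above:

```python
import numpy as np
import pandas as pd

header = ['a', 'b', 'c', 'd']
values = ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2']

# reshape into len(header) columns; -1 lets NumPy infer the number of rows.
# Unlike np.array_split, this raises ValueError if len(values) is not an
# exact multiple of len(header), instead of producing ragged rows.
rows = np.array(values).reshape(-1, len(header))
df = pd.DataFrame(rows, columns=header)
print(df.to_string())
```

This prints the same table as the list-comprehension version.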