Create a DataFrame from two lists in Python

Time:03-20

I scraped data with BeautifulSoup and ended up with two lists. One is the headers, with length 16; the other is the values, with length 564.

How can I create a DataFrame from these lists?


from bs4 import BeautifulSoup
import requests


r = requests.get("http://tikili.az/")
soup = BeautifulSoup(r.content, "lxml")

houses = soup.find_all("li", {"class": "position-relativec"})

headers = []
values = []

for house in houses:
    house_link = house.find_all("a")
    link_head = "http://tikili.az/"

    for link in house_link:
        link_all = link_head + link.get("href")

        # print(link_all)

        detail = requests.get(link_all)
        # print(detail.status_code)

        detail_soup = BeautifulSoup(detail.content, "lxml")

        parameters = detail_soup.find_all("table", {"class": "elan-params-1"})

        for detail in parameters:
            tr = detail.find_all("tr")

            for i in tr:
                headers.append(i.find_all("td")[0].text)

            for k in tr:
                # print(k)
                values.append(k.find_all("td")[1].text)


headers = list(dict.fromkeys(headers))
print(headers)
print("---------------")
print(values)

CodePudding user response:

import pandas as pd

# Note: a dict of lists only works when both lists have the same length.
df = pd.DataFrame({'headers': headers, 'values': values})

CodePudding user response:

You can pass the two lists (header and values) directly to pandas if they have the right shape. The header can be a flat 1D list, but the values need to be a list of rows. I would therefore split your flat list of values into rows, each as long as the header list. For example:

import pandas as pd

header = ['a', 'b', 'c', 'd']
values = ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2']

rows = [values[i:i + len(header)] for i in range(0, len(values), len(header))]

df = pd.DataFrame(data=rows, columns=header)

print(df.to_string())

Output:

    a   b   c   d
0  a1  b1  c1  d1
1  a2  b2  c2  d2
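
The chunking above assumes that the number of values is an exact multiple of the number of headers. The question mentions 16 headers and 564 values, and 564 is not divisible by 16, so the last row would come out shorter than the others. As a minimal sketch, a small guard can make that mismatch visible before the DataFrame is built; `chunk_rows` is a hypothetical helper name, not something from pandas:

```python
import pandas as pd


def chunk_rows(values, header):
    # Hypothetical helper: split a flat list into rows of len(header),
    # and fail loudly if the values do not divide evenly into rows.
    width = len(header)
    if len(values) % width != 0:
        raise ValueError(
            f"{len(values)} values cannot be split evenly into rows of {width}"
        )
    return [values[i:i + width] for i in range(0, len(values), width)]


header = ['a', 'b', 'c', 'd']
values = ['a1', 'b1', 'c1', 'd1', 'a2', 'b2', 'c2', 'd2']

df = pd.DataFrame(data=chunk_rows(values, header), columns=header)
print(df.to_string())
```

If the guard raises on the scraped lists, the rows probably do not all have the same fields, and the values should be grouped per page while scraping rather than afterwards.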

If the list comprehension seems too complicated you can also use numpy to split the values into rows with this:

import numpy as np

# Use integer division so the number of sections is an int.
rows = np.array_split(values, len(values) // len(header))
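
If the detail pages do not all list the same fields (an assumption here, the thread does not confirm it), splitting one flat list into fixed-width rows will misalign the columns. A possible alternative, sketched below with made-up sample data, is to collect one dict per page while scraping and hand the list of dicts to pandas, which fills missing fields with NaN. The `records` list and its keys are purely illustrative:

```python
import pandas as pd

# Illustrative data: one dict per scraped page, mapping header -> value.
# In the scraping loop this would be built from the two <td> cells of each row.
records = [
    {"a": "a1", "b": "b1", "c": "c1"},
    {"a": "a2", "c": "c2"},  # this page has no "b" field
]

# pandas aligns the dicts by key; fields missing from a record become NaN.
df = pd.DataFrame(records)
print(df.to_string())
```

With this layout the column names come from the dict keys, so a separate, de-duplicated header list is no longer needed.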