Home > front end >  What is the easiest way to append data to Pandas DataFrame?
What is the easiest way to append data to Pandas DataFrame?

Time:06-08

I am trying to append scraped data to a dataframe:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup
import requests
import csv
url="https://en.wikipedia.org/wiki/List_of_German_football_champions"
page=requests.get(url).content
soup=BeautifulSoup(page,"html.parser")

seasons=[]
first_places=[]
runner_ups=[]
third_places=[]
top_scorrers=[]

tbody=soup.find_all("tbody")[7]
trs=tbody.find_all("tr")
for tr in trs:
    season = tr.find_all("a")[0].text
    first_place = tr.find_all("a")[1].text
    runner_up = tr.find_all("a")[2].text
    third_place = tr.find_all("a")[3].text
    top_scorer = tr.find_all("a")[4].text
    seasons.append(season)
    first_places.append(first_place)
    runner_ups.append(runner_up)
    third_places.append(third_place)
    top_scorrers.append(top_scorer)

tuples=list(zip(seasons,first_places,runner_ups,third_places,top_scorrers))
df=pd.DataFrame(tuples,columns=["Season","FirstPlace","RunnerUp","ThirdPlace","TopScorrer"])
df

enter image description here

Is there an easier way to append data directly to an empty dataframe without creating lists and then zipping them?

CodePudding user response:

The easiest way is to use pd.read_html. If you want to get the table from "Bundesliga (since 1963)", extract the 8th table:

# Extract all <table> from url
dfs = pd.read_html(url)

# Bundeliga (since 1963)
df = dfs[7]

# Cleanup your dataframe
df.columns = df.columns.str.split('[').str[0]
df['Champions'] = df['Champions'].str.extract(r'([^(] )')

Output

>>> df
       Season                  Champions                Runners-up               Third place                       Top scorer(s)  Goals
0     1963–64                1. FC Köln             Meidericher SV       Eintracht Frankfurt                          Uwe Seeler     30
1     1964–65             Werder Bremen                 1. FC Köln         Borussia Dortmund                   Rudi Brunnenmeier     24
2     1965–66           TSV 1860 Munich          Borussia Dortmund             Bayern Munich                 Friedhelm Konietzka     26
3     1966–67    Eintracht Braunschweig            TSV 1860 Munich         Borussia Dortmund        Lothar Emmerich, Gerd Müller     28
4     1967–68            1. FC Nürnberg              Werder Bremen  Borussia Mönchengladbach                         Hannes Löhr     27
5     1968–69             Bayern Munich           Alemannia Aachen  Borussia Mönchengladbach                         Gerd Müller     30
6     1969–70  Borussia Mönchengladbach              Bayern Munich                Hertha BSC                         Gerd Müller     38
7     1970–71  Borussia Mönchengladbach              Bayern Munich                Hertha BSC                      Lothar Kobluhn     24
8     1971–72             Bayern Munich                 Schalke 04  Borussia Mönchengladbach                         Gerd Müller     40
9     1972–73             Bayern Munich                 1. FC Köln        Fortuna Düsseldorf                         Gerd Müller     36
10    1973–74             Bayern Munich   Borussia Mönchengladbach        Fortuna Düsseldorf         Josef Heynckes, Gerd Müller     30
11    1974–75  Borussia Mönchengladbach                 Hertha BSC              Hamburger SV                      Josef Heynckes     27
12    1975–76  Borussia Mönchengladbach               Hamburger SV             Bayern Munich                       Klaus Fischer     29
13    1976–77  Borussia Mönchengladbach                 Schalke 04    Eintracht Braunschweig                       Dieter Müller     34
14    1977–78                1. FC Köln   Borussia Mönchengladbach                Hertha BSC          Dieter Müller, Gerd Müller     24
15    1978–79              Hamburger SV              VfB Stuttgart      1. FC Kaiserslautern                        Klaus Allofs     22
16    1979–80             Bayern Munich               Hamburger SV             VfB Stuttgart               Karl-Heinz Rummenigge     26
17    1980–81             Bayern Munich               Hamburger SV             VfB Stuttgart               Karl-Heinz Rummenigge     29
18    1981–82              Hamburger SV                 1. FC Köln             Bayern Munich                      Horst Hrubesch     27
19    1982–83              Hamburger SV              Werder Bremen             VfB Stuttgart                         Rudi Völler     23
20    1983–84             VfB Stuttgart               Hamburger SV  Borussia Mönchengladbach               Karl-Heinz Rummenigge     26
21    1984–85             Bayern Munich              Werder Bremen                1. FC Köln                        Klaus Allofs     26
22    1985–86             Bayern Munich              Werder Bremen           Bayer Uerdingen                        Stefan Kuntz     22
23    1986–87             Bayern Munich               Hamburger SV  Borussia Mönchengladbach                            Uwe Rahn     24
24    1987–88             Werder Bremen              Bayern Munich                1. FC Köln                    Jürgen Klinsmann     19
25    1988–89             Bayern Munich                 1. FC Köln             Werder Bremen     Thomas Allofs, Roland Wohlfarth     17
26    1989–90             Bayern Munich                 1. FC Köln       Eintracht Frankfurt                       Jørn Andersen     18
27    1990–91      1. FC Kaiserslautern              Bayern Munich             Werder Bremen                    Roland Wohlfarth     21
28    1991–92             VfB Stuttgart          Borussia Dortmund       Eintracht Frankfurt                        Fritz Walter     22
29    1992–93             Werder Bremen              Bayern Munich       Eintracht Frankfurt         Ulf Kirsten, Anthony Yeboah     20
30    1993–94             Bayern Munich       1. FC Kaiserslautern          Bayer Leverkusen        Stefan Kuntz, Anthony Yeboah     18
31    1994–95         Borussia Dortmund              Werder Bremen                  Freiburg        Heiko Herrlich, Mario Basler     20
32    1995–96         Borussia Dortmund              Bayern Munich                Schalke 04                         Fredi Bobic     17
33    1996–97             Bayern Munich           Bayer Leverkusen         Borussia Dortmund                         Ulf Kirsten     22
34    1997–98      1. FC Kaiserslautern              Bayern Munich          Bayer Leverkusen                         Ulf Kirsten     22
35    1998–99             Bayern Munich           Bayer Leverkusen                Hertha BSC                      Michael Preetz     23
36  1999–2000             Bayern Munich           Bayer Leverkusen              Hamburger SV                          Martin Max     19
37    2000–01             Bayern Munich                 Schalke 04         Borussia Dortmund          Sergej Barbarez, Ebbe Sand     22
38    2001–02         Borussia Dortmund           Bayer Leverkusen             Bayern Munich          Márcio Amoroso, Martin Max     18
39    2002–03             Bayern Munich              VfB Stuttgart         Borussia Dortmund  Giovane Élber, Thomas Christiansen     21
40    2003–04             Werder Bremen              Bayern Munich          Bayer Leverkusen                              Aílton     28
41    2004–05             Bayern Munich                 Schalke 04             Werder Bremen                        Marek Mintál     24
42    2005–06             Bayern Munich              Werder Bremen              Hamburger SV                      Miroslav Klose     25
43    2006–07             VfB Stuttgart                 Schalke 04             Werder Bremen                     Theofanis Gekas     20
44    2007–08             Bayern Munich              Werder Bremen                Schalke 04                           Luca Toni     24
45    2008–09             VfL Wolfsburg              Bayern Munich             VfB Stuttgart                             Grafite     28
46    2009–10             Bayern Munich                 Schalke 04             Werder Bremen                          Edin Džeko     22
47    2010–11         Borussia Dortmund           Bayer Leverkusen             Bayern Munich                         Mario Gómez     28
48    2011–12         Borussia Dortmund              Bayern Munich                Schalke 04                 Klaas-Jan Huntelaar     29
49    2012–13             Bayern Munich          Borussia Dortmund          Bayer Leverkusen                     Stefan Kießling     25
50    2013–14             Bayern Munich          Borussia Dortmund                Schalke 04                  Robert Lewandowski     20
51    2014–15             Bayern Munich              VfL Wolfsburg  Borussia Mönchengladbach                     Alexander Meier     19
52    2015–16             Bayern Munich          Borussia Dortmund          Bayer Leverkusen                  Robert Lewandowski     30
53    2016–17             Bayern Munich                 RB Leipzig         Borussia Dortmund           Pierre-Emerick Aubameyang     31
54    2017–18             Bayern Munich                 Schalke 04           1899 Hoffenheim                  Robert Lewandowski     29
55    2018–19             Bayern Munich          Borussia Dortmund                RB Leipzig                  Robert Lewandowski     22
56    2019–20             Bayern Munich          Borussia Dortmund                RB Leipzig                  Robert Lewandowski     34
57    2020–21             Bayern Munich                 RB Leipzig         Borussia Dortmund                  Robert Lewandowski     41
58    2021–22             Bayern Munich          Borussia Dortmund          Bayer Leverkusen                  Robert Lewandowski     35

CodePudding user response:

While still using pandas "simplest" way to create your DataFrame is going with pandas.read_html():

import pandas as pd

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_German_football_champions')[7]

To simply rename the columns and get rid of the [7]:

df.columns = ['Season', 'Champions', 'Runners-up', 'Third place',
   'Top scorer(s)', 'Goals']

Output:

Season Champions Runners-up Third place Top scorer(s) Goals
0 1963–64 1. FC Köln (2) Meidericher SV Eintracht Frankfurt Uwe Seeler 30
1 1964–65 Werder Bremen (1) 1. FC Köln Borussia Dortmund Rudi Brunnenmeier 24
2 1965–66 TSV 1860 Munich (1) Borussia Dortmund Bayern Munich Friedhelm Konietzka 26
3 1966–67 Eintracht Braunschweig (1) TSV 1860 Munich Borussia Dortmund Lothar Emmerich, Gerd Müller 28
4 1967–68 1. FC Nürnberg (9) Werder Bremen Borussia Mönchengladbach Hannes Löhr 27

...


An alternativ to avoid all these lists, get cleaner in process and using BeautifulSoup directly is to create more structured data - A single list of dicts:

data = []

for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    data.append({
        'season':tr.find_all("a")[0].text,
        'first_place': tr.find_all("a")[1].text,
        'runner_up': tr.find_all("a")[2].text,
        'third_place': tr.find_all("a")[3].text,
        'top_scorer': tr.find_all("a")[4].text,
    })

pd.DataFrame(data)
Example
import pandas as pd
from bs4 import BeautifulSoup
import requests

url="https://en.wikipedia.org/wiki/List_of_German_football_champions"
page=requests.get(url).content
soup=BeautifulSoup(page,"html.parser")

data = []

for tr in soup.select('table:nth-of-type(8) tr:not(:has(th))'):
    data.append({
        'season':tr.find_all("a")[0].text,
        'first_place': tr.find_all("a")[1].text,
        'runner_up': tr.find_all("a")[2].text,
        'third_place': tr.find_all("a")[3].text,
        'top_scorer': tr.find_all("a")[4].text,
    })

pd.DataFrame(data)
  • Related