I have this code (borrowed from an old question posted on this site):
import pandas as pd
import json
import numpy as np
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.baseball-reference.com/leagues/MLB/2013-finalyear.shtml")
from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, "html.parser")
# The table has an id, which makes it simpler to target
batting = doc.find(id='misc_batting')
careers = []
for row in batting.find_all('tr')[1:]:
    dictionary = {}
    dictionary['names'] = row.find(attrs={"data-stat": "player"}).text.strip()
    dictionary['experience'] = row.find(attrs={"data-stat": "experience"}).text.strip()
    careers.append(dictionary)
Which generates a result like this:
[{'names': 'David Adams', 'experience': '1'}, {'names': 'Steve Ames', 'experience': '1'}, {'names': 'Rick Ankiel', 'experience': '11'}, {'names': 'Jairo Asencio', 'experience': '4'}, {'names': 'Luis Ayala', 'experience': '9'}, {'names': 'Brandon Bantz', 'experience': '1'}, {'names': 'Scott Barnes', 'experience': '2'}, {'names':
How do I turn this into a DataFrame with separate columns, like this?
Names Experience
David Adams 1
CodePudding user response:
Simply pass your list of dicts (careers) to pandas.DataFrame() to get your expected result.
Example
import pandas as pd
careers = [{'names': 'David Adams', 'experience': '1'}, {'names': 'Steve Ames', 'experience': '1'}, {'names': 'Rick Ankiel', 'experience': '11'}, {'names': 'Jairo Asencio', 'experience': '4'}, {'names': 'Luis Ayala', 'experience': '9'}, {'names': 'Brandon Bantz', 'experience': '1'}, {'names': 'Scott Barnes', 'experience': '2'}]
pd.DataFrame(careers)
Output
names | experience |
---|---|
David Adams | 1 |
Steve Ames | 1 |
Rick Ankiel | 11 |
Jairo Asencio | 4 |
Luis Ayala | 9 |
Brandon Bantz | 1 |
Scott Barnes | 2 |
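If you want the capitalized headers from the question (Names, Experience) and numeric experience values rather than the scraped strings, something like the following should work. This is only a sketch on two rows of the sample data; the capitalized column names are taken from the question's desired output, and the conversion assumes every experience value is a plain integer string.

```python
import pandas as pd

# Two rows from the sample list above, as produced by the scraping loop.
careers = [
    {"names": "David Adams", "experience": "1"},
    {"names": "Rick Ankiel", "experience": "11"},
]

df = pd.DataFrame(careers)

# Rename the columns to match the header shown in the question.
df = df.rename(columns={"names": "Names", "experience": "Experience"})

# The scraped values are strings; convert them to integers.
# Assumption: every value is a clean integer string.
df["Experience"] = df["Experience"].astype(int)

print(df.to_string(index=False))
```

Converting to integers matters if you later sort or aggregate, since the string '11' would otherwise sort before '4'.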
CodePudding user response:
You can simplify this quite a bit with pandas. Have it pull the table, then keep just the Name and Yrs columns.
import pandas as pd
url = "https://www.baseball-reference.com/leagues/MLB/2013-finalyear.shtml"
df = pd.read_html(url, attrs = {'id': 'misc_batting'})[0]
df_filter = df[['Name','Yrs']]
If you need to rename those columns, add:
df_filter = df_filter.rename(columns={'Name':'names','Yrs':'experience'})
Output:
print(df_filter)
names experience
0 David Adams 1
1 Steve Ames 1
2 Rick Ankiel 11
3 Jairo Asencio 4
4 Luis Ayala 9
.. ... ...
209 Dewayne Wise 11
210 Ross Wolf 3
211 Kevin Youkilis 10
212 Michael Young 14
213 Totals 1357
[214 rows x 2 columns]
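Note that the last row in the output above is a Totals line rather than a player. If you only want players, you could filter it out. The sketch below uses a small stand-in frame built from a few of the rows printed above instead of fetching the page, and it assumes the summary row is always labeled Totals in the names column.

```python
import pandas as pd

# Stand-in for df_filter, using a few rows from the printed output above.
df_filter = pd.DataFrame(
    {
        "names": ["David Adams", "Steve Ames", "Michael Young", "Totals"],
        "experience": [1, 1, 14, 1357],
    }
)

# Keep only player rows; assumption: the summary row is labeled "Totals".
players = df_filter[df_filter["names"] != "Totals"].reset_index(drop=True)

print(players)
```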