I would like to combine the data frames read from each URL into one single data frame and return it. When I print inside the function, I get the result I want. The problem is that when I try to assign the result to a variable, I only get the final data frame. Running this function prints my desired result:
import pandas as pd
urllist = ['https://basketball.realgm.com/nba/boxscore/2022-04-09/Indiana-at-Philadelphia/388705', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/New-Orleans-at-Memphis/388704', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/Golden-State-at-San-Antonio/388706', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/Sacramento-at-LA-Clippers/388703']
def Boxscore(URL):
    for x in URL:
        box_list = pd.read_html(x)
        box1 = box_list[3]
        box2 = box_list[4]
        fullbox = pd.concat([box1, box2])
        print(fullbox)

Boxscore(urllist)
But when I return the data frame and assign it to a variable, I only get the final data frame instead of all of them together:
import pandas as pd
urllist = ['https://basketball.realgm.com/nba/boxscore/2022-04-09/Indiana-at-Philadelphia/388705', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/New-Orleans-at-Memphis/388704', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/Golden-State-at-San-Antonio/388706', 'https://basketball.realgm.com/nba/boxscore/2022-04-09/Sacramento-at-LA-Clippers/388703']
def Boxscore(URL):
    for x in URL:
        box_list = pd.read_html(x)
        box1 = box_list[3]
        box2 = box_list[4]
        fullbox = pd.concat([box1, box2])
    return fullbox

fullboxscore = Boxscore(urllist)
print(fullboxscore)
How can I append each data frame into one and assign that new data frame to a variable? Please help, thanks!
CodePudding user response:
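Your loop reassigns fullbox on every iteration, so only the last URL's frame is left when the function returns. Try creating an empty list, appending each frame to it inside the loop, and then concatenating the list once the loop has finished: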
def Boxscore(URL: list) -> pd.DataFrame:
    dfs = []  # empty list to collect one frame per URL
    for x in URL:
        box_list = pd.read_html(x)
        box1 = box_list[3]
        box2 = box_list[4]
        fullbox = pd.concat([box1, box2])
        dfs.append(fullbox)  # append frame to list
    return pd.concat(dfs).reset_index(drop=True)  # concat frames and return

# call your function and assign it to a variable
fullboxscore = Boxscore(urllist)
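
If you want to see the difference without hitting the website, here is a minimal offline sketch. The `fake_pages` dict and the `last_only` / `collect_all` function names are made up for illustration; the small hand-built frames only stand in for the two tables you pick out of `pd.read_html`, and the real column layout on the site will differ.

```python
import pandas as pd

# Hypothetical stand-ins for the two box score tables taken from each page.
fake_pages = {
    "game_a": [
        pd.DataFrame({"Player": ["A1"], "PTS": [10]}),
        pd.DataFrame({"Player": ["A2"], "PTS": [12]}),
    ],
    "game_b": [
        pd.DataFrame({"Player": ["B1"], "PTS": [20]}),
        pd.DataFrame({"Player": ["B2"], "PTS": [22]}),
    ],
}


def last_only(pages):
    """Mirrors the question: fullbox is overwritten on every iteration."""
    for tables in pages.values():
        fullbox = pd.concat(tables)
    return fullbox  # only the frame from the last page survives


def collect_all(pages):
    """Mirrors the answer: keep every frame, then concatenate once."""
    dfs = []
    for tables in pages.values():
        dfs.append(pd.concat(tables))
    return pd.concat(dfs).reset_index(drop=True)


print(len(last_only(fake_pages)))    # rows from the last page only
print(len(collect_all(fake_pages)))  # rows from every page
```

Concatenating once after the loop also tends to be cheaper than growing a data frame inside the loop, since each `pd.concat` call copies the data it combines.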