I'm practicing Python pandas and trying to collect the first 500 coins from CoinGecko into one DataFrame. Each CoinGecko page lists 100 coins, and I can extract the DataFrame from each page by changing the page number in the URL. I tried df1.append(df2) in a for loop over the page numbers, but that didn't work. What am I doing wrong?
I need one DataFrame with all 500 coins.
def get_coingecko_data():
    page_numbers = [1,2,3,4,5]
    for n in page_numbers:
        r = requests.get(f"https://www.coingecko.com/?page={n}")
        df = pd.read_html(r.text)[0]
        df2 = pd.DataFrame(df)
        df2 = df2[["#", "Coin", "Price", "Mkt Cap"]]
        if n == 1:
            df_500 = df2.copy()
        else:
            df_500.append(df4, ignore_index = True)
CodePudding user response:
You're missing the equals sign that assigns the result back to the df_500 variable:
df_500 = df_500.append(df2, ignore_index=True)
By the way, here is a cleaner version:
def get_coingecko_data():
    page_numbers = [1, 2, 3, 4, 5]
    df500 = pd.DataFrame()
    for n in page_numbers:
        r = requests.get(f"https://www.coingecko.com/?page={n}")
        dfPre = pd.read_html(r.text)[0]
        # DataFrame.append returns a new frame, so reassign it (requires pandas < 2.0)
        df500 = df500.append(dfPre[["#", "Coin", "Price", "Mkt Cap"]], ignore_index=True)
    return df500
CodePudding user response:
You have a couple of issues. First, the last line has a typo: it should be df2, not df4.
The second problem is that, contrary to the behaviour of Python's list append, pandas.DataFrame.append does not change the object in place but returns the result of the append, so you would have to rewrite the last line as
df_500 = df_500.append(df2, ignore_index = True)
Because of this quite common confusion, among other reasons, append has been deprecated since v1.4 (and was removed in pandas 2.0), and it is now recommended to switch to pandas.concat.
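To make the difference concrete, here is a small sketch contrasting the two behaviours. It uses made-up toy frames (page1 and page2 are illustrative names, not real CoinGecko data) and pd.concat, so it should run regardless of whether DataFrame.append is still available in your pandas version:

```python
import pandas as pd

# list.append mutates the list in place and returns None.
numbers = [1, 2]
result = numbers.append(3)

# pd.concat leaves its inputs untouched and returns a new DataFrame,
# so the result has to be assigned to a name to be kept.
page1 = pd.DataFrame({"Coin": ["Bitcoin", "Ethereum"]})
page2 = pd.DataFrame({"Coin": ["Tether", "BNB"]})
combined = pd.concat([page1, page2], ignore_index=True)

print(numbers, result)             # [1, 2, 3] None
print(len(page1), len(combined))   # 2 4
```

The removed DataFrame.append behaved like pd.concat in this respect, which is why calling it without assigning the result appeared to do nothing.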
Last but not least, your function lacks a return.
Fixing all of the above results in the function
import pandas as pd
import requests

def get_coingecko_data():
    page_numbers = [1,2,3,4,5]
    for n in page_numbers:
        r = requests.get(f"https://www.coingecko.com/?page={n}")
        df = pd.read_html(r.text)[0]
        df2 = pd.DataFrame(df)
        df2 = df2[["#", "Coin", "Price", "Mkt Cap"]]
        if n == 1:
            df_500 = df2.copy()
        else:
            # pd.concat returns a new frame, so assign it back
            df_500 = pd.concat([df_500, df2], ignore_index = True)
    return df_500
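If you want to avoid the n == 1 special case, one option is to collect the per-page frames in a plain Python list and call pd.concat once at the end. The following is only a sketch under assumptions: stack_pages is a hypothetical helper name, and the two small frames are stand-ins for what pd.read_html(r.text)[0] would return for each page.

```python
import pandas as pd

COLUMNS = ["#", "Coin", "Price", "Mkt Cap"]

def stack_pages(frames, columns=COLUMNS):
    """Keep only `columns` from each frame and stack them into one DataFrame."""
    return pd.concat([frame[columns] for frame in frames], ignore_index=True)

# Stand-in data (hypothetical values) shaped like two scraped pages.
fake_page1 = pd.DataFrame({
    "#": [1, 2], "Coin": ["A", "B"],
    "Price": ["$1", "$2"], "Mkt Cap": ["$10", "$20"], "Extra": [0, 0],
})
fake_page2 = pd.DataFrame({
    "#": [3, 4], "Coin": ["C", "D"],
    "Price": ["$3", "$4"], "Mkt Cap": ["$30", "$40"], "Extra": [0, 0],
})

df_500 = stack_pages([fake_page1, fake_page2])
print(df_500.shape)  # (4, 4)
```

In the real function, the list would be filled inside the loop with frames.append(pd.read_html(r.text)[0]); that is the ordinary list append, which does modify the list in place.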