how to create a dataframe loop by adding data to additional data

Time:07-27

I want to extract data from a website using its web API response URL, which returns only JSON. The data consists of thousands of records, but the website limits the table to a maximum of 500 records per page, so the data is spread over n pages. The code below gets the data from the URL, but I am unable to combine it into a single DataFrame that stores the complete data.

import pandas as pd
page_no =1
start_value =0
response = 1658837156089
for i in range(2567):
    urltext = 'urllink'.format(page_no=page_no, start_value=start_value, response=response)
    df1=pd.read_json(urltext)
    df2=df1.append(df1, ignore_index=False)
    df1.drop(df1.index,axis=0, inplace=True)
    print(df2)
    page_no +=1
    start_value +=500
    response +=1

If you have any idea how I should change this, please suggest it.

CodePudding user response:

Appending new data to a DataFrame inside a loop is generally slow, because every append copies the whole frame into a new object. It is more efficient to extend or append the data to a list and then convert the entire list into a DataFrame once, at the end.

If you absolutely must have each page as a pd.DataFrame, you can keep a list of pd.DataFrame objects and use pd.concat to create one DataFrame from that list. See https://pandas.pydata.org/docs/reference/api/pandas.concat.html
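As a rough illustration of the list-first approach, here is a minimal sketch. It assumes each page can be turned into a list of record dicts; `fetch_page` is a hypothetical stand-in that fabricates rows so the example runs without network access, and the column names `id` and `page` are invented for the demo.

```python
import pandas as pd


def fetch_page(page_no, page_size=500):
    """Hypothetical stand-in for one API call: returns a list of record dicts."""
    start = (page_no - 1) * page_size
    return [{"id": start + k, "page": page_no} for k in range(page_size)]


# Collect plain Python records first; extending a list is cheap.
records = []
for page_no in range(1, 4):  # three pages, just for the demo
    records.extend(fetch_page(page_no))

# Build the DataFrame a single time, after the loop.
df = pd.DataFrame(records)
print(df.shape)
```

With a real API you would replace `fetch_page` with whatever request returns the parsed JSON records for one page.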

CodePudding user response:

You might want to create an empty list at the beginning of your code, then append each partial DataFrame to it. At the end, you concatenate everything into one DataFrame. It would look like this:

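import pandas as pd

page_no = 1
start_value = 0
response = 1658837156089
partial_dfs = []  # collects one DataFrame per page

for i in range(2567):
    # 'urllink' is the placeholder from the question; the real URL template
    # needs {page_no}, {start_value} and {response} fields for format() to fill in
    urltext = 'urllink'.format(page_no=page_no, start_value=start_value, response=response)
    df1 = pd.read_json(urltext)
    # keep the page as it is; df1.append(df1) would only duplicate its rows
    partial_dfs.append(df1)
    # advance to the next page instead of resetting the counters
    page_no += 1
    start_value += 500
    response += 1

# combine all pages once, with a fresh 0..n-1 index
whole_df = pd.concat(partial_dfs, ignore_index=True)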
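Since the question mentions an unknown number of pages, one possible variation is to keep requesting pages until an empty one comes back instead of hard-coding the loop count. This is only a sketch under assumptions: `read_page` is a hypothetical stand-in for `pd.read_json(url)` that fabricates rows, and it assumes the API returns an empty result past the last page, which the original post does not confirm.

```python
import pandas as pd

TOTAL_ROWS = 1200  # made-up size of the simulated data set
PAGE_SIZE = 500    # page limit mentioned in the question


def read_page(start_value, page_size=PAGE_SIZE):
    """Hypothetical stand-in for pd.read_json(url): returns one page of rows."""
    stop = min(start_value + page_size, TOTAL_ROWS)
    return pd.DataFrame({"row_id": range(start_value, stop)})


def read_all_pages(read_one, page_size=PAGE_SIZE):
    """Collect pages in a list until an empty page appears, then concat once."""
    partial_dfs = []
    start_value = 0
    while True:
        df = read_one(start_value)
        if df.empty:  # assumed signal that there are no more pages
            break
        partial_dfs.append(df)
        start_value += page_size
    if not partial_dfs:
        return pd.DataFrame()
    return pd.concat(partial_dfs, ignore_index=True)


whole_df = read_all_pages(read_page)
print(whole_df.shape)
```

Passing `ignore_index=True` gives the combined frame a single continuous index rather than repeating each page's own 0-based index.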