Appending Dataframe to another dataframe with first row removed

Time:10-11

# Import libs
import pandas as pd
import requests
from bs4 import BeautifulSoup
import json

# Form Data for passing to the request body
formdata = {'objid': '14'}

# URL
url = "https://www.sec.kerala.gov.in/public/getalllbcmp/byd"

# Query
for i in range(1, 15):
    formdata["objid"] = str(i)
    response = requests.request("POST", url, data=formdata, timeout=1500)
    out = response.content
    soup = BeautifulSoup(out,"html.parser")
    bat = json.loads(soup.text)
    df = pd.DataFrame(bat["ops1"])
    df.to_csv(str(i) + ".csv")

Right now this loop creates 14 separate CSV files. What I want instead is for the loop to drop the repeated row of column headers and append each response's data to a DataFrame created outside the loop, so that I end up with a single CSV file.

I am using BeautifulSoup and pandas.

CodePudding user response:

I would use a function to get the data and return a DataFrame, then use it within concat:

def get_data(i):
    # Reuses the url and formdata defined in the question
    formdata["objid"] = str(i)
    response = requests.request("POST", url, data=formdata, timeout=1500)
    out = response.content
    soup = BeautifulSoup(out,"html.parser")
    bat = json.loads(soup.text)
    return pd.DataFrame(bat["ops1"])

# Build one frame per objid (1 to 14) and stack them into a single DataFrame
df = pd.concat([get_data(i) for i in range(1, 15)])
df.to_csv('all_data.csv')

NB: if this gives you unsatisfactory results, please provide a short extract of two or three of the dataframes and the expected merged output.
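To see why no header row has to be stripped by hand, here is a minimal sketch that uses two small hand-built frames in place of the real responses (the `value` and `text` columns and the sample rows are taken from the output posted in the other answer; the variable names are only illustrative). Column names are metadata of a DataFrame rather than a data row, so `pd.concat` stacks only the data and the header is written once:

```python
import io

import pandas as pd

# Stand-in frames; in the real script each one would come from one POST response.
first = pd.DataFrame({"value": ["8o7LEdvX2e", "jw2XOQyZ4K"],
                      "text": ["G14001-Kumbadaje", "G14002-Bellur"]})
second = pd.DataFrame({"value": ["ywoXG8wLxV"],
                       "text": ["M01001-Neyyattinkara"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0.
combined = pd.concat([first, second], ignore_index=True)

# index=False keeps the row numbers out of the CSV.
# A StringIO buffer is used here only so the sketch does not touch the disk.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Whether to pass `ignore_index=True` and `index=False` is a matter of taste; without them the saved file carries an extra leading column holding each frame's original row numbers.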

CodePudding user response:

This is one way of achieving your goal:

# Import libs
import pandas as pd
import requests
from tqdm.notebook import tqdm

final_df = pd.DataFrame()
# URL
url = "https://www.sec.kerala.gov.in/public/getalllbcmp/byd"

# Query
for i in tqdm(range(1, 15)):
    formdata = {'objid': i}
    r = requests.post(url, data=formdata)
    df = pd.json_normalize(r.json()["ops1"])
    final_df = pd.concat([final_df, df], axis=0, ignore_index=True)
final_df.to_csv('some_data_saved.csv')
print(final_df)

The data will be saved to a CSV file and also printed in the terminal:

100%
14/14 [00:14<00:00, 1.05s/it]
      value       text
0     8o7LEdvX2e  G14001-Kumbadaje
1     jw2XOQyZ4K  G14002-Bellur
2     0lMB1O4LbV  G14003-Karadka
3     zodLro2Z39  G14004-Muliyar
4     dWxLYn8ZME  G14005-Delampady
...   ...         ...
1029  Qy6Z09bBKE  G01073-Ottoor
1030  ywoXG8wLxV  M01001-Neyyattinkara
1031  Kk8Xvz7XO9  M01002-Nedumangad
1032  r7eXQYgX8m  M01003-Attingal
1033  b3KXlO2B8g  M01004-Varkala

1034 rows × 2 columns
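If you would rather not hold everything in memory, a possible alternative is to append each chunk straight to one CSV file and write the header only the first time. The sketch below is an assumption-laden illustration, not part of the original answer: `append_to_csv` is a hypothetical helper, the chunks are small stand-ins for the per-request frames, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to path, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=write_header, index=False)


# Stand-in chunks; in the real loop each would be built from one response.
chunks = [
    pd.DataFrame({"value": ["8o7LEdvX2e"], "text": ["G14001-Kumbadaje"]}),
    pd.DataFrame({"value": ["jw2XOQyZ4K"], "text": ["G14002-Bellur"]}),
]

out_path = os.path.join(tempfile.mkdtemp(), "all_data.csv")
for chunk in chunks:
    append_to_csv(chunk, out_path)

# Read the file back to confirm there is a single header row.
result = pd.read_csv(out_path)
print(result)
```

This relies on every chunk having the same columns in the same order; if that is not guaranteed, concatenating in memory is the safer choice because `pd.concat` aligns columns by name.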

Requests can parse a JSON response body for you (via r.json()), so you don't need to import bs4 and json.
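The same point can be sketched without a network call. `response.content` is a bytes object, and `json.loads` accepts bytes directly, which is essentially the decoding that `r.json()` performs for you, so the BeautifulSoup step adds nothing. The payload below is a hypothetical stand-in whose shape (an `ops1` key holding a list of `value`/`text` records) is assumed from the code and output in this thread:

```python
import json

import pandas as pd

# Hypothetical stand-in for response.content (bytes, as returned by requests).
raw = (b'{"ops1": ['
       b'{"value": "8o7LEdvX2e", "text": "G14001-Kumbadaje"}, '
       b'{"value": "jw2XOQyZ4K", "text": "G14002-Bellur"}]}')

# json.loads handles bytes directly; no HTML parser is needed in between.
payload = json.loads(raw)

# Turn the list of records into a DataFrame, one row per record.
df = pd.json_normalize(payload["ops1"])
print(df)
```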

For TQDM, please see https://pypi.org/project/tqdm/

For pandas documentation, visit https://pandas.pydata.org/docs/

Also for Requests: https://requests.readthedocs.io/en/latest/
