I have URLs (for web scraping) and municipality names stored in this list:
muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"),
        ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo", "partanna"),
        ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate")]
I am trying to create a separate dataset for each municipality. For example, the data scraped from the first URL should be saved as filettinodak.csv.
I am using the following code right now:
import re
import json
import requests
import pandas as pd
import os
import random

os.chdir(r"/Users/aartimalik/Dropbox/data")

muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"),
        ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo", "partanna"),
        ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate")
        ]

for m in muni[1]:
    URL = m
    r = requests.get(URL)
    # Pull the JSON assigned to bilancio_tree out of the page's JavaScript
    p = re.compile("var bilancio_tree = (.*?);")
    data = p.search(r.text).group(1)
    data = json.loads(data)
    all_data = []
    for d in data:
        # Top-level entries: second label is "-"
        for v in d["values"]:
            for kk, vv in v.items():
                all_data.append([d["label"], "-", kk, vv.get("abs"), vv.get("pc")])
        # Child entries: keep both the parent and the child label
        for c in d["children"]:
            for v in c["values"]:
                for kk, vv in v.items():
                    all_data.append(
                        [d["label"], c["label"], kk, vv.get("abs"), vv.get("pc")]
                    )
    df = pd.DataFrame(all_data, columns=["label 1", "label 2", "year", "abs", "pc"])
    df.to_csv(muni[2] + "dak.csv", index=False)
The error I am getting is:

Traceback (most recent call last):
  File "<stdin>", line 19, in <module>
TypeError: can only concatenate tuple (not "str") to tuple
I think I am doing something wrong with the muni indexing (muni[i]). Any suggestions? Thank you so much!
Answer:
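If you adjust your for loop a bit, it should solve your problem. At the moment, for m in muni[1] does not loop over the municipalities: muni[1] is the single tuple for partanna, so m would take its URL and then its name. Likewise, muni[2] is the whole fragneto-labate tuple, so muni[2] + "dak.csv" tries to add a string to a tuple, which is exactly the TypeError you see. The change below loops through all entries in muni instead. Each time, it unpacks the tuple, putting the first value into URL and the second into label.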
for URL, label in muni:
With that change, the URL = m line is no longer needed (the loop already assigns URL), and the final line of your code can become:
df.to_csv(label + "dak.csv", index=False)
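Putting both changes together, a minimal sketch of the whole script could look like the following. It assumes the pages still embed the data as var bilancio_tree = ...; exactly as the question's regex expects. The helper names parse_bilancio, output_name and scrape_all are hypothetical, introduced only to separate parsing from downloading so the parsing can be checked on a hand-written HTML snippet. pandas and requests are imported inside scrape_all for the same reason. Files are written to the current working directory, so keep the os.chdir call from the question if you want them in that folder.

```python
import json
import re

# Same pattern as in the question: capture the JSON assigned to bilancio_tree.
BILANCIO_RE = re.compile(r"var bilancio_tree = (.*?);")

COLUMNS = ["label 1", "label 2", "year", "abs", "pc"]


def parse_bilancio(html):
    """Return the flattened rows of bilancio_tree found in a page's HTML (hypothetical helper)."""
    match = BILANCIO_RE.search(html)
    if match is None:
        raise ValueError("bilancio_tree not found in page")
    data = json.loads(match.group(1))

    rows = []
    for d in data:
        # Top-level entries get "-" as their second label, as in the question.
        for v in d["values"]:
            for year, vals in v.items():
                rows.append([d["label"], "-", year, vals.get("abs"), vals.get("pc")])
        # Child entries keep both the parent label and their own label.
        for c in d["children"]:
            for v in c["values"]:
                for year, vals in v.items():
                    rows.append([d["label"], c["label"], year, vals.get("abs"), vals.get("pc")])
    return rows


def output_name(label):
    # Hypothetical helper: "filettino" -> "filettinodak.csv"
    return label + "dak.csv"


def scrape_all(muni):
    # Third-party imports kept local so parse_bilancio works without them.
    import pandas as pd
    import requests

    for url, label in muni:  # unpack each (URL, name) tuple
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # stop early on HTTP errors instead of failing in the regex
        df = pd.DataFrame(parse_bilancio(r.text), columns=COLUMNS)
        df.to_csv(output_name(label), index=False)
```

Calling scrape_all(muni) with the muni list from the question should then write filettinodak.csv, partannadak.csv and fragneto-labatedak.csv, one file per municipality.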