I have a small parser that collects the items of an RSS feed channel into a pandas DataFrame. Everything works as expected, but I get this warning:
The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead
After some research, I converted my dicts to lists and tried to concatenate them, but now I get this error:
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
How can I rewrite my for loop to get the expected result?
Working code, with the warning:
df = pd.DataFrame(columns = ['title', 'link'])
with response as r:
    items = r.html.find('item', first=False)
    for item in items:
        title = item.find('title', first=True).text
        link = item.find('guid', first=True).text
        row = {'title': title, 'link': link}
        df = df.append(row, ignore_index=True)
Slightly modified version, which gives the error:
df = pd.DataFrame(columns = ['title', 'link'])
tmp = []
with response as r:
    items = r.html.find('item', first=False)
    for item in items:
        title = item.find('title', first=True).text
        link = item.find('guid', first=True).text
        row = [title, link]
        tmp.append(row)
df = pd.concat(tmp)
CodePudding user response:
pd.concat() is meant for DataFrames (and Series), not plain lists. You just need to create your DataFrame from the tmp list. You might also be able to get the data with pd.read_html, but I am not sure.
tmp = []
with response as r:
    items = r.html.find('item', first=False)
    for item in items:
        title = item.find('title', first=True).text
        link = item.find('guid', first=True).text
        row = [title, link]
        tmp.append(row)
df = pd.DataFrame(tmp, columns=['title', 'link'])
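To illustrate why the constructor works where pd.concat does not, here is a minimal sketch. The sample rows are made up and stand in for the parsed feed items; only pd.concat and pd.DataFrame come from the thread.

```python
import pandas as pd

# Hypothetical sample rows standing in for the parsed feed items.
rows = [
    ["First post", "https://example.com/1"],
    ["Second post", "https://example.com/2"],
]

# pd.concat only accepts Series/DataFrame objects, so a list of lists fails.
try:
    pd.concat(rows)
    concat_failed = False
except TypeError as exc:
    concat_failed = True
    print(exc)

# The DataFrame constructor accepts a list of rows directly.
df = pd.DataFrame(rows, columns=["title", "link"])
print(df)
```

Building the frame once from the finished list also avoids copying the data on every loop iteration.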
CodePudding user response:
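You need to change row to a dict of lists, e.g.:
row = {'col1': [title], 'col2': [link]}
and replace the append line with pd.concat (DataFrame.append is the deprecated method that triggers the warning):
tmp = pd.concat([tmp, pd.DataFrame(row)], ignore_index=True)
Don't forget to initialize tmp as a DataFrame instead of a list:
tmp = pd.DataFrame()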
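A variation on this idea, as a rough sketch: collect the one-row frames in a list and call pd.concat a single time after the loop. The sample data and the names samples and frames are assumptions for illustration, not part of the original code.

```python
import pandas as pd

# Hypothetical (title, link) pairs standing in for the parsed feed items.
samples = [
    ("First post", "https://example.com/1"),
    ("Second post", "https://example.com/2"),
]

frames = []
for title, link in samples:
    # Each row becomes a one-row DataFrame, which pd.concat can accept.
    frames.append(pd.DataFrame({"title": [title], "link": [link]}))

# One concat call at the end; ignore_index gives a clean 0..n-1 index.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once at the end is usually cheaper than growing a DataFrame inside the loop, since each concat call copies the data.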
CodePudding user response:
pd.concat concatenates two or more pandas objects (Series or DataFrames), so it cannot take a list of plain lists or dicts.
If you have successfully constructed a list of dicts containing your data (as in the tmp variable below), you can turn it into a DataFrame with the default pd.DataFrame constructor:
df = pd.DataFrame(columns = ['title', 'link'])
tmp = []
with response as r:
    items = r.html.find('item', first=False)
    for item in items:
        title = item.find('title', first=True).text
        link = item.find('guid', first=True).text
        row = {'title': title, 'link': link}
        tmp.append(row)
df = pd.DataFrame(tmp)
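For a version that runs without the HTTP response, here is a self-contained sketch of the same list-of-dicts pattern. It swaps in the standard library's xml.etree.ElementTree for the r.html.find calls, and the RSS string is a made-up example, so treat both as assumptions.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical minimal RSS document, used only for illustration.
RSS = """<rss version="2.0"><channel>
<item><title>First post</title><guid>https://example.com/1</guid></item>
<item><title>Second post</title><guid>https://example.com/2</guid></item>
</channel></rss>"""

root = ET.fromstring(RSS)

rows = []
for item in root.iter("item"):
    # findtext returns the text of the first matching child element.
    rows.append({"title": item.findtext("title"), "link": item.findtext("guid")})

# Passing columns keeps both columns present even if the feed has no items.
df = pd.DataFrame(rows, columns=["title", "link"])
print(df)
```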