For loop prints all elements, but when the result is saved in a pandas DataFrame it returns NaN

I have the following scraping script. I need to collect the elements inside the items2 for loop. The script prints all the elements, but the resulting DataFrame has "name" and "tPlan" as NaN. Any idea why?

import requests
import json
import csv
import sys

import pandas as pd
from bs4 import BeautifulSoup


base_url = "xxxx"  
username = "xxxx"  
password = "xxxx"
toget = data  # `data` is defined elsewhere (not shown in the question)

allowed_results = 50  
max_results = "maxResults=" + str(allowed_results)
tc = "/testcycles?"

result_count = -1  
start_index = 0  

df = pd.DataFrame(
    columns=['id', 'name', 'gId', 'dKey', 'tPlan'])

for eachId in toget['TPlan_ID']:
    while result_count != 0:
        start_at = "startAt=" + str(start_index)
        url = f'{base_url}{eachId}{tc}&{start_at}&{max_results}'
        response = requests.get(url, auth=(username, password))
        json_response = json.loads(response.text)
        print(json_response)
        page_info = json_response["meta"]["pageInfo"]
        start_index = page_info["startIndex"] + allowed_results
        result_count = page_info["resultCount"]
        items2 = json_response["data"]
        print(items2)

        for item in items2:
            print(item["id"])
            print(item["fields"]["name"])
            print(item["fields"]["gId"])
            print(item["fields"]["dKey"])
            print(item["fields"]["tPlan"])

            temporary_df = pd.DataFrame([item], columns=['id', 'name', 'gId', 'dKey', 'tPlan'])
            df = df.append(temporary_df, ignore_index=True)

Answer:

TL;DR

Use this for loop.

for item in items2:
    df = df.append({'id': item['id'], **item['fields']}, ignore_index=True)

Explanation

I am assuming that items2 looks something like this:

items2 = [
    { 'id': 0, 'fields': {'name': 'prop1', 'gId': 100, 'dKey': 'key1', 'tPlan': 'plan1'}},
    { 'id': 1, 'fields': {'name': 'prop2', 'gId': 200, 'dKey': 'key2', 'tPlan': 'plan2'}},
    { 'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}},
]

You can't build the intended DataFrame directly from item, because each item is nested like this:

{'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}}

which leaves temporary_df with NaN in most columns. When you pass a list of dicts together with columns, pandas looks up each column name among the top-level keys of each dict. Only id exists at that level; name, gId, dKey and tPlan live inside item['fields'], so they come out as NaN, and fields itself is dropped because it is not in columns. After the loop, df looks like this:

  id name  gId dKey tPlan
0  0  NaN  NaN  NaN   NaN
1  1  NaN  NaN  NaN   NaN
2  2  NaN  NaN  NaN   NaN
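A minimal reproduction of the problem, using hypothetical sample data in the assumed shape rather than the real API response. It runs the question's exact pd.DataFrame([item], columns=...) call on one item:

```python
import pandas as pd

# Hypothetical sample data in the shape assumed above.
items2 = [
    {'id': 0, 'fields': {'name': 'prop1', 'gId': 100, 'dKey': 'key1', 'tPlan': 'plan1'}},
    {'id': 1, 'fields': {'name': 'prop2', 'gId': 200, 'dKey': 'key2', 'tPlan': 'plan2'}},
]

columns = ['id', 'name', 'gId', 'dKey', 'tPlan']

# Same call as in the question: the item is still nested.
temporary_df = pd.DataFrame([items2[0]], columns=columns)
print(temporary_df)

# 'id' is a top-level key, so it is filled in.
# 'name', 'gId', 'dKey' and 'tPlan' only exist inside item['fields'],
# so pandas fills them with NaN. 'fields' is not listed in `columns`,
# so that key is dropped entirely.
```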

What you need to pass to pd.DataFrame instead is a flat dict like

{'id': 2, 'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}

Notice that the nested fields dict is gone: all of its key-value pairs now sit at the top level next to id. Appending rows built from dicts like this gives a df like

  id   name  gId  dKey  tPlan
0  0  prop1  100  key1  plan1
1  1  prop2  200  key2  plan2
2  2  prop3  300  key3  plan3

To flatten item this way, you could write:

new_item = {'id': item['id']}
for key, value in item['fields'].items():
    new_item[key] = value

You can write the same thing more concisely with the dictionary unpacking operator **:

new_item = {'id': item['id'], **item['fields']}
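A quick check, using a hypothetical sample item, that the explicit loop and the ** version produce the same dict. One caveat applies to both: if item['fields'] ever contained its own 'id' key, it would overwrite the outer id, because later keys win.

```python
# Hypothetical item in the shape assumed above.
item = {'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}}

# Explicit loop: copy each nested key-value pair up one level.
new_item_loop = {'id': item['id']}
for key, value in item['fields'].items():
    new_item_loop[key] = value

# Same result with dictionary unpacking.
new_item_unpacked = {'id': item['id'], **item['fields']}

print(new_item_unpacked)
# {'id': 2, 'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}

# Caveat: later keys win, so a nested 'id' would replace the outer one.
clash = {'id': 1, **{'id': 99}}
print(clash)  # {'id': 99}
```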

Now you can pass new_item to pd.DataFrame:

temp_df = pd.DataFrame({'id': item['id'], **item['fields']}, index=[i])  # i is the row label of this one-row DataFrame
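The index argument is needed because every value in new_item is a scalar: pandas cannot tell how many rows to build and raises a ValueError instead. The sketch below, using a hypothetical sample row, shows the error, the index fix, and wrapping the dict in a list as an alternative that needs no index.

```python
import pandas as pd

# Hypothetical flattened row.
new_item = {'id': 2, 'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}

# Without an index, an all-scalar dict is rejected.
try:
    pd.DataFrame(new_item)
except ValueError as exc:
    error_raised = True
    print(exc)  # the message asks you to pass an index
else:
    error_raised = False

# With an explicit one-element index, you get a single row.
i = 0
temp_df = pd.DataFrame(new_item, index=[i])
print(temp_df)

# Wrapping the dict in a list also works and needs no index.
temp_df_list = pd.DataFrame([new_item])
```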

With these changes, your for loop looks something like this:

for i, item in enumerate(items2):
    new_item = {'id': item['id'], **item['fields']}
    temp_df = pd.DataFrame(new_item, index=(i,))
    df = df.append(temp_df, ignore_index=True)

You can make this more concise by passing new_item directly to DataFrame.append, which accepts a dict as long as ignore_index=True.

So in the end, this loop should work:

for item in items2:
    new_item = {'id': item['id'], **item['fields']}
    df = df.append(new_item, ignore_index=True)
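Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the loops above only run on older versions. Below is a sketch of an equivalent for current pandas, assuming the same items2 shape: collect the flattened dicts in a plain list and build the DataFrame once at the end, which also avoids copying the frame on every row. pd.json_normalize is shown as an alternative; it names nested keys like 'fields.name', so the columns are renamed afterwards. The sample data and the helper name flatten_items are hypothetical.

```python
import pandas as pd

COLUMNS = ['id', 'name', 'gId', 'dKey', 'tPlan']


def flatten_items(items):
    """Hypothetical helper: lift each item's 'fields' up next to 'id'."""
    return [{'id': item['id'], **item['fields']} for item in items]


# Hypothetical sample page in the shape assumed above.
items2 = [
    {'id': 0, 'fields': {'name': 'prop1', 'gId': 100, 'dKey': 'key1', 'tPlan': 'plan1'}},
    {'id': 1, 'fields': {'name': 'prop2', 'gId': 200, 'dKey': 'key2', 'tPlan': 'plan2'}},
    {'id': 2, 'fields': {'name': 'prop3', 'gId': 300, 'dKey': 'key3', 'tPlan': 'plan3'}},
]

# In the scraping loop you would extend `rows` once per page, e.g.
# rows.extend(flatten_items(json_response["data"])), and build df after the loop.
rows = []
rows.extend(flatten_items(items2))
df = pd.DataFrame(rows, columns=COLUMNS)
print(df)

# Alternative: json_normalize flattens nested dicts into 'fields.name', etc.
df_norm = pd.json_normalize(items2)
# Strip the 'fields.' prefix and put the columns in the same order.
df_norm = df_norm.rename(columns=lambda c: c.split('.', 1)[-1])[COLUMNS]
```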