Home > Back-end >  Iterating over an API response breaks for only one column in a Pandas dataframe
Iterating over an API response breaks for only one column in a Pandas dataframe

Time:02-06

Problem: In my dataframe, when looping through zip codes in a weather API, I am getting the SAME values for column "desc" where every value is "cloudy" (this is incorrect for some zip codes). I think it is taking the value from the very last zip code in the list and applying it to every row in the Desc column.

But if I run only zip code 32303 and comment out all the other zip codes, the value for "Desc" is correct, it is now correctly listed as sunny/clear - - this proves the values when looping are incorrect. Heck, it's Florida! ;)

Checking other weather sources I know "sunny/clear" is the correct value for 32303, not "cloudy". So for some reason, iterating is breaking on the column Desc only. I've tried so many options and am just stuck. Any ideas how to fix this?

import requests
import pandas as pd

api_key = 'a14ac278e4c4fdfd277a5b37e1dbe87a'

#Create a dictionary of zip codes for the team
zip_codes = {
    55446: "You",
    16823: "My Boo",
    94086: "Your Boo",
    32303: "Mr. Manatee",
    95073: "Me"
}

# Create a list of zip codes
zip_list = list(zip_codes.keys())

# Create a list of names
name_list = list(zip_codes.values())

#For team data, create a pandas DataFrame from the dictionary
df1 = pd.DataFrame(list(zip_codes.items()), 
                   columns=['Zip Code', 'Name'])

# Create empty lists to hold the API response data
city_name = []
description = []
weather = []
feels_like = []
wind = []
clouds = []

# Loop through each zip code
for zip_code in zip_list:
  # Make a request to the OpenWeatherMap API
  url = f"http://api.openweathermap.org/data/2.5/weather?zip={zip_code},us&units=imperial&appid={api_key}"
  response = requests.get(url).json()
  # Store the response data in the appropriate empty list
  city_name.append(response['name'])
  description = response['weather'][0]['main']
  weather.append(response['main']['temp'])
  feels_like.append(response['main']['feels_like']) 
  wind.append(response['wind']['speed'])
  clouds.append(response['clouds']['all'])
#  rain.append(response['humidity']['value'])

# For weather data, create df from lists
df2 = pd.DataFrame({
    
    'City': city_name,
    'Desc': description,
    'Temp (F)': weather,
    'Feels like': feels_like,
    'Wind (mph)': wind,
    'Clouds %': clouds,
#    'Rain (1hr)': rain,
})

# Merge df1 & df2, round decimals, and don't display index or zip.
df3=pd.concat([df1,df2],axis=1,join='inner').drop('Zip Code', axis=1)
df3[['Temp (F)', 'Feels like', 'Wind (mph)', 'Clouds %']] = df3[['Temp (F)', 'Feels like', 'Wind (mph)', 'Clouds %']].astype(int)

# Don't truncate df
pd.set_option('display.width', 150)

# Print the combined DataFrames
display(df3.style.hide_index())

Example output, note that "Desc" all have the same value "Clouds" but I know that is not correct as some are different.

Name    City    Desc    Temp (F)    Feels like  Wind (mph)  Clouds %
You Minneapolis Clouds  1   -10 12  100
My Boo  Bellefonte  Clouds  10  -1  15  100
Your Boo    Sunnyvale   Clouds  54  53  6   75
Mr. Manatee Tallahassee Clouds  49  49  3   0
Me  Soquel  Clouds  53  52  5   100

For example, if I comment out all the zip codes except for 32303: "Mr. Manatee", then I get a different value:

Name    City    Desc    Temp (F)    Feels like  Wind (mph)  Clouds %
Mr. Manatee Tallahassee Clear   49  49  3   0

To solve this, I tried another approach, below, which DOES give correct values for each zip code. The problem is that several of the columns are json values, and if I can't fix the code above, then I need to parse them and show only the relevant values. But my preference would be to fix the code above!

import requests
import pandas as pd
import json

zip_codes = {
    95073: "Me",
    55446: "You",
    16823: "My Boo",
    94086: "Your Boo",
    32303: "Mr. Manatee"
}

import pandas as pd
import requests

# Create a list of zip codes
zip_list = list(zip_codes.keys())

# Create a list of names
name_list = list(zip_codes.values())

# Create a list of weather data
weather_list = []

# Set the API key
api_key = 'a14ac278e4c4fdfd277a5b37e1dbe87a' 

# Get the weather data from the openweather API
for zip_code in zip_list:
    api_url = f'http://api.openweathermap.org/data/2.5/weather?zip={zip_code},us&units=imperial&appid={api_key}'
    response = requests.get(api_url).json()
    weather_list.append(response)

# Create the dataframe
df = pd.DataFrame(weather_list)

# Add the name column
df['Name'] = name_list

# Parse the 'weather' column
#THIS DOESN'T WORK! df.weather.apply(lambda x: x[x]['main'])

# Drop unwanted columns
df.drop(['coord', 'base', 'visibility','dt', 'sys', 'timezone','cod'], axis=1)

I tried a different approach but got unusable json values. I tried various ways to fix looping in my first approach but I still get the same values for "Desc" instead of unique values corresponding to each zip code.

CodePudding user response:

Like jqurious said, you had a bug in your code:

description = response['weather'][0]['main']

This means description stores the description of the final zip code in the dictionary and will repeat that across the whole dataframe. No wonder they are all the same.

Since you are collecting data to build a dataframe, it's better to use a list of dictionaries rather than a series of lists:

data = []
for zip_code in zip_list:
    url = f"http://api.openweathermap.org/data/2.5/weather?zip={zip_code},us&units=imperial&appid={api_key}"
    response = requests.get(url).json()
    data.append({
        "City": response["name"],
        "Desc": response["weather"][0]["main"],
        "Temp (F)": response["main"]["temp"],
        "Feels like": response["main"]["feels_like"],
        "Wind (mph)": response["wind"]["speed"],
        "Clouds %": response["clouds"]["all"]
    })

# You don't need to redefine the column names here
df2 = pd.DataFrame(data)
  • Related