All,
I have a dataframe (df_live) with the following structure:
live live.updated live.latitude live.longitude live.altitude live.direction live.speed_horizontal live.speed_vertical live.is_ground
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ...
95 NaN NaN NaN NaN NaN NaN NaN NaN NaN
96 NaN 2022-10-11T17:46:19+00:00 -45.35 169.88 5791.2 44.0 518.560 0.0 False
97 NaN 2022-10-11T17:45:54+00:00 -27.55 143.20 11277.6 139.0 853.772 0.0 False
98 NaN NaN NaN NaN NaN NaN NaN NaN NaN
99 NaN NaN NaN NaN NaN NaN NaN NaN NaN
I would like to iterate through this dataframe such that I only obtain rows for which numerical values are available (e.g. rows 96 and 97).
The code I am using is as follows:
import boto3
import json
from datetime import datetime
import calendar
import random
import time
import requests
import pandas as pd
aircraftdata = ''
params = {
    'access_key': 'KEY',
    'limit': '100',
    'flight_status': 'active'
}
url = "http://api.aviationstack.com/v1/flights"
api_result = requests.get('http://api.aviationstack.com/v1/flights', params)
api_statuscode = api_result.status_code
api_response = api_result.json()
df = pd.json_normalize(api_response["data"])
df_live = df[df.loc[:, df.columns.str.contains("live", case=False)].columns]
df_dep = df[df.loc[:, df.columns.str.contains("dep", case=False)].columns]
print(df_live)
for index, row in df_live.iterrows():
    if df_live["live_updated"] != "NaN":
        print (row)
    else:
        print ("Not live")
This yields the following error:
KeyError: 'live_updated'
CodePudding user response:
Instead of iterating with a for loop, how about removing the rows that are entirely NaN in one go?
df_live = df_live[df_live.notnull().any(axis=1)]
print(df_live)
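As a minimal sketch of this filter, the snippet below builds a small stand-in for df_live by hand (the frame is made up for illustration, it is not the real API response) and compares the boolean-mask approach with dropna(how="all"), which should give the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_live: two all-NaN rows and one populated row.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:46:19+00:00", np.nan],
        "live.latitude": [np.nan, -45.35, np.nan],
        "live.longitude": [np.nan, 169.88, np.nan],
    }
)

# Keep rows that have at least one non-null value.
kept_mask = df_live[df_live.notna().any(axis=1)]

# Same idea, spelled differently: drop rows in which every column is null.
kept_dropna = df_live.dropna(how="all")

print(kept_dropna)
```

Both variants keep the original index labels, so the surviving rows can still be matched back to df.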
CodePudding user response:
Be careful with the column names. The key error
KeyError: 'live_updated'
means that the dataframe has no column named 'live_updated'.
If you check your dataframe's columns, the name you probably want is 'live.updated', so change the column name in the code. Note also that the check has to be made on the current row's value rather than on the whole column, and that NaN is a float, not the string "NaN", so use pd.notna:
for index, row in df_live.iterrows():
    if pd.notna(row["live.updated"]):
        print(row)
    else:
        print("Not live")
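A small sketch of the per-row check, assuming a hand-built two-row frame in place of the real df_live. It also shows why comparing against the string "NaN" cannot detect missing values: NaN is a float, so the inequality holds for every missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame using the question's column names.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:45:54+00:00"],
        "live.speed_horizontal": [np.nan, 853.772],
    }
)

# NaN is a float, not the string "NaN", so this comparison is True.
print(np.nan != "NaN")

live_rows = []
for index, row in df_live.iterrows():
    # Test the single value in the current row, not the whole column.
    if pd.notna(row["live.updated"]):
        live_rows.append(index)
    else:
        print(index, "Not live")

print(live_rows)
```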
Another solution could be to rename the dataframe columns before you refer to them:
df_live = df_live.rename(columns={'live.updated': 'live_updated'})
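If several dotted column names need the same treatment, one option is to replace the dots in all of them at once. This is only a sketch on an empty, hypothetical frame carrying three of the question's column names.

```python
import pandas as pd

# Hypothetical frame with dotted names like those pd.json_normalize produces.
df_live = pd.DataFrame(columns=["live.updated", "live.latitude", "live.is_ground"])

# Replace every literal dot with an underscore.
# regex=False makes "." a plain character rather than a wildcard.
df_live.columns = df_live.columns.str.replace(".", "_", regex=False)

print(list(df_live.columns))
```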