Iterating through pandas dataframe with if-statement

Time:10-12

All,

I have a dataframe (df_live) with the following structure:

    live               live.updated  live.latitude  live.longitude  live.altitude  live.direction  live.speed_horizontal  live.speed_vertical live.is_ground
0    NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
1    NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
2    NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
3    NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
4    NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
..   ...                        ...            ...             ...            ...             ...                    ...                  ...            ...
95   NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
96   NaN  2022-10-11T17:46:19+00:00         -45.35          169.88         5791.2            44.0                518.560                  0.0          False
97   NaN  2022-10-11T17:45:54+00:00         -27.55          143.20        11277.6           139.0                853.772                  0.0          False
98   NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN
99   NaN                        NaN            NaN             NaN            NaN             NaN                    NaN                  NaN            NaN

I would like to iterate through this dataframe such that I only obtain rows for which numerical values are available (e.g. rows 96 and 97).

The code I am using is as follows:

import boto3
import json
from datetime import datetime
import calendar
import random
import time
import requests
import pandas as pd

aircraftdata = ''
params = {
'access_key': 'KEY',
'limit': '100',
'flight_status':'active'
  }


url = "http://api.aviationstack.com/v1/flights"


api_result = requests.get('http://api.aviationstack.com/v1/flights', params)
api_statuscode =  api_result.status_code
api_response = api_result.json()


df = pd.json_normalize(api_response["data"])
# Select the columns whose names contain "live" / "dep" (case-insensitive)
df_live = df.loc[:, df.columns.str.contains("live", case=False)]
df_dep = df.loc[:, df.columns.str.contains("dep", case=False)]
print(df_live)

for index, row in df_live.iterrows():
    if df_live["live_updated"] != "NaN":
        print (row)
    else:
        print ("Not live")

This yields the following error:

KeyError: 'live_updated'

CodePudding user response:

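Instead of iterating with a for loop, how about removing the rows that are entirely NaN in one go?

# Keep rows that have at least one non-null value.
# axis must be passed by keyword in pandas 2.0 and later.
df_live = df_live[df_live.notnull().any(axis=1)]
print(df_live)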
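
The same filtering can also be written with dropna. The snippet below is only a minimal sketch: the small dataframe is a hypothetical stand-in for df_live (the values are made up to mirror the question's output), and df_clean is an illustrative name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_live: two all-NaN rows and one populated row.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:46:19+00:00", np.nan],
        "live.latitude": [np.nan, -45.35, np.nan],
        "live.is_ground": [np.nan, False, np.nan],
    }
)

# how="all" drops a row only when every column in it is NaN,
# so rows with at least one value are kept.
df_clean = df_live.dropna(how="all")
print(df_clean)
```

Passing subset=["live.updated"] instead would drop rows based on that one column only, which may be closer to the intent if other columns can hold values while the live data is missing.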

CodePudding user response:

Be careful with the column names. The error KeyError: 'live_updated' means that the dataframe has no column named 'live_updated'.
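
A quick way to confirm this is to list the column names before indexing. This is a small sketch on a hypothetical two-column frame standing in for df_live:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_live with dotted column names,
# as produced by pd.json_normalize with its default separator.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:46:19+00:00"],
        "live.latitude": [np.nan, -45.35],
    }
)

# Show the exact column names the dataframe contains.
print(df_live.columns.tolist())

# Membership tests make the mismatch explicit.
has_underscore_name = "live_updated" in df_live.columns
has_dotted_name = "live.updated" in df_live.columns
print(has_underscore_name, has_dotted_name)
```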

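If you check your dataframe columns, the name you probably want is 'live.updated', so change the column name in the code. Two more things need fixing in the condition: it has to test the value in the current row rather than the whole column (comparing a whole column inside an if raises "ValueError: The truth value of a Series is ambiguous"), and missing values are NaN, not the string "NaN", so test them with pd.notna:

for index, row in df_live.iterrows():
    # Test this row's value; pd.notna returns False for NaN and None
    if pd.notna(row["live.updated"]):
        print(row)
    else:
        print("Not live")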
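
If the goal is only to keep the rows that carry live data, the row-by-row check can also be expressed as a boolean mask, which avoids iterrows altogether. A minimal sketch, again on a hypothetical stand-in for df_live (has_update and live_rows are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_live.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:46:19+00:00", np.nan],
        "live.latitude": [np.nan, -45.35, np.nan],
    }
)

# Boolean mask: True where live.updated holds a value, False where it is NaN.
has_update = df_live["live.updated"].notna()

# Keep only the rows flagged True by the mask.
live_rows = df_live[has_update]
not_live_count = int((~has_update).sum())

print(live_rows)
print(f"{not_live_count} rows are not live")
```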

Another solution could be to rename the dataframe columns before you refer to them:

df_live = df_live.rename(columns={'live.updated': 'live_updated'})
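
To rename every dotted column at once rather than one by one, rename also accepts a function. The sketch below assumes a hypothetical frame with the same naming pattern as df_live:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_live with dotted column names.
df_live = pd.DataFrame(
    {
        "live.updated": [np.nan, "2022-10-11T17:46:19+00:00"],
        "live.latitude": [np.nan, -45.35],
        "live.speed_horizontal": [np.nan, 518.56],
    }
)

# Apply a function to each column name: replace every "." with "_".
df_live = df_live.rename(columns=lambda name: name.replace(".", "_"))
print(df_live.columns.tolist())
```

Alternatively, pd.json_normalize takes a sep argument (default "."), so passing sep="_" when building the dataframe should produce underscore-separated names from the start.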