Home > Blockchain >  Pandas not filtering unless dataframe is saved into a csv and read back as a csv, source is a json l
Pandas not filtering unless dataframe is saved into a csv and read back as a csv, source is a json l

Time:12-22

I have a json output which looks like this.

{'pagination': {'limit': 100, 'offset': 0, 'count': 38, 'total': 38},
 'data': [{'name': 'Ceco Environmental Corp',
   'symbol': 'CECE',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'NASDAQ Stock Exchange',
    'acronym': 'NASDAQ',
    'mic': 'XNAS',
    'country': 'USA',
    'country_code': 'US',
    'city': 'New York',
    'website': 'www.nasdaq.com'}},
  {'name': 'CEC CoreCast Corporation Ltd',
   'symbol': '600764.XSHG',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Shanghai Stock Exchange',
    'acronym': 'SSE',
    'mic': 'XSHG',
    'country': 'China',
    'country_code': 'CN',
    'city': 'Shanghai',
    'website': 'www.sse.com.cn'}},
  {'name': 'CECEP WindPower Corp',
   'symbol': '601016.XSHG',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Shanghai Stock Exchange',
    'acronym': 'SSE',
    'mic': 'XSHG',
    'country': 'China',
    'country_code': 'CN',
    'city': 'Shanghai',
    'website': 'www.sse.com.cn'}},
  {'name': 'CECONOMY AG INHABER-STAMMAKTIEN O.N.',
   'symbol': 'CEC.XSTU',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Börse Stuttgart',
    'acronym': 'XSTU',
    'mic': 'XSTU',
    'country': 'Germany',
    'country_code': 'DE',
    'city': 'Stuttgart',
    'website': 'www.boerse-stuttgart.de'}},
  {'name': 'CECONOMY AG ST O.N.',
   'symbol': 'CEC.XFRA',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Deutsche Börse',
    'acronym': 'FSX',
    'mic': 'XFRA',
    'country': 'Germany',
    'country_code': 'DE',
    'city': 'Frankfurt',
    'website': 'www.deutsche-boerse.com'}},
  {'name': 'CECONOMY AG ST O.N.',
   'symbol': 'CEC.XETRA',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Deutsche Börse Xetra',
    'acronym': 'XETR',
    'mic': 'XETRA',
    'country': 'Germany',
    'country_code': 'DE',
    'city': 'Frankfurt',
    'website': ''}},
  {'name': 'CECEP COSTIN',
   'symbol': '2228.XHKG',
   'has_intraday': False,
   'has_eod': True,
   'country': None,
   'stock_exchange': {'name': 'Hong Kong Stock Exchange',
    'acronym': 'HKEX',
    'mic': 'XHKG',
    'country': 'Hong Kong',
    'country_code': 'HK',
    'city': 'Hong Kong',
    'website': 'www.hkex.com.hk'}},
.....

I am trying to load it into a dataframe and filter the stock_exchange column by country.

Here is my code.

import pandas as pd
data = api_result.json()
result = pd.DataFrame(data['data'])

result[result['stock_exchange'].str.contains('China')]

But I get the following error, KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64')] are in the [columns]"

However, if I save the output to a csv, and reload it back into a dataframe like this,

result.to_csv('result.csv')
result = pd.read_csv('result.csv')

result[result['stock_exchange'].str.contains('China')]

I get the filtered dataframe like this,


Unnamed: 0  name    symbol  has_intraday    has_eod country stock_exchange
1   1   CEC CoreCast Corporation Ltd    600764.XSHG False   True    NaN {'name': 'Shanghai Stock Exchange', 'acronym':...
2   2   CECEP WindPower Corp    601016.XSHG False   True    NaN {'name': 'Shanghai Stock Exchange', 'acronym':...

Any idea why I can't filter the dataframe without saving the frame to csv and reloading first?

CodePudding user response:

You must create your dataframe again. Use from this code

df = pd.DataFrame.from_records(data.get('data'))

CodePudding user response:

Just use json_normalize() I should note that each cell of stock_exchange is a dict object not a str object, which is part of the problem because you will always get NaN and not a bool so you cannot use boolean indexing like you currently are doing - i.e., result[result['stock_exchange'].str.contains('China')]

df = pd.json_normalize(data['data'])
df[df['stock_exchange.country'] == 'China']

                           name       symbol  has_intraday  has_eod country  \
1  CEC CoreCast Corporation Ltd  600764.XSHG         False     True    None   
2          CECEP WindPower Corp  601016.XSHG         False     True    None   

       stock_exchange.name stock_exchange.acronym stock_exchange.mic  \
1  Shanghai Stock Exchange                    SSE               XSHG   
2  Shanghai Stock Exchange                    SSE               XSHG   

  stock_exchange.country stock_exchange.country_code stock_exchange.city  \
1                  China                          CN            Shanghai   
2                  China                          CN            Shanghai   

  stock_exchange.website  
1         www.sse.com.cn  
2         www.sse.com.cn  
  • Related