Python: confused about results returned by pandas .read_json using chunks

Time:05-20

I am reading a large JSON file using the Python pandas library and splitting it into manageable chunks. Here is my code:

import pandas as pd
inputFile='file.json'
chunks = pd.read_json(inputFile, lines=True, chunksize = 5)

i = 1


for c in chunks:
    location = c.a.str.split(',')
    print(location)
    i += 1

    if i > 1:
        break

This is the output:

0                                    [poland]
1        [reading,  reading,  united kingdom]
2            [humble,  texas,  united states]
3    [adelaide,  south australia,  australia]
4                                     [italy]
Name: loc, dtype: object

I am interested in returning the country (the last element of each list), but if I modify my code in the following way:

import pandas as pd
inputFile='PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
chunks = pd.read_json(inputFile, lines=True, chunksize = 5)

i = 1

for c in chunks:
    location = c.a.str.split(',')
    print(location.pop())
    i += 1

    if i > 1:
        break

I get the error:

print(location.pop())
TypeError: pop() missing 1 required positional argument: 'item'

Also the line:

print(location[-1])

returns an error:

raise KeyError(key) from err
KeyError: -1

This tells me that the variable 'location' is not an array. In fact, the lines:

location = c.a.str.split(',')
print(type(location))

return:

<class 'pandas.core.series.Series'>

So my question is: how do I extract the values poland, united kingdom, united states, australia, italy from my output?

Thank you for your help

CodePudding user response:

A pandas.core.series.Series has two positional indexers that can help you. Both are used with square brackets, not called like methods:

  1. .iat[index]
  2. .iloc[index]

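So, for example, you can get the last element of location (that is, the last row of the Series, which is the list [italy] in the output above, not the last item of every list) as below: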

lastElement = location.iat[-1]
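
To get the last item of every list rather than the last row, one option is the .str accessor, which indexes into each element of the Series. This also explains the two errors in the question: Series.pop expects a label as its argument, and location[-1] looks up the label -1 in the index instead of counting from the end. Below is a minimal sketch, not a definitive implementation. The sample data is hypothetical and only mirrors the output shown in the question, and the column name "loc" is an assumption taken from the "Name: loc" line.

```python
import pandas as pd

# Hypothetical sample mirroring the output in the question; the column
# name "loc" is an assumption taken from the "Name: loc" line.
df = pd.DataFrame({
    "loc": [
        "poland",
        "reading, reading, united kingdom",
        "humble, texas, united states",
        "adelaide, south australia, australia",
        "italy",
    ]
})

# "loc" collides with the DataFrame.loc indexer, so use bracket access.
location = df["loc"].str.split(",")

# .str[-1] takes the last item of each list, row by row;
# .str.strip() removes the space left over from splitting on ",".
country = location.str[-1].str.strip()
print(country.tolist())
# ['poland', 'united kingdom', 'united states', 'australia', 'italy']
```

Inside the chunked loop from the question, the same expression should work on each chunk c in place of df, assuming every chunk contains that column.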
