I am reading a large JSON file using the Python pandas library and splitting it into manageable chunks. Here is my code:
import pandas as pd

inputFile = 'file.json'
# Read the line-delimited JSON file lazily, 5 records per chunk
chunks = pd.read_json(inputFile, lines=True, chunksize=5)
i = 1
for c in chunks:
    location = c.a.str.split(',')
    print(location)
    i += 1
    # Stop after the first chunk
    if i > 1:
        break
And this is the output:
0 [poland]
1 [reading, reading, united kingdom]
2 [humble, texas, united states]
3 [adelaide, south australia, australia]
4 [italy]
Name: loc, dtype: object
I am interested in returning the country (the last element of each array), but if I modify my code in the following way:
import pandas as pd

inputFile = 'PeopleDataLabs_416M.json/PeopleDataLabs_416M.json'
# Read the line-delimited JSON file lazily, 5 records per chunk
chunks = pd.read_json(inputFile, lines=True, chunksize=5)
i = 1
for c in chunks:
    location = c.a.str.split(',')
    print(location.pop())
    i += 1
    # Stop after the first chunk
    if i > 1:
        break
I get the error:
print(location.pop())
TypeError: pop() missing 1 required positional argument: 'item'
Also the line:
print(location[-1])
returns an error:
raise KeyError(key) from err
KeyError: -1
This tells me that the variable 'location' is not an array. In fact, the lines:
location = c.a.str.split(',')
print(type(location))
return:
<class 'pandas.core.series.Series'>
So my question is: how do I extract the values poland, united kingdom, united states, australia, italy from my output?
Thank you for your help
CodePudding user response:
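A pandas.core.series.Series has two positional indexers that can help you. Both are used with square brackets, not called like methods:
.iat[index]
.iloc[index]
So, for example, you can get the last element of location as below:
lastElement = location.iat[-1]
Note that this returns the list stored in the last row of the Series (['italy'] for the chunk shown in the question), not the last item of each row's list.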
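To get the last item of every row's list, which is what the question asks for, one option is the .str accessor: after .str.split(','), .str[-1] indexes into each list element-wise. The sketch below uses a small hand-made Series as a stand-in for one chunk's column (the sample strings, including the spaces after the commas, are assumptions based on the printed output), and .str.strip() removes any whitespace left over from splitting on a bare comma. It also shows why location[-1] fails: with the default integer index, plain square brackets look up the label -1, which does not exist, whereas .iat and .iloc are positional.

```python
import pandas as pd

# Hypothetical sample data standing in for one chunk's location column.
s = pd.Series([
    "poland",
    "reading, reading, united kingdom",
    "humble, texas, united states",
    "adelaide, south australia, australia",
    "italy",
])

# Split each string into a list of parts (one list per row).
location = s.str.split(",")

# Positional access to a single row: the list in the last row of the Series.
last_row = location.iat[-1]

# Element-wise access: the last item of every row's list,
# with surrounding whitespace stripped.
last_parts = location.str[-1].str.strip()

print(last_row)
print(last_parts.tolist())
```

Inside the loop from the question, the same idea would presumably read c.a.str.split(',').str[-1].str.strip(), assuming the column really is reachable as c.a.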