Why is "numpy.int32" not able to be printed here? (Using geopandas python 3.9.5)

Here is the relevant code:

import geopandas as gpd

#A shape file (.shp) is imported here, contents do not matter, since the "size()" function gets the size of the contents
shapefile = 'Data/Code_Specific/ne_50m_admin_1_states_provinces/ne_50m_admin_1_states_provinces.shp'

gdf = gpd.read_file(shapefile)[['admin', 'adm0_a3', 'postal', 'geometry']]

#size
#Return an int representing the number of elements in this object.
print(gdf.size())

The last line of code raises an error: TypeError: 'numpy.int32' object is not callable

The main purpose for this is that I am trying to integrate gdf.size() into a for loop:

for index in range(gdf.size()):
    print("test", index)
    #if Australia, remove
    if gdf.get('adm0_a3')[index] == "AUS":
        gdf = gdf.drop(gdf.index[index])

I have absolutely no clue what to do here; this is my first post on this site ever. Hope I don't get gilded with a badge of honor for how stupid or simple this is. I'm stumped.

CodePudding user response：

gpd.read_file will return either a GeoDataFrame or a DataFrame object, both of which have the attribute size, which holds an integer. The attribute is accessed with gdf.size, without parentheses. Writing gdf.size() tries to call that integer as if it were a function, which is why you get "object is not callable".
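To make the difference concrete, here is a minimal sketch. It uses a plain pandas DataFrame with made-up rows as a stand-in for your shapefile (that data is an assumption, not your file); a GeoDataFrame exposes size the same way.

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of the real shapefile.
df = pd.DataFrame({
    "admin": ["Australia", "Canada", "Brazil"],
    "adm0_a3": ["AUS", "CAN", "BRA"],
})

# Attribute access, no parentheses: number of rows times number of columns.
n_elements = df.size
print(n_elements)

# Adding parentheses tries to call the integer and raises TypeError.
try:
    df.size()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Depending on the pandas version, the message may name int or a numpy integer type, but in both cases it is a TypeError about an object that is not callable.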

size is also the wrong attribute to use here: for a table it returns the number of rows times the number of columns, not the number of rows. At first glance the following should work:

for index in gdf.index:
    ...

but you're modifying the length of an iterable while iterating over it. This can throw everything out of sync and cause a KeyError if you drop an index label before you try to access it. Since all you want to do is filter some rows, simply use

gdf = gdf[gdf['adm0_a3'] != 'AUS']
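As a small self-contained sketch of that filter, assuming a plain pandas DataFrame with invented rows in place of the real shapefile (the same boolean indexing works on a GeoDataFrame):

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of the real shapefile.
df = pd.DataFrame({
    "admin": ["Australia", "Canada", "Australia", "Brazil"],
    "adm0_a3": ["AUS", "CAN", "AUS", "BRA"],
})

# Boolean mask: True for every row that should be kept.
mask = df["adm0_a3"] != "AUS"

# Selecting with the mask builds a new frame; nothing is dropped mid-loop.
filtered = df[mask]

# Optional: renumber the index so it runs 0..n-1 again.
filtered = filtered.reset_index(drop=True)

print(filtered)
```

The reset_index call is only there for tidiness; leave it out if you want to keep the original index labels.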

CodePudding user response：

I think what you are looking for is the number of rows, which you can get with either of these:

gdf.shape[0]

len(gdf.index)

I think the first option is more readable but the second one is faster.
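A quick sketch comparing the two against size, again with a made-up pandas DataFrame standing in for the shapefile (an assumption for illustration only):

```python
import pandas as pd

# Hypothetical stand-in data; not the contents of the real shapefile.
df = pd.DataFrame({
    "admin": ["Australia", "Canada", "Brazil"],
    "adm0_a3": ["AUS", "CAN", "BRA"],
})

n_rows_shape = df.shape[0]   # first entry of (rows, columns)
n_rows_index = len(df.index)  # length of the row index
n_elements = df.size          # rows * columns, not the row count

print(n_rows_shape, n_rows_index, n_elements)

# Either row count is safe to use in range() for a read-only loop.
for position in range(n_rows_index):
    print(position, df["adm0_a3"].iloc[position])
```

Note the .iloc in the loop: it looks rows up by position, so it keeps working even when the index labels are not 0, 1, 2, ...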