GeoPandas DataFrame how to explode data by rows with geometry

Time:11-26

!wget https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_ua10_500k.zip
!unzip cb_2018_us_ua10_500k.zip -d cb_2018_us_ua10_500k

I'm using the dataset above and want to explode rows in a GeoPandas dataframe.

import geopandas as gpd

# Read shapefile
test = gpd.read_file("cb_2018_us_ua10_500k")
# Split the NAME10 column once, at the first comma, to extract city & state
test[['city', 'state_names']] = test['NAME10'].str.split(',', n=1, expand=True)
# Remove trailing & leading spaces
test[['city', 'state_names']] = test[['city', 'state_names']].apply(lambda x: x.str.strip())

test.head()

    UACE10  AFFGEOID10  GEOID10 NAME10  LSAD10  UATYP10 ALAND10 AWATER10    geometry    city    state_names
0   88732   400C100US88732  88732   Tucson, AZ  75  U   915276150   2078695 MULTIPOLYGON (((-110.81345 32.11910, -110.7987...   Tucson  AZ
1   01819   400C100US01819  01819   Alturas, CA 76  C   4933312 16517   MULTIPOLYGON (((-120.54610 41.51264, -120.5459...   Alturas CA
2   22366   400C100US22366  22366   Davenport, IA--IL   75  U   357345121   21444164    MULTIPOLYGON (((-90.36678 41.53636, -90.36462 ...   Davenport   IA--IL
3   93322   400C100US93322  93322   Waynesboro, PA--MD  76  C   45455957    88872   MULTIPOLYGON (((-77.50746 39.71577, -77.50605 ...   Waynesboro  PA--MD
4   02548   400C100US02548  02548   Angola, IN  76  C   23646957    3913803 MULTIPOLYGON (((-85.01157 41.59300, -85.00589 ...   Angola  IN

I'm trying to explode state_names into one row per state:

test.assign(state=test["state_names"].str.split("--")).explode('state')

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-5532b7b6cbdf> in <module>
----> 1 test.assign(state=city_geo["state_names"].str.split("--")).explode('state')

TypeError: explode() takes 1 positional argument but 2 were given

When I do the same thing without the geometry column, it works:

test = test[['UACE10', 'AFFGEOID10', 'GEOID10', 'NAME10', 'LSAD10', 'UATYP10',
       'ALAND10', 'AWATER10', 'city', 'state_names']].head()

test.assign(state=test["state_names"].str.split("--")).explode('state')
UACE10  AFFGEOID10  GEOID10 NAME10  LSAD10  UATYP10 ALAND10 AWATER10    city    state_names state
0   88732   400C100US88732  88732   Tucson, AZ  75  U   915276150   2078695 Tucson  AZ  AZ
1   01819   400C100US01819  01819   Alturas, CA 76  C   4933312 16517   Alturas CA  CA
2   22366   400C100US22366  22366   Davenport, IA--IL   75  U   357345121   21444164    Davenport   IA--IL  IA
2   22366   400C100US22366  22366   Davenport, IA--IL   75  U   357345121   21444164    Davenport   IA--IL  IL
3   93322   400C100US93322  93322   Waynesboro, PA--MD  76  C   45455957    88872   Waynesboro  PA--MD  PA
3   93322   400C100US93322  93322   Waynesboro, PA--MD  76  C   45455957    88872   Waynesboro  PA--MD  MD
4   02548   400C100US02548  02548   Angola, IN  76  C   23646957    3913803 Angola  IN  IN

How can I explode a GeoPandas dataframe that has a geometry column?

Answer:

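The error most likely comes from an older GeoPandas release, where GeoDataFrame.explode() only splits multi-part geometries and takes no column argument. In that case, the data can be converted to a plain pandas DataFrame, exploded there, and then converted back to a GeoDataFrame.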

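import geopandas as gpd
import pandas as pd

url = 'https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_ua10_500k.zip'

test = gpd.read_file(url)
# Plain pandas DataFrame, so that pandas' own explode() is used
df = pd.DataFrame(test)

df[['city', 'state_names']] = df['NAME10'].str.split(',', n=1, expand=True)
# Strip the space left after the comma, as in the question
df['state_names'] = df['state_names'].str.strip()
df = df.assign(state=df["state_names"].str.split("--")).explode('state')

# convert df to gdf
test = gpd.GeoDataFrame(df, geometry='geometry')

test.crs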

Output:

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands. British Virgin Islands.
- bounds: (167.65, 14.92, -47.74, 86.46)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
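
The same round trip can be tried on a small in-memory example, without downloading the shapefile. This is only a sketch: the two rows, the point geometries, and the helper name explode_column are made up for illustration and are not part of the Census data or of GeoPandas.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Toy stand-in for the Census shapefile (hypothetical values, not real data).
gdf = gpd.GeoDataFrame(
    {
        "city": ["Tucson", "Davenport"],
        "state_names": ["AZ", "IA--IL"],
    },
    geometry=[Point(-110.8, 32.1), Point(-90.4, 41.5)],
    crs="EPSG:4269",
)


def explode_column(gdf, column, sep="--", new_column="state"):
    """Split `column` on `sep` and return one row per piece, keeping geometry.

    Hypothetical helper written for this example.
    """
    # Work on a plain DataFrame so that pandas' own explode() is called.
    df = pd.DataFrame(gdf)
    df = df.assign(**{new_column: df[column].str.split(sep)}).explode(new_column)
    # Rebuild the GeoDataFrame, passing the original CRS explicitly.
    return gpd.GeoDataFrame(df, geometry=gdf.geometry.name, crs=gdf.crs)


result = explode_column(gdf, "state_names")
print(result[["city", "state"]])
```

Each exploded row keeps a copy of the original geometry, and the index label is repeated for rows that came from the same feature. On newer GeoPandas releases (0.10 and later, as far as I know), GeoDataFrame.explode() accepts a column argument, so calling .explode('state') directly on the GeoDataFrame may work without the round trip.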