How to separate a column of a pandas dataframe that contains the geographic coordinates in two colum

Time:03-10

I have the following pandas dataframe:

      import pandas as pd

      df = pd.DataFrame({'NAME': ['Paris', 'New York', 'Rio'],
                         'GEO': ['POINT (48.85 2.31647)',
                                 'POINT (40.731499671618 -73.993457389558)',
                                 'POINT (-22.9 -43.2)']})

      print(df)

       NAME            GEO
       Paris      POINT (48.85 2.31647)
       New York   POINT (40.731499671618 -73.993457389558)
       Rio        POINT (-22.9 -43.2)

I need to split the GEO column into two columns: one for latitude and one for longitude.

Based on the answer to "Adding Lat Lon coordinates to separate columns (python/dataframe)", I tried the following:

       df['GEO'].str('POINT ()').str.strip(' ', expand=True).rename(columns={0:'LAT', 1:'LONG'})

However, it is giving the error: "TypeError: 'StringMethods' object is not callable"

I would like the output to be:

      NAME             GEO                                   LAT              LONG
       Paris      POINT (48.85 2.31647)                      48.85            2.31647
       New York   POINT (40.731499671618 -73.993457389558)   40.731499671618  -73.993457389558
       Rio        POINT (-22.9 -43.2)                        -22.9            -43.2

CodePudding user response:

You could use a regex:

df2 = df.join(df['GEO'].str.extract(r'(?P<LAT>-?\d+\.\d+) (?P<LONG>-?\d+\.\d+)'))

output:

       NAME                                       GEO              LAT              LONG
0     Paris                     POINT (48.85 2.31647)            48.85           2.31647
1  New York  POINT (40.731499671618 -73.993457389558)  40.731499671618  -73.993457389558
2       Rio                       POINT (-22.9 -43.2)            -22.9             -43.2

or, to get float:

df2 = df.join(df['GEO'].str.extract(r'(?P<LAT>-?\d+\.\d+) (?P<LONG>-?\d+\.\d+)')
              .astype(float))

output:

       NAME                                       GEO      LAT       LONG
0     Paris                     POINT (48.85 2.31647)  48.8500   2.316470
1  New York  POINT (40.731499671618 -73.993457389558)  40.7315 -73.993457
2       Rio                       POINT (-22.9 -43.2) -22.9000 -43.200000
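
For reference, here is a self-contained sketch of the regex approach that can be run as-is. The pattern below is a slight variation, not the exact one above: it makes the fractional part optional with (?:\.\d+)? so that a value such as "48" would also match. That is an assumption about what other data might look like; the sample rows all contain a decimal point. The variable names pattern and coords are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Paris', 'New York', 'Rio'],
                   'GEO': ['POINT (48.85 2.31647)',
                           'POINT (40.731499671618 -73.993457389558)',
                           'POINT (-22.9 -43.2)']})

# Named groups become the column names of the extracted frame.
# (?:\.\d+)? makes the fractional part optional (an assumption
# that goes beyond the sample data, where every value has one).
pattern = r'(?P<LAT>-?\d+(?:\.\d+)?) (?P<LONG>-?\d+(?:\.\d+)?)'

# str.extract returns one column per capture group; convert to float
coords = df['GEO'].str.extract(pattern).astype(float)

# join aligns on the index, so each row keeps its own coordinates
df2 = df.join(coords)
print(df2)
```

Rows that do not match the pattern end up as NaN in both new columns, which makes malformed entries easy to spot.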

CodePudding user response:

You were very close, but .str is an accessor, not a function, so you cannot call it as .str(). Modify your code as follows and it works (though it is not quite as sleek as your one-liner):

df[['POINT', 'LAT', 'LONG']] = df['GEO'].str.split(' ', expand=True).rename(columns={0: 'POINT', 1: 'LAT', 2: 'LONG'})

# regex=False treats the parentheses as literal characters
df['LAT'] = df['LAT'].str.replace('(', '', regex=False)
df['LONG'] = df['LONG'].str.replace(')', '', regex=False)

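You can then drop the helper column with df = df.drop(columns='POINT'). Note that LAT and LONG are still strings at this point; call .astype(float) on them if you need numbers.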
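
If you would rather keep something close to your original one-liner, the following sketch may work: it swaps the call order so that .str.strip removes the wrapper and .str.split does the splitting. It relies on an assumption about the data: strip treats 'POINT ()' as a set of characters to remove from both ends, not as a literal prefix, which is safe here only because every coordinate starts and ends with a digit or a minus sign. The names coords and df2 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Paris', 'New York', 'Rio'],
                   'GEO': ['POINT (48.85 2.31647)',
                           'POINT (40.731499671618 -73.993457389558)',
                           'POINT (-22.9 -43.2)']})

coords = (df['GEO']
          # remove any of the characters P, O, I, N, T, space, ( and )
          # from both ends of each string
          .str.strip('POINT ()')
          # split the remaining "lat long" text into two columns
          .str.split(' ', expand=True)
          .rename(columns={0: 'LAT', 1: 'LONG'})
          .astype(float))

# join aligns on the index, so GEO is kept next to the new columns
df2 = df.join(coords)
print(df2)
```

This avoids the temporary POINT column and the two separate replace calls.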
