Home > Blockchain >  How can I read a CSV, add two more columns, and insert data that is dependent on the current columns
How can I read a CSV, add two more columns, and insert data that is dependent on the current columns

Time:11-18

I have a csv that looks like this but is much longer:

ID Address
32 54298 Boca Chica Blvd Brownsville, TX
31 6640 Washington St, Yountville, CA 94599

I am attempting to loop through the csv, use geopy to get latitudes and longitudes for each address, and then insert them into a 3rd and 4th column that would look like this:

ID Address Latitude Longitude
32 54298 Boca Chica Blvd Brownsville, TX 34.5165131 90.1656516
31 6640 Washington St, Yountville, CA 94599 26.1231 51.516125

It doesn't really matter if it is in the same old csv (that's what I was trying), or into a new csv. The previous posts I have read either focus on appending whole rows instead of columns, or they just populate the new columns with data that is hand typed/hard coded into the python script.

So far I am able to loop through and get the latitudes and longitudes. The problem I am having is primarily on the writing part. Here is what I have.

import geopy
import pandas as pd
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sample app")
df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")
for row in df.itertuples():
    try:
        data = geolocator.geocode(row[7])
        print(data.point.latitude)
    except AttributeError:
        print('error')

Any help would be greatly appreciated! I only have a bit of Java experience from back in the day. The documentation and published examples for Python are not as intuitive to me as Java's was, so getting started is a bit of a challenge.

CodePudding user response:

All you need to to add columns and then save the df dataframe. See also Adding new column to existing DataFrame in Python pandas

You can try something like this:

import geopy
import pandas as pd
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="sample app")

df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")

# add new columns
df['Longitude'] = 0
df['Latitude'] = 0

# process all rows
for row in df.itertuples():
    try:
        data = geolocator.geocode(row[7])
        df.at[row.Index, 'Longitude'] = data.point.longitude
        df.at[row.Index, 'Latitude'] = data.point.latitude
    except AttributeError:
        print('error')

# write to output
df.to_csv(r"C:\Users\Desktop\LatLongTest_2.csv", sep=";", index=False)

CodePudding user response:

With this method I'll show, you'll need to create two list, one with the Latitude and another one with the Longitude and later append as a Column to your DataFrame. Just let me remind you that they must have the same amount of rows, or it will give an error.

import pandas as pd

df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")
add_to_df = {'Latitude': LatitudeList, 'Longitude': LongitudeList}
df = df.append(pd.DataFrame(add_to_df))
df

It is also possible to do it like this, adding a value to each row:

import pandas as pd

df = pd.read_csv(r"C:\Users\Desktop\LatLongTest.csv")
df.loc[df.Adress == '54298 Boca Chica Blvd Brownsville, TX', "Latitude"] = Latitude
df.loc[df.Adress == '54298 Boca Chica Blvd Brownsville, TX', "Longitude"] = Longitude
df

You can check out more resources at this link

  • Related