Home > front end >  How can I perform a calculation for values inside CSV?
How can I perform a calculation for values inside CSV?

Time:04-18

I have a 'large' CSV file that I would like perform a calculation for one of the 'values' in it.

Extract from csv:

id,active,source,SourceId,latlngs,type,area,fir,Designator,MultipleCode,Name
142,0,N,4204260,"-17652,-32400;-17656,-32229;-17762,-32230;-17730,-32400;",R,"",SBRJ,,"","Area 1              "
264040,0,N,10083080,"29900,8400;29900,10800;29650,10800;29650,8400;"," ","R  ","Area 2 "

and the files continues for many Mbytes...

I'm doing the following:

import csv

with open('example.csv', mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    for row in csv_reader:
        print(row["Name"],end=',')
        f= row["latlngs"][0:-1].split(sep=';')
        for a in f:
            b= a.split()[0:]
            print (b)

as a result I get:

Area1 ,['-17652,-32400']
['-17656,-32229']
['-17762,-32230']
['-17730,-32400']
Area 2       ,['-17652,-32400']
['-17656,-32229']
['-17762,-32230']
['-17730,-32400']

Now that I have the latlngs next to the Names, I need to divide each one of the values within the list, for instance; [-17652,-32400'] / 600 to have -29.42 -54 or ['-17656,-32229'] / 600 to have -29.426 -53.715.

While doing this, is where I fail. I have tried many things, after the last line of code I added.

for a in f:
    a.split(',')[0:]
    x = a[0:].split(sep=',')
    try:
        y = int(x[0])/600
        z = int(x[1])/600
        print (y,z)
    except ValueError as e:
            print (e)

But there, the one that gets calculated is the very last latlngs from the CSV file.

Extract from the output:

Area 500,['32390,-8980']
['31920,-9230']
['31930,-9290']
['32510,-12220']
['33090,-18000']
['32510,-23780']
['31330,-29680']
['31330,-29700']
['30620,-32380']
['30620,-32380']
['31070,-32730']
['31530,-33060']
['32260,-30310']
['33480,-24220']
['33480,-24210']
['34090,-18090']
['34090,-17900']
['33480,-11780']
['33470,-11740']
['32870,-8720']
['32390,-8980']
53.983333333333334 -14.966666666666667
53.2 -15.383333333333333
53.21666666666667 -15.483333333333333
54.18333333333333 -20.366666666666667
55.15 -30.0
54.18333333333333 -39.63333333333333
52.21666666666667 -49.46666666666667
52.21666666666667 -49.5
51.03333333333333 -53.96666666666667
51.03333333333333 -53.96666666666667
51.78333333333333 -54.55
52.55 -55.1
53.766666666666666 -50.516666666666666
55.8 -40.36666666666667
55.8 -40.35
56.81666666666667 -30.15
56.81666666666667 -29.833333333333332
55.8 -19.633333333333333
55.78333333333333 -19.566666666666666
54.78333333333333 -14.533333333333333
53.983333333333334 -14.966666666666667

What I can see (in my limited view) is that I cannot understand why the value that gets calculated and displayed is the last one instead of each one of them.

CodePudding user response:

You can't divide a list: the division will operate on the list, not on the individual items inside the list. For that functionality, you would need to use NumPy (in which case, you may also start using Pandas for working with tabular data).

In pure Python, using the CSV modules, something like this can work:

import csv

with open('example.csv', newline='', encoding='utf-8') as fp:
    reader = csv.DictReader(fp)
    for row in reader:
        print(row['Name'], end=', ')
        coords = row['latlngs'][:-1].split(';')
        for coord in coords:
            print(coord)
            lat, lon = [int(item) for item in coord.split(',')]
            lat /= 600
            lon /= 600
            print(lat, lon)

You would then need to store each separate lat and lon (possibly in a list) for each row, then store that list of lats and lons in another structure for the whole file. Or write them directly to a new file.

CodePudding user response:

If you are at liberty to use the pandas library, you can try something like this.

from pandas import read_csv, DataFrame


def transform(lat_long: str):
    # -17652,-32400;-17656,-32229;-17762,-32230;-17730,-32400;
    raw = [ll.split(",") for ll in lat_long.split(';') if ll]
    new_lat_longs = [(float(lat) / 600, float(long) / 600) for lat, long in raw]
    new_lat_longs = ";".join([f"{lat}, {long}" for lat, long in new_lat_longs])
    return new_lat_longs


df: DataFrame = read_csv('sorted.csv', index_col=0, header=0)
df["new_latlngs"] = df["latlngs"].apply(transform)
df.to_csv('transformed_new.csv')

This basically, loads the CSV, applies a transformation function to every latlng column in all rows, and sets the value as a new column.

The new CSV will look like

id,active,source,SourceId,latlngs,type,area,fir,Designator,MultipleCode,Name,new_latlngs
142,0,N,4204260,"-17652,-32400;-17656,-32229;-17762,-32230;-17730,-32400;",R,,SBRJ,,,Area 1              ,"-29.42, -54.0;-29.426666666666666, -53.715;-29.60333333333333, -53.71666666666667;-29.55, -54.0"
264040,0,N,10083080,"29900,8400;29900,10800;29650,10800;29650,8400;", ,R  ,Area 2 ,,,,"49.833333333333336, 14.0;49.833333333333336, 18.0;49.416666666666664, 18.0;49.416666666666664, 14.0"
  • Related