Home > database >  I need to filter my data for specific latitude and longitude from a text file using python syntax
I need to filter my data for specific latitude and longitude from a text file using python syntax

Time:01-17

I have gridded emission .txt files which are in multiple folders, each .txt file has more than 1 million rows so excel can't open it and I need to filter data in each .txt file for specific latitude and longitude. Here I want to give a path for the folder so python can read each file from a folder and filter the data for the specific lat long

import pandas as pd

with open(r"C:\Users\VOC_spec_.txt",'r') as f2:
    data = f2.read()
    df = pd.DataFrame(columns=['lat', 'lon', 'emission'])  # create the dataframe
    df_sub = subset(df, Lat >= 28 & Lat <= 33 & Long >= 75 & Long <= 97)

this gives an error name 'subset' is not defined

Does anyone know how to write it efficiently and save in .txt format. It will be great help

CodePudding user response:

subset is a function from R. Pandas has some options as query:

df_sub=df.query('lat >= 28 & lat <= 33 & lon >= 75 & lon <= 97')

Or you can also use boolean indexing:

df_sub=df[(df['lat']>=28) & (df['lat']<=33) &(df['lon']>=75) &(df['lon']<=97) ]

CodePudding user response:

You can use the following code to filter a txt file using pandas. Note: you should define the delimiter correctly! it can be a whitespace or a comma or other things.

import pandas as pd
import os

delimiter='\t'
for filename in os.listdir("C:\Users"):
    if filename.endswith(".txt"):
        file_path = os.path.join(folder_path, filename)
        df = pd.read_csv(file_path, delimiter=delimiter)
        df_sub = df[(df['Lat'] >= 28) & (df['Lat'] <= 33) & (df['Long'] >= 75) & (df['Long'] <= 97)]

You can then do something with df_sub here, like saving it to a new file. I would suggest write them as a csv file:

        new_file_path = os.path.join(folder_path, "filtered_" filename[:-4] ".csv")
        df_sub.to_csv(new_file_path, sep=',', index=False)

CodePudding user response:

You can easily do it without Dataframes:

with open(r"C:\Users\VOC_spec_.txt", "r") as file:
    data = file.read().strip(" \n").split("\n")

filtered = []
for line in data:
    l = line.split(',')
    lat, long, em = float(l[0].strip()), float(l[1].strip()), float(l[2].strip())
    if lat >= 28 and lat <= 33 and long >= 75 and long <= 97:
        filtered.append([lat, long, em])

Now filtered only contains the matching lines. Note that I split each line with ',' assuming your CSV is comma-delimited. Replace that with the delimiter matching your file.

  • Related