I need to filter my data for specific latitude and longitude from a text file using python syntax-CodePudding

I have gridded emission .txt files which are in multiple folders, each .txt file has more than 1 million rows so excel can't open it and I need to filter data in each .txt file for specific latitude and longitude. Here I want to give a path for the folder so python can read each file from a folder and filter the data for the specific lat long

import pandas as pd

with open(r"C:\Users\VOC_spec_.txt",'r') as f2:
    data = f2.read()
    df = pd.DataFrame(columns=['lat', 'lon', 'emission'])  # create the dataframe
    df_sub = subset(df, Lat >= 28 & Lat <= 33 & Long >= 75 & Long <= 97)

this gives an error name 'subset' is not defined

Does anyone know how to write it efficiently and save in .txt format. It will be great help

CodePudding user response：

subset is a function from R. Pandas has some options as query:

df_sub=df.query('lat >= 28 & lat <= 33 & lon >= 75 & lon <= 97')

Or you can also use boolean indexing:

df_sub=df[(df['lat']>=28) & (df['lat']<=33) &(df['lon']>=75) &(df['lon']<=97) ]

CodePudding user response：

You can use the following code to filter a txt file using pandas. Note: you should define the delimiter correctly! it can be a whitespace or a comma or other things.

import pandas as pd
import os

delimiter='\t'
for filename in os.listdir("C:\Users"):
    if filename.endswith(".txt"):
        file_path = os.path.join(folder_path, filename)
        df = pd.read_csv(file_path, delimiter=delimiter)
        df_sub = df[(df['Lat'] >= 28) & (df['Lat'] <= 33) & (df['Long'] >= 75) & (df['Long'] <= 97)]

You can then do something with df_sub here, like saving it to a new file. I would suggest write them as a csv file:

        new_file_path = os.path.join(folder_path, "filtered_" filename[:-4] ".csv")
        df_sub.to_csv(new_file_path, sep=',', index=False)

CodePudding user response：

You can easily do it without Dataframes:

with open(r"C:\Users\VOC_spec_.txt", "r") as file:
    data = file.read().strip(" \n").split("\n")

filtered = []
for line in data:
    l = line.split(',')
    lat, long, em = float(l[0].strip()), float(l[1].strip()), float(l[2].strip())
    if lat >= 28 and lat <= 33 and long >= 75 and long <= 97:
        filtered.append([lat, long, em])

Now filtered only contains the matching lines. Note that I split each line with ',' assuming your CSV is comma-delimited. Replace that with the delimiter matching your file.