Python(Pandas) filtering large dataframe and write multiple csv files

Time:08-26

I have the following data frame, and I'm writing a Python function (to be called from LabVIEW) that does only two things: pairing data and cleaning data.

The data frame is like this:

[Screenshot of the data frame; image not preserved]

I need pandas to pick out each column (except 'Date') individually and pair it with 'Date' (the customized index), then write each pair to its own CSV file. Before writing, I need to make sure the Pressure column does not contain any 0 values, and for each temperature column, values that are equal to 0 or greater than 150 must be filtered out.

The following is my Python function. The parameters x1 and x2 are fed in through a LabVIEW input and specify a user-selected date range.

def data_slice(x1, x2):

    import pandas as pd

    df = pd.read_csv('exp_log.csv')

    df.set_index('Date', inplace=True)

    df_p = df.loc[x1:x2, 'Pressure']

    filt = (df_p['Pressure'] == 0)
    df_p = df_p.loc[~filt]

    df_p.to_csv('modified_pressure.csv', index=True)


    all_cols = list(df.columns)
    temp_cols = all_cols[1:]

    for i in temp_cols:
        df_i = df.loc[x1:x2, 'i']
        filt = (df_i > 150) | (df_i == 0)
        df_i = df_i.loc[~filt]
    df_i.to_csv(f'modified_temp{i}.csv', index=True)

My question is: will this piece of Python code actually work properly, that is, write out the individual CSV files efficiently? The actual exp_log.csv is a very large file containing data logged over several days.

CodePudding user response:

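Not exactly. Your last loop will not work: df.loc[x1:x2, 'i'] looks up a column literally named 'i' rather than the column held in the loop variable i, so df_i never becomes the column you want. Write df.loc[x1:x2, i] instead. In the first part, df.loc[x1:x2, 'Pressure'] already returns a Series, so build the filter as df_p == 0 rather than df_p['Pressure'] == 0; with that change, everything up to the first .to_csv() should work fine.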
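As a minimal sketch of the lookups discussed in this answer, the snippet below uses a small in-memory frame in place of exp_log.csv. The column name Temp1, the dates, and all values are made up for illustration; only 'Date' and 'Pressure' come from the question.

```python
import pandas as pd

# Hypothetical stand-in for exp_log.csv; names other than Date/Pressure are assumptions.
df = pd.DataFrame(
    {
        "Date": ["2021-08-01", "2021-08-02", "2021-08-03", "2021-08-04"],
        "Pressure": [1.2, 0.0, 1.4, 1.5],
        "Temp1": [25.0, 0.0, 160.0, 30.0],
    }
).set_index("Date")

x1, x2 = "2021-08-01", "2021-08-04"

# Selecting a single column label returns a Series, so filter the Series itself.
pressure = df.loc[x1:x2, "Pressure"]
pressure = pressure[~(pressure == 0)]

# Use the loop variable i (not the string 'i') and keep all work inside the loop.
temps = {}
for i in df.columns[1:]:
    col = df.loc[x1:x2, i]
    temps[i] = col[~((col > 150) | (col == 0))]
```

In the real function, each filtered Series would be written with to_csv inside the loop body instead of being collected in a dictionary.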

CodePudding user response:

Your command df_i.to_csv(f'modified_temp{i}.csv', index=True) will work, except that this line is outside your for loop. It is missing one level of indentation, so it runs only once, after the loop has finished, and writes only the last column.

Besides, I would recommend separating responsibilities: split this function into multiple functions, each with its own purpose, such as importing data, manipulating data, and saving data. Try to keep one level of abstraction per function.

Lastly, don't import libraries inside the function; put the import at the top of the module.
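One possible way to apply these suggestions is sketched below. The helper names (load_log, clean_pressure, clean_temperature, save_series) and the path parameter are hypothetical, and the sketch assumes every column other than 'Pressure' is a temperature column, as the question implies.

```python
import pandas as pd  # imported once at module level, not inside each function


def load_log(path, start, end):
    """Read the log and return only the rows in the requested date range."""
    df = pd.read_csv(path, index_col="Date")
    return df.loc[start:end]


def clean_pressure(series):
    """Drop pressure readings equal to 0."""
    return series[~(series == 0)]


def clean_temperature(series, upper=150):
    """Drop temperature readings equal to 0 or greater than the upper limit."""
    return series[~((series > upper) | (series == 0))]


def save_series(series, path):
    """Write one column, paired with its Date index, to a CSV file."""
    series.to_csv(path, index=True)


def data_slice(x1, x2, path="exp_log.csv"):
    """Entry point keeping the (x1, x2) arguments of the original function."""
    df = load_log(path, x1, x2)
    save_series(clean_pressure(df["Pressure"]), "modified_pressure.csv")
    # Assumption: all remaining columns are temperature columns.
    for name in df.columns.drop("Pressure"):
        save_series(clean_temperature(df[name]), f"modified_temp{name}.csv")
```

Keeping data_slice as a thin entry point means the LabVIEW call site would not need to change, while each helper can be tested on its own.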
