Create new dataframe based on condition

I have a series of csv files inside a directory. Each csv file has the following columns:

slotID; NLunAn; NLunTot; MeanBPM

Starting from the values in the slotID column, I would like to create one dataframe per slot, each containing the rows that belong to that slot. E.g. the 1st csv has the following values:

slotID NLunAn NLunTot MeanBPM
7 11 78 129,7
11 6 63 123,3
12 6 33 120,6
13 5 41 124,5
14 4 43 118,9

The 2nd csv has the following values:

slotID NMarAn NMarTot MeanBPM
7 10 72 131,2
11 5 48 121,5
12 4 17 120,9
13 4 19 125,6
16 6 45 127,4

I would like to create a dataframe, called for example dataframe1, that holds the values of slot 7, another one that holds the values of slot 11, and so on. Any suggestion is welcome. I've been trying for several days but can't figure it out, please help me. This is what I've done so far:

import pandas as pd
#import matplotlib.pyplot as plt 
import os
import glob
import numpy as np

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

for f in csv_files:
    dfDay = pd.read_csv(f, encoding="ISO-8859-1", sep=';')
    # dfDay holds the data of the file currently being read

CodePudding user response:

Provided that all the csv files have the same structure (i.e. the same column names), you could do something like this:

...

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.csv"))

df = pd.concat(
    (pd.read_csv(f, encoding='ISO-8859-1', sep=';') for f in csv_files),
    ignore_index=True
)
slot_dfs = {slot: group for slot, group in df.groupby("slotID")}

# Exporting to csv-files
for n, df_slot in enumerate(slot_dfs.values(), start=1):
    df_slot.to_csv(f"dataframe{n}.csv", index=False)

The dictionary slot_dfs maps each slotID value to the dataframe holding the rows of that slot, so slot_dfs[7] gives the rows for slot 7 collected from all files.
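As a small self-contained illustration of the dictionary, here is a sketch that builds the combined dataframe from in-memory data instead of files. The values are the sample rows from the question. The column names NAn and NTot are hypothetical common names, assumed here so that both frames share one structure:

```python
import pandas as pd

# Sample rows from the question. NAn / NTot are assumed common column names,
# not names that appear in the original files.
csv1 = pd.DataFrame({
    "slotID": [7, 11, 12, 13, 14],
    "NAn": [11, 6, 6, 5, 4],
    "NTot": [78, 63, 33, 41, 43],
    "MeanBPM": [129.7, 123.3, 120.6, 124.5, 118.9],
})
csv2 = pd.DataFrame({
    "slotID": [7, 11, 12, 13, 16],
    "NAn": [10, 5, 4, 4, 6],
    "NTot": [72, 48, 17, 19, 45],
    "MeanBPM": [131.2, 121.5, 120.9, 125.6, 127.4],
})

# Stack both frames, then split the result into one dataframe per slot.
df = pd.concat([csv1, csv2], ignore_index=True)
slot_dfs = {slot: group for slot, group in df.groupby("slotID")}

print([int(slot) for slot in slot_dfs])  # slot IDs found across both frames
print(slot_dfs[7])                       # rows for slot 7 from both frames
```

Looking a slot up by its ID avoids having to remember which running number (dataframe1, dataframe2, ...) belongs to which slot.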

If you really want to create variables for the dataframes, then you could try

for n, (_, group) in enumerate(df.groupby("slotID"), start=1):
    globals()[f"dataframe{n}"] = group
    # Exporting to csv-file
    group.to_csv(f"dataframe{n}.csv", index=False)

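instead of creating the slot_dfs dictionary. After that, print(dataframe1) should show the dataframe for the first slot, and so on. Since groupby sorts the group keys by default, dataframe1 corresponds to the smallest slotID (7 in the sample data shown in the question).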
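One caveat: the two sample files in the question do not share column names (NLunAn/NLunTot versus NMarAn/NMarTot), and MeanBPM is written with a decimal comma. A plain concat would then produce NaN-filled columns and a text-typed MeanBPM. The following is only a sketch of one way to handle this, assuming the columns appear in the same order in every file. The common names NAn and NTot, the source column, and the file names lun.csv and mar.csv are hypothetical, and io.StringIO stands in for the real files:

```python
import io

import pandas as pd

# In-memory stand-ins for two csv files (semicolon-separated, decimal comma).
# The file names are made up for this example.
files = {
    "lun.csv": "slotID;NLunAn;NLunTot;MeanBPM\n7;11;78;129,7\n11;6;63;123,3\n",
    "mar.csv": "slotID;NMarAn;NMarTot;MeanBPM\n7;10;72;131,2\n16;6;45;127,4\n",
}

# Hypothetical common column names shared by all files after renaming.
COMMON_COLUMNS = ["slotID", "NAn", "NTot", "MeanBPM"]

frames = []
for name, text in files.items():
    # decimal="," makes pandas parse values such as 129,7 as floats.
    day = pd.read_csv(io.StringIO(text), sep=";", decimal=",")
    day.columns = COMMON_COLUMNS  # assumes identical column order in every file
    day["source"] = name          # remember which file each row came from
    frames.append(day)

df = pd.concat(frames, ignore_index=True)
slot_dfs = {slot: group for slot, group in df.groupby("slotID")}
print(slot_dfs[7])
```

With real files, the io.StringIO(text) argument would be replaced by the file path, together with the encoding used in the question.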