How to concatenate csv files and continue the last row value?

I have multiple csv files, each with a case column that starts from 0.

I want to concatenate them so that each file's case column begins at the previous file's last case value + 1.

I know I can write a for loop that reads each csv file and adds the last value to the case column on each iteration:

import pandas as pd

# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']

# Read the first file and store it in a DataFrame
df = pd.read_csv(file_list[0])

# Get the last value of the column that you want to continue
last_value = df.iloc[-1]['column_name']

# Loop through the remaining files
for file in file_list[1:]:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)

    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value + 1
    last_value = df_temp.iloc[-1]['column_name']

    # Concatenate the current file with the main DataFrame
    df = pd.concat([df, df_temp])

Is it possible to directly use something like pd.concat(map(pd.read_csv, file_list))?

CodePudding user response：

It is more efficient to modify each file's data, append it to a Python list, and concatenate once at the end instead of concatenating on every iteration:

import pandas as pd

# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']

# Collect the adjusted DataFrames here and track the running offset
data = []
last_value = 0

# Loop through files
for file in file_list:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)

    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value
    last_value = df_temp.iloc[-1]['column_name'] + 1

    # different here: append the data
    data.append(df_temp)

df = pd.concat(data)
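
To check the offset logic without touching real files, the same loop can be wrapped in a function and fed in-memory CSV text. This is only a sketch: the function name concat_continuing, the value column, and the sample data are made up for illustration, and it assumes, like the loop above, that the last row of each file holds that file's highest case value.

```python
import io

import pandas as pd


def concat_continuing(sources, column="case"):
    """Read each CSV source and shift `column` so numbering continues across files."""
    frames = []
    offset = 0
    for src in sources:
        df = pd.read_csv(src)
        df[column] += offset
        # Only advance the offset when the file has rows, so iloc[-1] cannot fail
        if not df.empty:
            offset = df[column].iloc[-1] + 1
        frames.append(df)
    # Concatenate once at the end and build a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Two in-memory "files", each restarting its case column at 0
csv_a = io.StringIO("case,value\n0,a\n0,b\n1,c\n")
csv_b = io.StringIO("case,value\n0,d\n1,e\n")

result = concat_continuing([csv_a, csv_b])
print(result["case"].tolist())  # [0, 0, 1, 2, 3]
```

pd.read_csv accepts file paths as well as file-like objects, so passing file_list instead of the StringIO objects should work the same way.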

CodePudding user response：

One thing you can try, as you mentioned, is the map function:

import pandas as pd

file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']
    
# Concatenate the CSV files into a single data frame
df_concatenated = pd.concat(map(pd.read_csv, file_list))

This, however, will not update the column values, so you have to update them either before or after concatenating. I'm not sure about the exact structure of your DataFrames, but you can try:

import pandas as pd

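file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']

# Running offset added to each file's case values
case_counter = 0
frames = []
for df in map(pd.read_csv, file_list):
    # Shift this file's case values so they continue from the previous file
    df = df.assign(case=df.case + case_counter)
    # The next file starts right after this file's last case value
    case_counter = df.case.iloc[-1] + 1
    frames.append(df)

# Concatenate CSV files into a single data frame
df_concat = pd.concat(frames)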
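
Updating after concatenation is also possible, which keeps the pd.concat(map(pd.read_csv, ...)) form from the question. The following is a rough sketch, not the only way to do it. It assumes the column is named case and starts at 0 in every file; the in-memory sources stand in for real file paths, and the level names file and row are invented for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for the CSV files; each restarts its case column at 0
sources = [
    io.StringIO("case,value\n0,a\n1,b\n"),
    io.StringIO("case,value\n0,c\n0,d\n1,e\n"),
    io.StringIO("case,value\n0,f\n"),
]

# keys= tags every row with the position of the file it came from
df = pd.concat(
    map(pd.read_csv, sources),
    keys=list(range(len(sources))),
    names=["file", "row"],
)

# Number of case values each file uses (highest case + 1)
per_file = df.groupby(level="file")["case"].max() + 1

# Offset of a file = total number of case values in all earlier files
offsets = per_file.cumsum().shift(fill_value=0)

# Look up each row's offset by its file label and add it positionally
df["case"] += df.index.get_level_values("file").map(offsets).to_numpy()
df = df.reset_index(drop=True)

print(df["case"].tolist())  # [0, 1, 2, 2, 3, 4]
```

Because the per-file maximum is used rather than the last row, this variant does not depend on the rows of each file being sorted by case.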

Alternatively, you can concatenate the DataFrames with map, reset the index, and renumber the column:

df = df_concatenated.reset_index(drop=True)
df['case'] = range(df.shape[0])

This discards the repeated per-file index and fills case with consecutive numbers from 0 to the number of rows minus one. Note that this numbers rows, not cases, so it gives the desired result only if every row has its own case value. You can also write the numbers to a separate column instead of overwriting case.
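
If several rows can share a case value, one possible variation is to number each distinct (file, case) pair instead of each row. The sketch below uses groupby(...).ngroup() for that; the helper column file and the sample data are assumptions made for the example. Unlike adding an offset, this also closes any gaps in a file's case numbering, which may or may not be what you want.

```python
import io

import pandas as pd

# In-memory stand-ins for the CSV files; rows may share a case value
sources = [
    io.StringIO("case,value\n0,a\n0,b\n1,c\n"),
    io.StringIO("case,value\n0,d\n1,e\n1,f\n"),
]

# Tag each row with its file position, then turn that tag into a plain column
df = (
    pd.concat(
        map(pd.read_csv, sources),
        keys=list(range(len(sources))),
        names=["file", "row"],
    )
    .reset_index(level="file")
    .reset_index(drop=True)
)

# ngroup() numbers the groups 0, 1, 2, ... in sorted (file, case) order
df["case"] = df.groupby(["file", "case"]).ngroup()

# The helper column is no longer needed
df = df.drop(columns="file")

print(df["case"].tolist())  # [0, 0, 1, 2, 3, 3]
```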