I have multiple CSV files, each with a case column that begins at 0.
I want to concatenate them so that the case values of each file continue from the last case value of the previous file plus 1.
I know I can write a for loop that reads each CSV file and adds the running last value to the case column in each iteration:
import pandas as pd
# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
# Read the first file and store it in a DataFrame
df = pd.read_csv(file_list[0])
# Get the last value of the column that you want to continue
last_value = df.iloc[-1]['column_name']
# Loop through the remaining files
for file in file_list[1:]:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)
    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value + 1
    last_value = df_temp.iloc[-1]['column_name']
    # Concatenate the current file with the main DataFrame
    df = pd.concat([df, df_temp])
Is it possible to do this directly with something like pd.concat(map(pd.read_csv, file_list))?
CodePudding user response:
It is more efficient to modify each file's data, append it to a Python list, and concatenate once at the end, rather than concatenating repeatedly inside the loop:
import pandas as pd
# List of file names
file_list = ['file1.csv', 'file2.csv', 'file3.csv']
# Collect the adjusted DataFrames here and concatenate once at the end
data = []
last_value = 0
# Loop through files
for file in file_list:
    # Read the file into a DataFrame
    df_temp = pd.read_csv(file)
    # Continue the last value from the previous file in the current file
    df_temp['column_name'] += last_value
    last_value = df_temp.iloc[-1]['column_name'] + 1
    # different here: append the data
    data.append(df_temp)
df = pd.concat(data)
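To check the loop without touching real files, here is a minimal sketch that wraps the same idea in a function and feeds it in-memory CSV text. The function name `concat_with_running_case`, the `value` column, and the sample data are all hypothetical. It assumes every file is non-empty and that the last row of each file holds its largest case value.

```python
import io
import pandas as pd

def concat_with_running_case(sources, column="case"):
    """Concatenate CSV sources, shifting `column` so numbering continues across files."""
    frames = []
    offset = 0
    for src in sources:
        frame = pd.read_csv(src)
        frame[column] = frame[column] + offset
        # The next file starts one past the last value seen so far
        # (assumes the last row carries the largest value of the file)
        offset = frame[column].iloc[-1] + 1
        frames.append(frame)
    # Concatenate once, after the loop
    return pd.concat(frames, ignore_index=True)

# Hypothetical in-memory stand-ins for file1.csv and file2.csv
csv_a = io.StringIO("case,value\n0,10\n0,11\n1,12\n")
csv_b = io.StringIO("case,value\n0,20\n1,21\n")

result = concat_with_running_case([csv_a, csv_b])
print(result["case"].tolist())  # [0, 0, 1, 2, 3]
```

pd.read_csv accepts file paths as well as file-like objects, so the same function should work when given the entries of file_list directly.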
CodePudding user response:
One thing you can try, as you mentioned, is the map function:
import pandas as pd
file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']
# Concatenate the CSV files into a single data frame
df_concatenated = pd.concat(map(pd.read_csv, file_list))
This, however, will not update the column values, so you have to update them before or after concatenating. I'm not sure about the exact structure of your DataFrames, but you can try:
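import pandas as pd
# Initialize a counter for the case values
case_counter = 0
file_list = ['dum1.csv', 'dum2.csv', 'dum3.csv']
frames = []
for df in map(pd.read_csv, file_list):
    # Shift this file's case values by the running counter
    df = df.assign(case=df.case + case_counter)
    # The next file continues one past the last case value seen so far
    case_counter = df.case.iloc[-1] + 1
    frames.append(df)
# Concatenate CSV files into a single data frame
df_concat = pd.concat(frames)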
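If you want to stay close to the pd.concat(map(pd.read_csv, file_list)) shape, one possible sketch is to read all files first, derive a per-file offset from the last case value of each file, and apply the offsets in a single comprehension. The in-memory CSV data and the `value` column are made up for illustration, and the code assumes each file is non-empty with its largest case value in the last row.

```python
import io
from itertools import accumulate

import pandas as pd

# Hypothetical in-memory stand-ins for dum1.csv, dum2.csv and dum3.csv
sources = [
    io.StringIO("case,value\n0,1\n1,2\n"),
    io.StringIO("case,value\n0,3\n1,4\n2,5\n"),
    io.StringIO("case,value\n0,6\n"),
]

frames = list(map(pd.read_csv, sources))

# Number of case values each file contributes: its last case value + 1
sizes = [int(frame["case"].iloc[-1]) + 1 for frame in frames]

# Running offsets: 0 for the first file, then the cumulative sizes
offsets = [0, *accumulate(sizes)][:-1]

df_concat = pd.concat(
    [frame.assign(case=frame["case"] + off) for frame, off in zip(frames, offsets)],
    ignore_index=True,
)
print(df_concat["case"].tolist())  # [0, 1, 2, 3, 4, 5]
```

The trade-off is that all frames are held in memory before the offsets are known, which is the same memory footprint as collecting them in a list inside a loop.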
Alternatively, you can concatenate the DataFrames with map and then reset the index of the result:
df.reset_index(inplace=True)
df.rename(columns={'index': 'case'}, inplace=True)
df['case'] = range(df.shape[0])
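This modifies the original DataFrame in place: it resets the index, renames the new index column to case, and fills it with consecutive numbers from 0 to the number of rows minus 1. Note that if the DataFrame already has a case column, the rename leaves two columns with that name, so it may be cleaner to call reset_index(drop=True) and assign df['case'] directly. You can also do this on a separate column without using the index, or avoid the in-place changes by creating a new DataFrame.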
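Renumbering by row position and offsetting each file give the same result only when every row is its own case. A small sketch with made-up frames (the `value` column and the data are hypothetical) shows how the two differ when a case spans several rows:

```python
import pandas as pd

# Hypothetical frames in which a case can span more than one row
first = pd.DataFrame({"case": [0, 0, 1], "value": [10, 11, 12]})
second = pd.DataFrame({"case": [0, 1, 1], "value": [20, 21, 22]})

# Renumbering by row position: every row gets its own case value
by_row = pd.concat([first, second], ignore_index=True)
by_row["case"] = range(len(by_row))

# Offsetting the second frame: the grouping of rows into cases is preserved
offset = first["case"].iloc[-1] + 1
by_offset = pd.concat(
    [first, second.assign(case=second["case"] + offset)],
    ignore_index=True,
)

print(by_row["case"].tolist())     # [0, 1, 2, 3, 4, 5]
print(by_offset["case"].tolist())  # [0, 0, 1, 2, 3, 3]
```

Which of the two is appropriate depends on whether case identifies a row or a group of rows in your files.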