Home > Blockchain >  Pandas/Python, Reading one row from multiple csv, and combining into 1 csv
Pandas/Python, Reading one row from multiple csv, and combining into 1 csv

Time:04-27

I'm fairly new to Python and Pandas,etc (go easy), and have spent a couple full days so far researching how to do this, with no success. Here is the project.

Two different .CSV's for 2 different data points are created with a running total for the day in each (ending with the day's total for that specific data in the last row). For example, in 4_23_2022_Data1.csv, has the columns Date and Data1, and in 4_23_2022_Data2.csv, has the columns Date and Data2, with the last row in each .csv containing that day's Date (obviously) and that Data's total (either Data1 or Data2 column).

What I'm trying to accomplish is: Read a directory containing the 2 sets of Data .csv's (for multiple, many days), and create a combined .CSV with 'Date', 'Data1', Data2' columns, with the set of Data1 and Data2 displayed in its appropriate row for that specific date. [The primary system creating these .csv's cannot combine Data1 and Data2 in 1 .csv initially, so I am trying to create a program that can quickly scan thru and do this.]

My initial coding attempts only scratch the surface (and do not account for any loop):

df_1 = pd.read_csv('04_23_2022_Data1.csv', usecols = ['Date','Data1'])

df_2 = pd.read_csv('04_23_2022_Data2.csv', usecols = ['Date','Data2'])

##the last line of the day is 538

Data1_date = df1_read.iloc[538]['Date']     

Data1_value = df1_read.iloc[538]['Data1']

Data2_date = df2_read.iloc[538]['Date'] 

Data2_value = df2_read.iloc[538]['Data2']

df_export = pd.DataFrame(columns = ['Date','Data1','Data2'])

df_export.at[1, 'Date'] = Data1_date

df_export.at[1, 'Data1'] = Data1_value

df_export.at[1, 'Data2'] = Data2_value

df_export.to_csv('Combo_of_Data1_and_Data2_per_Date.csv')

Any help or further guidance would be greatly appreciated. Again, sorry for the complete newbie attack at this problem, but I'm trying. :)

CodePudding user response:

You can loop through your files in your folder with glob.glob(). And use Python built-in functions iter() and zip() to iterate through your files in pairs.

# import
import glob

# define folder path
# look for files ending in '.csv' only
path = r'\data_folder\*.csv'

# loop through folder and store all relevant files as a list
file_paths = glob.glob(path)
# iterate through list to get elements in pairs
iterable_list = iter(file_paths)
# zip pairs of elements
iterable_list = zip(iterable_list, iterable_list)

# loop through list of files in pairs and do stuff...
for file_one, file_two in pair_elements(file_paths):
    df_1 = pd.read_csv(file_one)
    df_2 = pd.read_csv(file_two)
    # do stuff
  • Related