merge multiple tables together

Time:12-29

I seem to be a little confused about what to do. I have 3 CSV files, each sharing the common columns of name, age and sex. The first CSV file has the common data of the three and then card details belonging to each user, the second CSV file contains the same common details followed by vehicle details, while the third is the same and then has extra details like marital status. Now I want to load all three into a single table in MySQL, to be queried with Python for a project. Would it be wise of me to upload all three tables and use a join query to combine them, or should I find Python code to merge all three tables and upload the result to the database? I am pretty new to this, so a simple explanation would be appreciated.

If I'm going to use Python code to merge them, any ideas what would work?

first_Name,Last_Name,Age,Sex,Vehicle_make,Vehicle_model,Vehicle_year,Vehicle_type
firstName,lastName,age,iban,credit_card_number,credit_card_security_code,credit_card_start_date,credit_card_end_date,address_main,address_city,address_postcode
firstName,lastName,age,sex,retired,dependants,marital_status,salary,pension,company,commute_distance,address_postcode

Here is what each file's header row looks like, and I need to merge them all. I'd also need to work out what I could use as a primary key. I have about 3-4 rows that will have identical details, so I can't really say which column I could use.

any ideas would be great.

CodePudding user response:

Try using pandas to merge your data like this:

import pandas as pd

# read each file into a pandas DataFrame
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')

# normalise the differently-cased headers of the first file
# so the key columns match across all three frames
df1 = df1.rename(columns={'first_Name': 'firstName', 'Last_Name': 'lastName',
                          'Age': 'age', 'Sex': 'sex'})

# merge the frames on the common columns; how='outer' keeps rows
# that appear in only some of the files
df_merged = df1.merge(df2, how='outer', on=['firstName', 'lastName', 'age'])
df_merged = df_merged.merge(df3, how='outer',
                            on=['firstName', 'lastName', 'age', 'sex'])
df_merged.to_csv('merged.csv', index=False)

Instead of only saving the merged frame to a CSV file, you can write it directly to the database with the df.to_sql() method. You can also manipulate your data however you wish before uploading it.
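For example, here is a minimal sketch of the to_sql() route. It uses an in-memory SQLite connection so it runs anywhere; for your MySQL project you would instead pass a SQLAlchemy engine built from a URL such as 'mysql+pymysql://user:password@localhost/mydb'. The table name and sample data are just assumptions for the sketch:

```python
import sqlite3

import pandas as pd

# a tiny stand-in for the merged frame built above
df_merged = pd.DataFrame({
    'firstName': ['Ann', 'Bob'],
    'lastName': ['Lee', 'Ray'],
    'age': [34, 41],
})

# an in-memory SQLite connection keeps the sketch self-contained;
# for MySQL you would create a SQLAlchemy engine instead, e.g.
# create_engine('mysql+pymysql://user:password@localhost/mydb')
conn = sqlite3.connect(':memory:')

# if_exists='replace' drops and recreates the table on each run
df_merged.to_sql('people', conn, if_exists='replace', index=False)

# query it back to confirm the upload
count = pd.read_sql('SELECT COUNT(*) AS n FROM people', conn)['n'][0]
print(count)  # 2
```

Once the table exists, any further querying can happen in SQL on the database side rather than in pandas.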

CodePudding user response:

Try solving your problem using standard Python tools:

import csv

files = ['file1.csv', 'file2.csv', 'file3.csv']

rows = []
column_names = []

"""
Collect every row of every file as a dictionary,
e.g. {'firstName': 'Ann', 'lastName': 'Lee', ...},
and build the union of all column names from the file headers
"""
for f in files:
    with open(f, newline='', encoding='utf8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            rows.append(row)
        for item in reader.fieldnames:
            if item not in column_names:
                column_names.append(item)

"""
Write the collected dictionaries to a new csv file;
columns missing from a given row are left empty
"""
with open('merged.csv', 'w', newline='', encoding='utf8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=column_names)
    writer.writeheader()
    writer.writerows(rows)

It's not as elegant, but it should work. Note that this stacks the rows from all three files rather than joining them on the shared name/age columns, so each person will appear once per source file in the output.
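On the primary-key question from the post: since several people share identical details, no combination of the existing columns is safe to use as a key. A common fix is a surrogate key, i.e. an extra auto-incrementing id column (in MySQL, an `AUTO_INCREMENT` integer primary key). A small sketch that prepends such a column to the merged file; the file names here are just examples:

```python
import csv

def add_surrogate_key(src_path, dst_path):
    """Copy a csv file, prepending an auto-incrementing 'id' column."""
    with open(src_path, newline='', encoding='utf8') as src, \
            open(dst_path, 'w', newline='', encoding='utf8') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(['id'] + next(reader))  # extend the header row
        for i, row in enumerate(reader, start=1):
            writer.writerow([i] + row)

# usage: add_surrogate_key('merged.csv', 'merged_with_id.csv')
```

If you let MySQL generate the key instead, simply declare the column as `id INT AUTO_INCREMENT PRIMARY KEY` when creating the table and leave it out of your inserts.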
