Generate SQL Statements from CSV in Python (Dictionaries, JSON, List)

Time:10-16

The idea is to read a table like the following from a CSV file:

DATABASE,SCHEMA,TABLE,COLUMN,TYPE,LENGTH
D1,S1,T1,C1,NUMBER,5
D1,S1,T1,C2,VARCHAR,25
D1,S1,T2,C1,NUMBER,5
D1,S1,T2,C2,VARCHAR,25
D1,S2,T1,C1,NUMBER,5
D1,S2,T1,C2,VARCHAR,25
D1,S2,T2,C1,NUMBER,5
D1,S2,T2,C2,VARCHAR,25

Then, in Python, build the corresponding query in a string variable, for example:

create table <s1>.<t1> (c1 number(5), c2 varchar(25));
create table <s1>.<t2> (c1 number(5), c2 varchar(25));
create table <s2>.<t1> (c1 number(5), c2 varchar(25));
create table <s2>.<t2> (c1 number(5), c2 varchar(25));

My idea is to store the table in a JSON array, or I could use pandas, but I don't know which would be the simplest approach given the iterations involved.

CodePudding user response:

Assuming the CSV has been loaded into a DataFrame named df (for example with pd.read_csv), use:

# Collect each table's column names, types, and lengths into lists.
groups = df.groupby(["SCHEMA", "TABLE"]).agg({"COLUMN": list, "TYPE": list, "LENGTH": list})
for (schema, table), row in groups.iterrows():
    record = zip(row["COLUMN"], row["TYPE"], row["LENGTH"])
    columns = ", ".join(f"{column} {type_}({length})" for column, type_, length in record)
    # Wrap the column definitions in parentheses, as in the requested output.
    sentence = f"create table <{schema}>.<{table}> ({columns});"
    print(sentence)

Output

create table <S1>.<T1> (C1 NUMBER(5), C2 VARCHAR(25));
create table <S1>.<T2> (C1 NUMBER(5), C2 VARCHAR(25));
create table <S2>.<T1> (C1 NUMBER(5), C2 VARCHAR(25));
create table <S2>.<T2> (C1 NUMBER(5), C2 VARCHAR(25));
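
If you would rather not depend on pandas, the same grouping can probably be done with the standard library, using a dictionary keyed by (schema, table). This is only a minimal sketch and not part of the answer above: CSV_TEXT and build_statements are hypothetical names, and the sample rows are copied from the question so the snippet runs on its own. In practice you would pass an opened CSV file instead of io.StringIO.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question (assumption: the real file has the same header).
CSV_TEXT = """\
DATABASE,SCHEMA,TABLE,COLUMN,TYPE,LENGTH
D1,S1,T1,C1,NUMBER,5
D1,S1,T1,C2,VARCHAR,25
D1,S1,T2,C1,NUMBER,5
D1,S1,T2,C2,VARCHAR,25
D1,S2,T1,C1,NUMBER,5
D1,S2,T1,C2,VARCHAR,25
D1,S2,T2,C1,NUMBER,5
D1,S2,T2,C2,VARCHAR,25
"""


def build_statements(csv_file):
    """Return one create-table statement per (schema, table) pair."""
    # Maps (schema, table) to the list of column definitions seen so far.
    tables = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["SCHEMA"], row["TABLE"])
        tables[key].append(f'{row["COLUMN"]} {row["TYPE"]}({row["LENGTH"]})')
    # Dictionaries keep insertion order, so tables come out in file order.
    return [
        f"create table <{schema}>.<{table}> ({', '.join(columns)});"
        for (schema, table), columns in tables.items()
    ]


statements = build_statements(io.StringIO(CSV_TEXT))
for statement in statements:
    print(statement)
```

The dictionary plays the role of groupby here: each CSV row is appended to the list for its table, and the statements are assembled once all rows have been read.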