SQL Query to read from csv file

Time:08-03

I have a csv file that I want to query to get some data and use it in another Python program. The csv file has names, but not the user ids. The SQL table has the user ids. I would like to read a name from the csv, query the SQL table for the user id, and then write that to another csv (or just get the data to use). An example of doing this with a prompt that I have:

ACCEPT askone CHAR PROMPT 'First Name: ';
ACCEPT asktwo CHAR PROMPT 'Last Name: ';

select user_id from test.sy_users
where sy_first_nm = '&&askone' and sy_last_nm = '&&asktwo';

This works, but I'm trying to do it from a csv file with around 40 or 50 users whose ids I need to get. I just want askone and asktwo to come from the csv file. It seems like it should be simple, but I have not found a solution that actually works.

CodePudding user response:

You have two options:

  1. Use UTL_FILE to read data from your file and then tokenize it to extract the data you need (keep in mind that this requires the file to be on the DB server).
  2. Use SQL*Loader, which is usually my preferred choice, because it lets you define a control file that configures how to parse the file and load it into a table. After that you can do your processing by querying the data from the table you loaded it into.
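For option 2, a minimal control file might look like the sketch below. The file name, the staging table, and its column names are illustrative, not from the question:

```
-- names.ctl: load first/last name pairs into a staging table
LOAD DATA
INFILE 'names.csv'
APPEND INTO TABLE staging_names
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(first_nm, last_nm)
```

You would then run it with something like `sqlldr hr control=names.ctl` and join `staging_names` against `test.sy_users` to get the ids.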

CodePudding user response:

There is a Jupyter notebook showing reading and writing CSV files with cx_Oracle at https://github.com/cjbj/cx-oracle-notebooks. The python-oracledb documentation has the same load-from-CSV example (python-oracledb is the new name for cx_Oracle).

Here are some generic reading and writing CSV examples that move data to and from database tables. Of course, once you have data in Python, you can do whatever you like with it, e.g. to use for your subquery, instead of writing it directly to a table.

If your CSV file looks like:

101,Abel
154,Baker
132,Charlie
199,Delta
. . .

and you have a table created with:

create table test (id number, name varchar2(25));

then you can load data into the table with an example like:

import csv
import getpass

import oracledb

# CSV file
FILE_NAME = 'data.csv'

# Adjust the number of rows to be inserted in each iteration
# to meet your memory and performance requirements
BATCH_SIZE = 10000

# Prompt for the password rather than hard-coding it
userpwd = getpass.getpass("Enter password: ")

connection = oracledb.connect(user="hr", password=userpwd,
                              dsn="dbhost.example.com/orclpdb")

with connection.cursor() as cursor:

    # Predefine the memory areas to match the table definition.
    # This can improve performance by avoiding memory reallocations.
    # Here, one parameter is passed for each of the columns.
    # "None" is used for the ID column, since the size of NUMBER isn't
    # variable.  The "25" matches the maximum expected data size for the
    # NAME column
    cursor.setinputsizes(None, 25)

    with open(FILE_NAME, 'r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        sql = "insert into test (id, name) values (:1, :2)"
        data = []
        for line in csv_reader:
            data.append((line[0], line[1]))
            if len(data) % BATCH_SIZE == 0:
                cursor.executemany(sql, data)
                data = []
        if data:
            cursor.executemany(sql, data)
        connection.commit()

One example of writing query data to a CSV file (reusing the connection and the csv module imported above) is:

sql = """select * from all_objects where rownum <= 10000"""

with connection.cursor() as cursor:

    cursor.arraysize = 1000  # tune for performance

    with open("testwrite.csv", "w", encoding="utf-8") as outputfile:
        writer = csv.writer(outputfile, lineterminator="\n")
        results = cursor.execute(sql)
        writer.writerows(results)