Home > Software engineering >  How to insert data in a dictionary inside a For-loop
How to insert data in a dictionary inside a For-loop

Time:04-03

I'm stuck in this part. I'm extracting data from reddit using PRAW, and I need to push all the data I extract into a dictionary and then, store the dict data into a PostgreSQL database, the for-loop works and extracts all the values I need but at the end only the last one is inserted in the dict. I tried using a dict of lists, but the same values are repeated several times. How can I insert all the data in my dict?. Also tested other solutions I found here, but just got an error. Here's my code:

class RedditExtract:
    def __init__(self, query, token):
        self.query = query
        self.token = token
        self.consulta = self.query.get("query")

    def searchQuery(self):
        reddit = praw.Reddit(
            client_id=REDDIT_CLIENT_ID,
            client_secret=REDDIT_CLIENT_SECRET,
            user_agent="extracting for reddit",
        )
        subreddit = reddit.subreddit("all").search(self.consulta)
        submission = reddit.submission
        top_subreddit = subreddit
        itemB = {}
        con = Conexion()
        for submission in top_subreddit:
            try:
                user = submission.author
                reditor = reddit.redditor(user)
                itemB["id"] = reditor.id
                print("id: "   itemB["id"])
                itemB["name"] = submission.fullname
                #print("name: "   itemB["name"])
                itemB["username"] = submission.author.name
                #print("username: "   itemB["username"])
                itemB["red"] = 13
                #print("red: "   str(itemB["red"]))
                itemB["type"] = "b"
                #print("type: "   str(itemB["type"]))
                itemB["karma"] = submission.author.total_karma
                #print("karma: "   str(itemB["karma"]))
                itemB["avatar"] = reditor.icon_img
                #print("url icon username: "   itemB["avatar"])
                itemB["extract_date"] = datetime.today().strftime("%Y-%m-%d %H:%M:%S")
                #print("extract date: "   itemB["extract_date"])
                itemB["created_at"] = datetime.fromtimestamp(int(submission.created_utc))
                #print("created at: "   str(itemB["created_at"]))
            except:
                print("No se hallo ID del usuario, se omite el post")

The prints are just to evaluate that PRAW extracts the data correctly.

PS: I use PRAW 7.5.0 and Pyhton 3.8 with PyCharm.

I tried using lists to store each key's value and then using the lists to create the dictionary, but just got the same values repeating several times. Also, tried to create another for to store keys and store values, but many values were missing. I want so have something like this: {'id':'kshdh''jajsjs''kasjs''asmjs'...,'name':'asrat''omes',...} And then, from that dictionary, insert in each column (key) the values (value) in a PostgreSQL database.

TABLE: I actually got a dict like this: {'id': 'ajsgs,jhfhd,ajddg,ahsgys,...','name':'maaa,nnn,...',...} but the BIG problem with that is all values are string and I need 'red' and 'karma' to be integers, and can't cast them once in the dict. My table in PostgreSQL is something like this:

CREATE TABLE IF NOT EXISTS public.salert_basic
(
    id character varying(255) COLLATE pg_catalog."default" NOT NULL,
    name character varying(255) COLLATE pg_catalog."default",
    username character varying(255) COLLATE pg_catalog."default",
    red integer,
    extract_date timestamp without time zone,
    created_at timestamp without time zone,
    karma integer,
    icon character varying COLLATE pg_catalog."default",
    type character varying COLLATE pg_catalog."default",
    CONSTRAINT salert_basic_pk PRIMARY KEY (id)
)

And the code to insert data from Pyhton is this:

        Conexion.con.autocommit = True
        curser = Conexion.cursor
        columns = itemB.keys()
        for i in itemB.values():
           sql = '''insert into salert_basic(id,name,username,red,type,karma,icon,extraction_date,created_at) values{};'''.format(i)
        curser.execute(sql)
        Conexion.con.commit()
        Conexion.con.close()

This is how I created my dict:

itemB = defaultdict(list)

Then. I fill it with this for each key:

itemB["name"].append(submission.fullname)

And finally, to concatenate the values of the list in the dict, I use this for:

  for key in itemB:
     itemB[key] = ", ".join(itemB[key])

But as I said, to do this, I cast my integers to strings, which can't put into my databse. What do you say? PS: How avoid duplicate primary key error? 'Cause there are some repeated ids.

UPDATE:

  1. I checked the use of %s, I forgot about it.
  2. Well... no, I need all the ids in "id" key but each one separated from the others, not like ahsgdshjgjsdgs....., also, id is a PK, so duplicates are not allowed but I think with a IN CONFLICT DO NOTHING in the sql I can avoid its insertion and continue with the others.
  3. Yeah, I try to insert each submission as a row in the database table, but it's giving me headaches.

CodePudding user response:

Still not exactly what you are trying to achieve. Here is a attempt at something that I think does what you want:

class RedditExtract:
    def __init__(self, query, token):
        self.query = query
        self.token = token
        self.consulta = self.query.get("query")

    def searchQuery(self):
        reddit = praw.Reddit(
            client_id=REDDIT_CLIENT_ID,
            client_secret=REDDIT_CLIENT_SECRET,
            user_agent="extracting for reddit",
        )
        subreddit = reddit.subreddit("all").search(self.consulta)
        submission = reddit.submission
        top_subreddit = subreddit
        data_list = []
        con = Conexion()
        for submission in top_subreddit:
            item_dict = {}
            try:
                user = submission.author
                reditor = reddit.redditor(user)
                item_dict["id"] = reditor.id
                item_dict["name"] = submission.fullname
                item_dict["username"] = submission.author.name
                item_dict["red"] = 13
                item_dict["type"] = "b"
                item_dict["karma"] = submission.author.total_karma
                item_dict["avatar"] = reditor.icon_img
                item_dict["extract_date"] = datetime.today().strftime("%Y-%m-%d %H:%M:%S")
                item_dict["created_at"] = datetime.fromtimestamp(int(submission.created_utc))
                data_list.append(item_dict)
            except:
                print("No se hallo ID del usuario, se omite el post")

sql = """insert into salert_basic
    (id, name, username, red, type, karma, icon,
    extraction_date, created_at) 
values
    (%(id)s, %(name)s,  %(username)s, %(red)s, %(type)s, %(karma)s, 
    %(icon)s, %(extraction_date)s, %(created_at)s)"""

curser = Conexion.cursor
curser.executemany(sql, data_list)

--If this is a large data set then it will perform better with

from psycopg2.extras import execute_batch

execute_batch(curser, sql, data_list)

The above:

  1. Creates a list of dicts
  2. Modifies sql to use named placeholders so the values in the dict can be mapped to a placeholder.
  3. Runs the sql in either executemany() or execute_batch()`. They will iterate over the list and apply the values in each dict to the placeholders in the query string.
  • Related