im using sqlite3 to store data from a pandas dataframe
i Scrape data from Twitter and i want it to be every hour
and to do that i want to filter the date in the database the date format is like this :
2021-11-11 08:07:33 00:00
the query im using :
cur.execute("SELECT * FROM tweets_b_db WHERE tweet_created_at > " li " ")
li is a variable last inserted date before filling the db again
cur.execute("SELECT tweet_created_at FROM tweets_b_db ORDER BY tweet_created_at DESC LIMIT 1")
li = cur.fetchone()
What it returns :
can only concatenate str (not "tuple") to str
my code :
import tweepy
import time
import datetime
import pandas as pd
import sqlite3
con = sqlite3.connect('tweetScaping.db')
cur = con.cursor()
consumer_key = "**********************"
consumer_secret = "****"
access_token = "****-*****"
access_token_secret = "***************"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
# using tweepy to search for the keyword Bouygues telecom
text_query = 'bouygues telecom OR @bouyguestelecom OR #Bouygues'
count = 200
try:
# Creation of query method using parameters
tweets = tweepy.Cursor(api.search_tweets, q=text_query " -filter:retweets").items(count)
# Pulling information from tweets iterable object
tweets_list = [[tweet.created_at, tweet.id, tweet.text] for tweet in tweets]
# Creation of dataframe from tweets list
# Add or remove columns as you remove tweet information
# tweets_df = pd.DataFrame(columns=['tweet_created_at', 'tweet_id', 'tweet_text'])
tweets_df = pd.DataFrame(tweets_list)
tweets_df.columns = ['tweet_created_at', 'tweet_id', 'tweet_text']
#last inserted
cur.execute("SELECT tweet_created_at FROM tweets_b_db ORDER BY tweet_created_at DESC LIMIT 1")
li = cur.fetchone()
# to insert results to database (sqlite3)
tweets_df.to_sql(name='tweets_b_db', con=con, if_exists='replace')
# to show table content
cur.execute("SELECT * FROM tweets_b_db WHERE tweet_created_at > " li " ")
print(cur.fetchall())
except BaseException as e:
print('failed on_status,', str(e))
time.sleep(3)
Update : using :
cur.execute("SELECT tweet_created_at FROM tweets_b_db ORDER BY tweet_created_at DESC LIMIT 1")
data = cur.fetchone()
data = data[0]
cur.execute("SELECT * FROM tweets_b_db WHERE tweet_created_at >= Datetime('{data}')")
print(cur.fetchall())
returns nothing :
[]
would be very helpful if someone can guide me to the right direction
CodePudding user response:
fetchone() returns a tuple that contains all of your requested columns in the query. in your case, there's a single column (tweet_created_at), so a single element (value of tweet_created_at) will be present in your tuple, which can be accessed at index 0.
li = cur.fetchone()
li = li[0]
CodePudding user response:
If you use an f-string for the sql statement:
cur.execute(f"SELECT * FROM tweets_b_db WHERE tweet_created_at >= Datetime('{data}')")
I'm sure your code will work.
But, the recommended way to pass parameters is with ?
placeholders:
cur.execute("SELECT * FROM tweets_b_db WHERE tweet_created_at >= Datetime(?)", (data,))
Also, if data
has the correct datetime format yyyy-mm-dd hh:MM:ss
then you don't need the function DATETIME()
:
cur.execute("SELECT * FROM tweets_b_db WHERE tweet_created_at >= ?", (data,))