I'm trying to scrape really hectic Twitch chats for keywords, but sometimes the socket stalls for a split second, and in that split second five messages can go by. I tried adding multithreading, but had no luck with the code below: the threads either all fail to catch a keyword or all succeed. Any help is appreciated. Code below:
import os
import time
from dotenv import load_dotenv
import socket
import logging
from emoji import demojize
import threading
# loading environment variables
load_dotenv()
# variables for socket
server = "irc.chat.twitch.tv"
port = 6667
nickname = "frankied003"
token = os.getenv("TWITCH_TOKEN")
channel = "#xqcow"
# creating the socket and connecting
sock = socket.socket()
sock.connect((server, port))
sock.send(f"PASS {token}\n".encode("utf-8"))
sock.send(f"NICK {nickname}\n".encode("utf-8"))
sock.send(f"JOIN {channel}\n".encode("utf-8"))
while True:
    consoleInput = input(
        "Enter correct answer to the question (use a ',' for multiple answers):"
    )
    # if console input is stop, the code will stop of course lol
    if consoleInput == "stop":
        break
    # make array of all the correct answers
    correctAnswers = consoleInput.split(",")
    correctAnswers = [answer.strip().lower() for answer in correctAnswers]

    def threadingFunction():
        correctAnswerFound = False
        # while the correct answer is not found, the chats will keep on printing
        while correctAnswerFound is not True:
            while True:
                try:
                    resp = sock.recv(2048).decode(
                        "utf-8"
                    )  # sometimes this fails, hence retry until it succeeds
                except:
                    continue
                break
            if resp.startswith("PING"):
                sock.send("PONG\n".encode("utf-8"))
            elif len(resp) > 0:
                username = resp.split(":")[1].split("!")[0]
                message = resp.split(":")[2]
                strippedMessage = " ".join(message.split())
                # once the answer is found, the chats will stop, correct answer is highlighted in green, and onto next question
                # note: bcolors (ANSI color codes) is not defined in this snippet
                if str(strippedMessage).lower() in correctAnswers:
                    print(bcolors.OKGREEN + username + " - " + message + bcolors.ENDC)
                    correctAnswerFound = True
                else:
                    if username == nickname:
                        print(bcolors.OKCYAN + username + " - " + message + bcolors.ENDC)
                    # else:
                    #     print(username + " - " + message)

    t1 = threading.Thread(target=threadingFunction)
    t2 = threading.Thread(target=threadingFunction)
    t3 = threading.Thread(target=threadingFunction)
    t1.start()
    time.sleep(.3)
    t2.start()
    time.sleep(.3)
    t3.start()
    time.sleep(.3)
    t1.join()
    t2.join()
    t3.join()
Answer:
First, it makes little sense to have three threads read from the same socket in parallel; it only leads to confusion and race conditions.
The main problem, though, is that you are assuming a single recv will always read a single message. That is not how TCP works. TCP has no concept of a message; it is only a byte stream. A message is an application-level concept. A single recv might return a single message, multiple messages, parts of messages ...
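To make the byte-stream point concrete, here is a minimal sketch that uses a local socket pair as a stand-in for the Twitch connection. The usernames, host, and channel are made up for illustration. The sender makes three send calls, but the receiver only sees one undivided stream of bytes; how many recv calls it takes to drain it is not tied to the number of sends.

```python
import socket

# A connected pair of local stream sockets stands in for the real connection.
sender, receiver = socket.socketpair()

# Two complete IRC-style lines plus the start of a third, sent in three calls.
sender.sendall(b":alice!alice@host PRIVMSG #chan :hello\r\n")
sender.sendall(b":bob!bob@host PRIVMSG #chan :world\r\n")
sender.sendall(b":carol!carol@host PRIVMSG #chan :par")
sender.close()

# Read until the peer closes; recv boundaries need not match send boundaries.
chunks = []
while True:
    data = receiver.recv(2048)
    if not data:
        break
    chunks.append(data)
receiver.close()

# Only the line terminator tells us where messages begin and end.
stream = b"".join(chunks)
lines = stream.split(b"\r\n")
complete, leftover = lines[:-1], lines[-1]
print(len(chunks), "recv call(s) returned", len(complete), "complete lines")
print("incomplete tail:", leftover)
```

The number of chunks printed may differ between runs and platforms, which is exactly why the code must not treat one recv as one message.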
So you have to actually parse the data you get according to the semantics defined by the application protocol, i.e.
1. initialize a buffer
2. get some data from the socket and append it to the buffer, without decoding it yet
3. extract all complete messages from the buffer, then decode and process each message separately
4. leave any remaining incomplete message in the buffer
5. continue with step 2
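The steps above could be sketched roughly as follows. This is one possible shape, not the only one: LineBuffer, iter_lines, and parse_privmsg are hypothetical names, and the sketch assumes messages are terminated by "\r\n" and that chat lines look like ":user!user@host PRIVMSG #channel :text" (no IRC tags requested).

```python
import socket


class LineBuffer:
    """Accumulates raw bytes and hands back only complete lines."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        # Append undecoded bytes, then split off every complete line.
        self._buffer += data
        # The last element is whatever follows the final "\r\n" (maybe empty);
        # it stays in the buffer until the rest of that line arrives.
        *complete, self._buffer = self._buffer.split(b"\r\n")
        return [line.decode("utf-8", errors="replace") for line in complete if line]


def iter_lines(sock):
    """Yield complete lines from a blocking socket until the peer closes it."""
    buf = LineBuffer()
    while True:
        data = sock.recv(2048)
        if not data:  # empty bytes: connection closed
            return
        yield from buf.feed(data)


def parse_privmsg(line):
    """Return (username, message) for a chat line, or None for anything else."""
    prefix, sep, rest = line.partition(" PRIVMSG ")
    if not sep or not prefix.startswith(":"):
        return None
    username = prefix[1:].split("!", 1)[0]
    _channel, sep, message = rest.partition(" :")
    if not sep:
        return None
    return username, message


# Usage: one message arrives split across two chunks, followed by a PING
# and the beginning of a third line.
buf = LineBuffer()
first = buf.feed(b":alice!alice@host PRIVMSG #chan :hel")
second = buf.feed(b"lo :)\r\nPING :tmi.twitch.tv\r\n:bob!bob@ho")
print(first)                     # nothing complete yet
print(second)                    # two complete lines
print(parse_privmsg(second[0]))  # ('alice', 'hello :)')
```

Using partition instead of split(":")[2] also keeps messages that themselves contain a colon intact. With a single reader loop like iter_lines, there is no need for several threads competing for the same socket.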
Apart from that, don't blindly throw away errors from recv(..).decode(..). Given that you are using a blocking socket, recv will usually only fail if there is a fatal problem with the connection, in which case a retry will not help. The problem is most likely that you are calling decode on incomplete messages, which can also mean invalid UTF-8. But since you simply ignore the error, you essentially lose those messages.
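A small illustration of why decoding a partial chunk can fail: if a chunk boundary happens to fall inside a multi-byte UTF-8 character, decode raises UnicodeDecodeError, and a bare except then silently drops the whole chunk. The payload below is made up for the example. Decoding only after the bytes are joined works, as does the standard library's incremental decoder, shown here as one alternative.

```python
import codecs

payload = "PRIVMSG #chan :caf\u00e9\r\n".encode("utf-8")

# Cut the payload in the middle of the two-byte encoding of "é" (0xC3 0xA9).
cut = payload.index(b"\xc3") + 1
first, second = payload[:cut], payload[cut:]

# Decoding the first chunk on its own fails: it ends with half a character.
try:
    first.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print("decoding the partial chunk failed:", failed)

# Decoding after the chunks are joined works.
joined = (first + second).decode("utf-8")

# Alternative: an incremental decoder keeps the dangling byte until the
# next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
incremental = decoder.decode(first) + decoder.decode(second)
print(joined == incremental)
```

If exceptions are caught at all, catching specific ones (for example UnicodeDecodeError or OSError) and at least logging them makes lost messages visible instead of hiding them.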