Let me explain a little bit more in detail...
I have a mysql table like this:
| session_id | session_start | session_end |
|------------|---------------------|---------------------|
| 1 | 2016-01-01 01:00:00 | 2016-01-01 01:03:00 |
| 2 | 2016-01-01 03:00:00 | 2016-01-01 03:02:24 |
| 3 | 2016-01-01 03:35:00 | 2016-01-01 03:37:02 |
| ... | ... | ... |
Where the time interval between session_start
and session_end
never overlaps for different rows. Each session refers to a chat conversation where messages are interchanged between two users.
I also have a mongoDB where, for each document, I store text messages with a date/timestamp of when it was received, call it event_date
. But, the document doesn't tell which session the message belongs to. So, the only way we can match a mongoDB document to a session in the mysql table is if event_date
is between session_start
and session_end
for some record in mysql and then retreive the session_id
.
My current solution is running a query like this (I'm working in python with mysql library for this), in pseudo-code:
for event_date in all_event_dates:
cursor.execute("SELECT session_id
FROM session_table
WHERE %s >= session_start
AND %s <= session_end",
(event_date, event_date))
row = cursor.fetchone()
But this is a very slow process since the length of the list all_event_dates
is in the order of 100k. Is there a more efficient query where I can run it for multiple event_date
values at a time?
Thanks a lot in advance
CodePudding user response:
Load all the event dates into a temporary table, then join with that.
cursor.execute("CREATE TEMPORARY TABLE event_dates (event_date DATETIME)";
cursor.executemany("INSERT INTO event_dates VALUES (?)", [(date,) for date in all_event_dates])
connection.commit()
cursor.execute("""
SELECT session_id
FROM session_table AS s
JOIN event_dates AS e ON e.event_date BETWEEN s.session_start AND s.session_end""")
rows = cursor.fetchall()
CodePudding user response:
Because they don't overlap, do
SELECT session_id
FROM session_table
WHERE %s >= session_start
ORDER BY session_start
LIMIT 1
and have this on the table:
INDEX(session_start, session_id)