I want to use SQL to analyze this call data I have. The main question I want to answer is: If a caller dials a number and doesn't get a response the first time, does the value of some_factor
(boolean column) impact the probability of the caller getting a response when redialing the same number later?
Here are the columns in this table:
| id | phone_number | callee_name | caller_id | call_duration | some_factor | created_at |
|----|--------------|-------------|-----------|---------------|-------------|---------------------|
| 1 | 123-456-7890 | John Smith | 42 | | t | 2022-03-07 09:40:40 |
Here was what I arrived at after some thinking, but I'm relatively inexperienced with SQL queries. If they worked, I could look at the ratio of the count from query 1 compared to the count from query 2, and then redo the queries where some_factor
is false.
-- 1. successful retried calls: the callee picked up
SELECT COUNT(*)
FROM phone_calls
WHERE phone_number IN (
SELECT phone_number
FROM phone_calls
WHERE call_duration IS NULL
AND some_factor
)
AND call_duration IS NOT NULL
-- 2. unsuccessful retried calls: the callee did not pick up
SELECT COUNT(*)
FROM phone_calls
WHERE phone_number IN (
SELECT phone_number
FROM phone_calls
WHERE call_duration IS NULL
AND some_factor
)
AND call_duration IS NULL
-- then, do math with the results of these queries to get success rate when some_factor is true
-- then, redo but with AND NOT some_factor to get success rate when some_factor is false
But, I realized that 2 is counting entries from the subquery in the total count. I want the COUNT to not include the first time that someone called a number, if they did not pick up. Maybe if I can exclude entries with an id
that corresponds to entries from the subquery, that will work. But I can't figure out how to do that. Am I missing something?
CodePudding user response:
As I understand in the second query you're not interested in the results returned by the subquery itself. So the count is as a result larger by the amount of distinct phone numbers returned by the subquery. You only need to subtract that number:
SELECT COUNT(*) - COUNT(DISTINCT phone_number)
FROM phone_calls
WHERE phone_number IN (
SELECT phone_number
FROM phone_calls
WHERE call_duration IS NULL
AND some_factor
)
AND call_duration IS NULL