Context: I'm retrieving stock data from the MySQL database (such as Date, Open, Low, Close and High prices) and I have some duplicates on the table. To avoid this, I'm trying to use the DISTINCT
in my query, however, I keep getting some duplicates on the Date
. I believe this is happening because while some of the Date
is the same, the Open, High, Low, and Close prices are not, therefore I keep getting duplicate dates regardless.
I've tried using UNION
to get the distinct dates and then use 2nd select on my query to select the open, high, low, and close prices, but no success:
(SELECT DISTINCT DATE)
UNION
(SELECT Open, High, Low, Close
FROM symbolsv2
WHERE ticker='AAPL'
AND `intval`='1d'
AND Open
AND High
AND Low
AND Close IS NOT NULL
ORDER BY DATE)
This is what is happening with some of the data I have on my MySQL:
As you can see, the Dates are the same, but the prices are different (this is related to the web scraping that I'm doing that can glitch from time to time and retrieve the prices for the same date, but at different periods of the day).
This is my original SQL query:
if (isset($_POST['submit'])) {
$ticker = $_POST['ticker'];
$intval = $_POST['intval'];
$Date = $_POST['date'];
$stmt = $conn->prepare("SELECT DISTINCT Date, Open, High, Low, Close
FROM symbolsv2
WHERE ticker=?
AND `intval`=?
AND Date>=?
AND Date
AND Open
AND High
AND Low
AND Close IS NOT NULL
ORDER BY Date");
$stmt->bind_param("sss", $ticker, $intval, $Date);
$stmt->execute();
$result = $stmt->get_result();
while ($data = $result->fetch_assoc()) {
$data_array[] = $data;
}
$stmt->close();
}
$conn->close();
?>
How could I do it so that I no longer have this dulpicates from Date
? I don't keeping the 1st record for example (i.e. keep the 76th index and remove the 77th, for example).
I found this SO - SQL: Two select statements in one query, but I couldn't get it to work
Thank you in advance, any help is welcome (I'm fairly new to SQL)
CodePudding user response:
I guess you want to consolidate any duplicate records you have. You can use this GROUP BY query.
SELECT Date, ticker, intval,
AVG(Open) AS Open,
MAX(High) AS High,
MIN(Low) AS Low,
AVG(Close) AS Close
FROM symbolsv2
WHERE ticker=?
AND `intval`=?
AND Date >= ?
GROUP BY Date, ticker, intval;
This will yield one row for each distinct combination of Date, ticker, and intval. It handles duplicate rows possibly with different data by averaging the multiple open and close prices, and taking the highest of the high and lowest of the low prices. (How you want to handle these duplicates with different values is up to you, of course. I just guessed.)
(UNION
doesn't work the way you think it does. It concatenates two result sets.)