Home > database >  Is there a better/faster way to do these multiple inserts using PHP and MariaDB?
Is there a better/faster way to do these multiple inserts using PHP and MariaDB?

Time:01-14

So I'm trying to build a small database to track high scores on the pinball machines in our league. I have one table for users which is just an AI id column, and a column with their email address. Then I have a games table which is an AI id column, and the names of the machines. Since it is a many-to-many relationship, I made a third table called scores which has the user_id, game_id, and score as columns.

EDIT: Including read code:

$file = fopen('scores.txt', 'r') or die("Unable to open file.");

// Loop through the file line by line
$line_number = 1;
while (($line = fgets($file)) !== false) {

    // Reset flags
    $email_exists = 0;
    $game_exists = 0;
    if (isset($email_id)) unset($email_id);
    if (isset($game_id)) unset($game_id);
    echo ($line_number . " ");

    // Split the line into components
    $line = rtrim($line);
    $array = explode(",", $line, 3);
    $email = strtolower($array[0]);
    $game = $array[1];
    $score = $array[2];
    $stmt = $db->prepare ("SELECT email_id FROM users WHERE email_address = ?");
    $stmt->execute(array($email))
        if ($stmt->rowCount() < 1) {
        $stmt = $db->prepare("INSERT INTO users (email_address) VALUES (?)");
        $stmt->execute(array($email))
        $email_id = $db->lastInsertId();
    } else {
        $row = $stmt->fetch();
        $email_id = $row['email_id'];
        $email_exists = 1;
    }

I use similar code to check if I already have a game listed in the database. I need to get the id numbers for both the game and the email address for the third part which is seeing if the user already has a score for that game, and if the new score is higher if they already do.

    if ($email_exists == 0 || $game_exists == 0) {
        // New user or game added to DB - no prior score can exist
        $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
        $stmt->execute(array($email_id,$game_id,$score));
    } else {
        $stmt = $db->prepare("SELECT * FROM scores WHERE email_id = ? AND game_id = ?");
        $stmt->execute(array($email_id,$game_id));
        if ($stmt->rowCount() == 0) {
            // No previous score for this game
            $stmt = $db->prepare("INSERT INTO scores VALUES (?,?,?)");
            $stmt->execute(array($email_id,$game_id,$score));
        } else {
            // Previous score exists
            $row = $stmt->fetch();
            if ($score > $row['score']) {
                // New score is higher
                $stmt = $db->prepare("UPDATE scores SET score = ? " .
                    . "WHERE email_id = ? AND game_id =?");
                $stmt->execute(array($score, $email_id, $game_id));
                // No action needed if new score is lower
            }
        }
    }

The code seems to work fine, but it is slow. Plus, it appears to make the script time out or something after a few thousand records. Is there a better way to make this work?

I tried recoding it in Python but it was even slower and it appears it didn't even insert the rows in the database. It probably doesn't help that I barely know Python.

I was thinking of creating an array and storing the items that need to be inserted, then inserting 100 rows at a time or something like that, but I need to get the id's for the scores junction table. I also was considering using UNIQUE constraints in the database and am trying to figure out how to rewrite the insert code to use that to prevent duplicating email addresses or games.

CodePudding user response:

Create a stored procedure which has parameters email, game_id and score. Let the procedure do all SQL work for you. Your PHP code will be recuced to a sigle loop calling the proceudure. The result should be faster and far easier to maintain:

create procedure sp_add_email_score (
in_email varchar(320),
in_game_id int,
in_score int
)
begin

declare v_email_id int;

select email_id into v_email_id
from users 
where email_address = in_email;

if (v_email_id is null) then
  INSERT INTO users (email_address) VALUES (in_email);
  
  set v_email_id = LAST_INSERT_ID();
end if;

INSERT INTO scores (email_id, game_id, score) 
VALUES(v_email_id, in_game_id, in_score) 
ON DUPLICATE KEY UPDATE score=in_score;

end

If the loop is still too slow, then something else is causing the slowdown.

CodePudding user response:

Lots of room for improvement here. When it comes to database speed, your primary goal should generally be reducing the number of hits to the database server.

First, you're doing the email-to-id query for every single CSV row, there's no need for that. At most, you should be doing it once per user and caching it. Better, you can do it once for the whole set, reading the whole thing into an in-memory array. Something like:

$stmt = $db->prepare('SELECT email_address, email_id FROM users');
$idMap = array_column($stmt->execute(), 'email_id', 'email_address');

This will give you an array like:

[
    '[email protected]' => 1,
    '[email protected]' => 2,
]

Do this once at the start of the script and keep it in memory for the duration. From this, you can instantly look up an id given an email. This removes 7999 hits to the database. You're essentially trading memory for CPU and disk time here. If you encounter an email that isn't yet in the array, you can INSERT it and add it to the array.

Next, move your prepares outside the loop iteration. This removes at least 3 * 7999 hits to the database and possibly as many as 5 * 7999 hits.

Next, use fgetcsv() instead of explode() because it's easier and will properly handle quoting. And process the whole CSV before doing a single insert. It's silly to create this enormous amount of database traffic if you're just going to throw away most of the records. So calculate the highest first, and then only hit the database with those:

$top = [];
$fp = fopen('scores.txt', 'r');
while ([$email, $gameId, $score] = fgetcsv($fp)) {
    if ($score > ($top[$email][$gameId] ?? 0)) {
        $top[$email][$gameId] = $score;
    }
}

Given an input file of:

[email protected],g1,3
[email protected],g1,1
[email protected],g2,2
[email protected],g1,4
[email protected],g2,5
[email protected],g2,6

This will yield an array of the highest scores for each user:

Array
(
    [[email protected]] => Array
        (
            [g1] => 3
            [g2] => 2
        )

    [[email protected]] => Array
        (
            [g1] => 4
            [g2] => 6
        )

)

Then, you can iterate over that array, and only perform inserts/updates based on those records. This will save two queries for every redundant CSV row.

foreach ($top as $email => $scores) {
    foreach ($scores as $gameId => $score) {
        // INSERT INTO scores ($idMap[$email], $gameId, $score)
    }
}
  • Related