I'm using the Firebase Admin SDK inside a Firebase Function to list the contents of a directory on Cloud Storage. However, my function quite often takes more than 5 seconds to respond. At first I thought this was entirely due to cold starts of the function itself, and I've tried numerous things to prevent them, such as:
- Set minInstances to 1 or more
- Use a cron job to call the function every minute (without authentication, so that it doesn't actually use the Admin SDK but still keeps the function warm)
- Split functions into separate files to reduce cold start time (https://stackoverflow.com/a/47985480/13370504)
- Try moving the function closer to Cloud Storage (us-central1 to europe-west3)
However, none of the above worked, and I still get slow response times when the function is called for the first time. After adding some logging inside the function, it turns out that the Admin SDK takes over a second to get a single value from Realtime Database and usually over 5 seconds to run the getFiles command on Cloud Storage (the directory usually contains only a single file).
For example, for the code below I'm getting the following console output:
listVideos: coldstart true
listVideos: duration 1: 0ms
listVideos: duration 2: 1302ms (realtime database)
listVideos: duration 3: 6505ms (getFiles on cloud storage)
listVideos: coldstart false
listVideos: duration 1: 0ms
listVideos: duration 2: 96ms (realtime database)
listVideos: duration 3: 199ms (getFiles on cloud storage)
My function looks like this:
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();

// Stays true only for the first invocation handled by this instance.
let coldStart = true;

export const listVideos = functions.region("europe-west1").runWith({
  memory: "128MB",
  minInstances: 1,
}).https.onCall(async (data, context) => {
  console.log("coldstart", coldStart);
  coldStart = false;
  const t1 = new Date().getTime();
  if (context.auth) {
    const authUid = context.auth.uid;
    console.log(`duration 1: ${new Date().getTime() - t1}ms`);
    // First Admin SDK call: read a single value from Realtime Database.
    const level = (await admin.database().ref(`users/${authUid}/`).once("value")).val();
    console.log(`duration 2: ${new Date().getTime() - t1}ms (realtime database)`);
    // Second Admin SDK call: list the objects under the user's video prefix.
    const [files] = await admin.storage().bucket()
      .getFiles({
        prefix: `${authUid}/video`,
      });
    console.log(`duration 3: ${new Date().getTime() - t1}ms (getFiles on cloud storage)`);
    return {status: "success", files: files};
  } else {
    return {status: "error - not authenticated"};
  }
});
I know I can't expect 0ms latency, but for a simple getFiles call I'd expect something under 1 second, just like it is when the SDK is "warm" (considering that my whole bucket has fewer than 1000 files and the directory that I'm listing has only 1 file in it).
CodePudding user response:
Cloud Storage isn't a database and is not optimized for queries (getFiles is effectively a query against the entire bucket using the prefix of the names of objects). It's optimized for storing and retrieving blobs of data at massive scale when you already know the name of the object.
If you want to list files quickly, consider storing the metadata of the files in a database that is optimized for the type of query that you want to perform, and link those records to your storage as needed. This is a fairly common practice in GCP projects.
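As a rough illustration of that pattern, here is a minimal sketch of a per-user metadata index. Everything in it is an assumption made for the example: the VideoRecord shape, the VideoIndex class, and the sample path are hypothetical, and the in-memory Map only stands in for whatever database you choose (for instance, records stored per user in Realtime Database or Firestore). The point is that listing becomes a keyed lookup by uid instead of a prefix query against the bucket.

```typescript
// Hypothetical sketch: an index of file metadata kept alongside Cloud Storage.
// The Map is a stand-in for a real database; none of these names come from
// the Firebase SDKs or from the original question.

interface VideoRecord {
  storagePath: string; // full object name in the bucket
  size: number; // size in bytes
  uploadedAt: number; // epoch milliseconds
}

class VideoIndex {
  // uid -> (storagePath -> record)
  private readonly byUser = new Map<string, Map<string, VideoRecord>>();

  // Record a file once its upload has finished.
  add(uid: string, record: VideoRecord): void {
    let records = this.byUser.get(uid);
    if (!records) {
      records = new Map<string, VideoRecord>();
      this.byUser.set(uid, records);
    }
    records.set(record.storagePath, record);
  }

  // Drop a record when the object is deleted, so the index stays in sync.
  remove(uid: string, storagePath: string): boolean {
    const records = this.byUser.get(uid);
    return records ? records.delete(storagePath) : false;
  }

  // Keyed lookup by uid: no scan over other users' objects.
  list(uid: string): VideoRecord[] {
    const records = this.byUser.get(uid);
    return records ? Array.from(records.values()) : [];
  }
}

const index = new VideoIndex();
index.add("user123", {
  storagePath: "user123/video/intro.mp4",
  size: 1048576,
  uploadedAt: 1700000000000,
});
const videos = index.list("user123");
console.log(videos.map((v) => v.storagePath));
```

With an index like this, the callable function would read the user's records from the database and only touch Cloud Storage when it needs a specific object whose name it already knows. Keeping the index in sync (adding a record on upload, removing it on delete) is the cost of this approach.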