Home > Blockchain >  Is there a more efficient way to search files in subfolders in Google Drive?
Is there a more efficient way to search files in subfolders in Google Drive?

Time:12-01

Every time a user registers themselves on my page, a Google Drive folder is created. Afterwards the user can upload 2 videos, a bio video and a music cover video. If they upload a Bio video, a Bio subfolder is created inside the user folder. If they upload a cover video, a cover subfolder (round1) is created in the user folder. In other words I will have something like this:

User1 Folder
-bio folder
-round1 folder

User2 Folder
-round1 folder
-cover folder

At the same time, I have a google sheet that stores each user registration and gives me the link of the user folder IF the user uploaded something on the round1 folder (music cover video). The problem is that I have lots of user folders and my code can't search them all before dying. Is there a way to fix it?

I want my code to tell me if a user has uploaded a video in his round1 folder, and if he did, to give the link to the user folder.

I have something like this:

for(var i = iterador; i < data.length; i  ){
 var row = data[i];//each row of my google sheet
 var generado = row[colState];//this columns says if user updated something or not
 var folderName = row[colEmail];//the user email is the name of the user folder
 if(generado != FLAG){ //if the state is different from uploaded then...
  //folderID is the master folder who has all the user folders
  let check = getFileURL(folderName, folderID);//check if user folder exists, returns user folder
  if(check){
   let r1 = listFolders(check);//check if round1 folder exists, return round1 folder
   if(r1){
     s.getRange(i 2, colURL 1).setValue(check.getUrl());//sets the user folder link
     s.getRange(i 2, colState 1).setValue(FLAG);//sets the state to uploaded
     console.log(check.getName());
   }else{
     s.getRange(i 2, colState 1).setValue("No ha subido archivo por ahora");//set state to not upoloaded yet
     console.log(`no hay round 1 ${check.getName()}`);
   }
  }
 }
}

function getFileURL(folderName, folderID){
  const theFolder = DriveApp.getFolderById(folderID);//parent folder
  const folders = theFolder.getFolders();

  while (folders.hasNext()) {
    let folder = folders.next();
    let fname = folder.getName();
    let newtxt = fname.split('(');
    let compare = newtxt[1].split(')')[0];//email
    //if email/user folder exists...
    if(compare == folderName){
      return folder
    }
  };
  return null;
}
function listFolders(folder){
  let innerFolders = folder.getFolders();
  while(innerFolders.hasNext()){
    let folder = innerFolders.next();
    let fname = folder.getName();
    if(fname == "round1"){
      return folder;
    }
  }
  

the problem with this code is that it is too slow, and not all round1 folders have videos (I believe this happens when there is a problem uploading the video, so the folder is created but the video is not uploaded) is there a way to fix it? I have around 900 folders and I'm a beginner when it comes to coding that's why in my for I have a variable called iterador which I have been manually changing in order to check new rows (it can do around 200 rows simultaneously)

I will really appreciate the help or any insight you may have

EDIT: This is a sample folder and sample spreadsheet to see how my data is organized Test Folder

Also, the variables that are missing are as follows:

  var s = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Unique");
  let header = s.getRange(1, 1, 1, s.getLastColumn()).getValues();
  header = header[0];

  let colEmail = header.indexOf("Email");
  let colState = header.indexOf("Archivo");
  let colURL = header.indexOf("Link");
  const FLAG = "Subido";

  let nrows = s.getRange(1, colEmail 1, s.getLastRow()).getValues().filter(String).length;
  let data = s.getRange(1, 1, nrows, s.getLastColumn()).getValues(); //Tabla de datos
  data.shift();
  iterador = data.length - Math.floor(data.length/3);

CodePudding user response:

I believe your goal is as follows.

  • You want to reduce the process cost of your script.

Modification points:

  • When I saw your script, it seems that getFileURL and listFolders are used in a loop. And from it can do around 200 rows simultaneously, the number of loops is about 200.
  • setValue is used in the loop. In this case, the process cost will become high. Ref

From the above situation, I thought that the process cost of your script will become high. In order to reduce the process cost, how about the following flow?

  1. Retrieve all folders with the folder name of round1, and create an object for searching the parent folder.
  2. Retrieve all folders just under the folder of folderID, and create an object for searching the folder name.
  3. Check each folder using the value of folderName in the loop. And return the value as an array.
  4. Put the array on the Spreadsheet.

In this modification, both retrieving the folder list and putting value to Spreadsheet are run outside of the loop. And, the folder list is retrieved by Drive API. By this, I thought that the process cost might be able to be reduced.

When this flow is reflected in your script, it becomes as follows.

Modified script:

Please remove your script in your question as follows. And, please enable Drive API at Advanced Google services. And please confirm the variables of s, colState, colEmail, iterador, colURL, FLAG, data, again.

// 1. Retrieve all folders with the folder name of `round1`, and create an object for searching the parent folder.
var obj1 = Drive.Files.list({ q: `title='round1' and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,parents(id))", maxResults: 1000 }).items.reduce((o, { id, parents }) => (o[parents[0].id] = id, o), {});

// 2. Retrieve all folders just under the folder of `folderID`, and create an object for searching the folder name.
var obj2 = Drive.Files.list({ q: `'${folderID}' in parents and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,title)", maxResults: 1000 }).items.reduce((o, { id, title }) => (o[title.split('(')[1].split(')')[0]] = { id, title, subFolder: obj1[id] || "" }, o), {});

// 3. Check each folder using the value of `folderName` in the loop. And return the value as an array.
var [, ...values] = s.getDataRange().getValues();
for (var i = iterador; i < data.length; i  ) {
  var row = data[i];
  var generado = row[colState];
  var folderName = row[colEmail];
  if (generado != "Subido") {
    let check = obj2[folderName];
    if (check) {
      if (check.subFolder) {
        values[i][colURL] = "https://drive.google.com/drive/folders/"   check.subFolder;
        values[i][colState] = FLAG;
        console.log(check.title);
      } else {
        values[i][colState] = "No ha subido archivo por ahora";
        console.log(`no hay round 1 ${check.title}`);
      }
    }
  }
}

// 4. Put the array to the Spreadsheet.
s.getRange(2, 1, values.length, values[0].length).setValues(values);
  • In this modified script, your getFileURL and listFolders are not used.

Note:

  • From I have around 900 folders, in this case when maxResults=1000 is used, I thought that pageToken might not be required to be used. When the number of your folders is over 1000, please use pageToken.

  • Unfortunately, I cannot know your variables of s, colState, colEmail, iterador, colURL, FLAG, data and your Spreadsheet. So when the above modified script cannot be used, can you provide a sample Spreadsheet and the sample values of s, colState, colEmail, iterador, colURL, FLAG, data? By this, I would like to confirm and modify the script.

References:

Added:

About the following additional question,

Ok, so this worked fine in a way. It is super fast BUT it is ignoring and not finding some folders that actually exist. I added pageToken as you recommended but maybe I did it wrong? obj2 is finding the folder with folderName correctly but obj1 is not finding the round1 folder which exists inside the folder that obj2 found :( any ideas?

When you want to use the value of pageToken, the script is as follows.

Modified script:

Please modify the above script as follows.

From:

// 1. Retrieve all folders with the folder name of `round1`, and create an object for searching the parent folder.
var obj1 = Drive.Files.list({ q: `title='round1' and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,parents(id))", maxResults: 1000 }).items.reduce((o, { id, parents }) => (o[parents[0].id] = id, o), {});

// 2. Retrieve all folders just under the folder of `folderID`, and create an object for searching the folder name.
var obj2 = Drive.Files.list({ q: `'${folderID}' in parents and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,title)", maxResults: 1000 }).items.reduce((o, { id, title }) => (o[title.split('(')[1].split(')')[0]] = { id, title, subFolder: obj1[id] || "" }, o), {});

To:

// 1. Retrieve all folders with the folder name of `round1`, and create an object for searching the parent folder.
var ar1 = [];
var pageToken = "";
do {
  var res = Drive.Files.list({ q: `title='round1' and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,parents(id)),nextPageToken", maxResults: 1000, pageToken });
  ar1 = [...ar1, ...res.items];
  pageToken = res.nextPageToken;
} while (pageToken);
var obj1 = ar1.reduce((o, { id, parents }) => (o[parents[0].id] = id, o), {});

// 2. Retrieve all folders just under the folder of `folderID`, and create an object for searching the folder name.
var ar2 = [];
pageToken = "";
do {
  var res = Drive.Files.list({ q: `'${folderID}' in parents and mimeType='${MimeType.FOLDER}' and trashed=false`, fields: "items(id,title),nextPageToken", maxResults: 1000, pageToken });
  ar2 = [...ar2, ...res.items];
  pageToken = res.nextPageToken;
} while (pageToken);
var obj2 = ar2.reduce((o, { id, title }) => (o[title.split('(')[1].split(')')[0]] = { id, title, subFolder: obj1[id] || "" }, o), {});
  • Related