Home > Software design >  Newbie - need script to remove duplicates based on multiple criteria
Newbie - need script to remove duplicates based on multiple criteria

Time:05-07

So I'm super new to Google Sheets - as I mostly use Excel and Smartsheet. I was trying to accomplish this by just recording a macro, so this is VERY messy.

I have a doc that we're uploading data into on a regular basis, and need to be able to remove duplicated items.

The items that have already been sorted will have a category applied, so my thought was that we could remove items duplicated in column 'H' (opp id) and then with a blank cell in Column 'B' (Category)

The way I was doing it at first was to record the macro with me manually doing the conditional formatting to highlight duplicates on column H, and then pulled a script from here to delete rows that were blank in column B. However, the script removes ALL items that are blank in column B and not just the items I had filtered based on the conditional formatting.

I know this is because the script is pulling the whole sheet, but I just can't figure out how to fix it.

    function DeleteDuplicates() {
 var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('J9').activate();
  spreadsheet.setActiveSheet(spreadsheet.getSheetByName('Report'), true);
  spreadsheet.getRange('H:H').activate();
  var conditionalFormatRules = spreadsheet.getActiveSheet().getConditionalFormatRules();
  conditionalFormatRules.splice(0, 1, SpreadsheetApp.newConditionalFormatRule()
  .setRanges([spreadsheet.getRange('H1:H981')])
  .whenFormulaSatisfied('=countif(H:H,H1)>1')
  .setBackground('#FF00FF')
  .setFontColor('#000000')
  .build());
  spreadsheet.getActiveSheet().setConditionalFormatRules(conditionalFormatRules);
  spreadsheet.getRange('J15').activate();
    var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange('D15').activate();
  spreadsheet.setActiveSheet(spreadsheet.getSheetByName('Report'), true);
  spreadsheet.getRange('A2').activate();
  spreadsheet.getActiveSheet().getFilter().remove();
  spreadsheet.getRange('A2:AC112').activate();
  spreadsheet.getRange('A2:AC112').createFilter();
  spreadsheet.getRange('H2').activate();
  var criteria = SpreadsheetApp.newFilterCriteria()
  .build();
  spreadsheet.getActiveSheet().getFilter().setColumnFilterCriteria(8, criteria);
    var sheet = SpreadsheetApp.getActiveSheet();
  var rows = sheet.getDataRange();
  var numRows = rows.getNumRows();
  var values = rows.getValues();

  var rowsDeleted = 0;
  for (var i = 0; i <= numRows - 1; i  ) {
    var row = values[i];
    if (row[1] == '') {
      sheet.deleteRow((parseInt(i) 1) - rowsDeleted);
      rowsDeleted  ;}
  spreadsheet.getActiveSheet().getFilter().removeColumnFilterCriteria(8);
  spreadsheet.getRange('J13').activate();
      }
  }
};

CodePudding user response:

so my thought was that we could remove items duplicated in column 'H' (opp id) and then with a blank cell in Column 'B' (Category)

Essentially, you are looking to remove all rows with duplicate Column H values AND with blank cells in Column B?

If that's the case,

Try:

function myFunction() {

  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(`Report`)
  const data = sheet.getDataRange().getValues()

  const uniqueItems = [... new Set(data.map(i => i[7]))]

  const updatedData = []
  for (let item of uniqueItems) {
    const itemData = data.filter(i => i[7] === item)
    if (itemData.length > 1) {
      itemData.filter(i => i[1] !== ``)
              .forEach(i => updatedData.push(i))
    } else { updatedData.push(...itemData) }
  }

  sheet.getRange(1, 1, updatedData.length, updatedData[0].length).setValues(updatedData)

}

Commented:

function myFunction() {

  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(`Report`)
  const data = sheet.getDataRange().getValues()

  // Get all unique items in the sheet.
  const uniqueItems = [... new Set(data.map(i => i[7]))]

  const updatedData = []
  // For each unique item..
  for (let item of uniqueItems) {
    // Get all rows containing the item.
    const itemData = data.filter(i => i[7] === item)
    // If there's more than one row..
    if (itemData.length > 1) {
      // Remove all rows with blank cells in Column B...
      itemData.filter(i => i[1] !== ``)
              // ... and add to our updatedData.
              .forEach(i => updatedData.push(i))
    // If there's only one row for this item, add to updatedData
    } else { updatedData.push(...itemData) }
  }

  sheet.getRange(1, 1, updatedData.length, updatedData[0].length).setValues(updatedData)

}

CodePudding user response:

Description

Based on your original script it seems your programming experience is limited. Although NEWAZA's methods may work I propose a simpilier script that may be easier for you to follow. Much of original script is from moving the cursor around while recording a macro and is redundant. Instead I propose a simple macro that should be easy for you to follow.

I'm assuming new data is added and so I first perform a sort of column H and B. Then I loop through backwards to find duplicates and blank cells and delete rows that match the criteria. I loop backwards so I don't have to keep track of row number as rows are deleted. I'm also assuming rows to be deleted are not contiguous which may result in a slight performance hit.

Script

function deleteDuplicates() {
  try {
    var spreadsheet = SpreadsheetApp.getActive();
    let sheet = spreadsheet.getSheetByName('Report');
    let range = sheet.getDataRange();
    range.sort([{column: 8, ascending: true},{column: 2, ascending: false}]);
    let values = range.getValues();
    // assuming the conditional format rule has already been applied to this sheet
    for( let i=values.length; i>2; i-- ) {
      let j = i-1;
      let k = j-1;
      if( ( values[j][7] === values[k][7] ) && ( values[j][1] === "" ) ) {
        sheet.deleteRow(i);
      }
    }
  }
  catch(err) {
    console.log(err);
  }
}

Reference

  • Related