Remove all duplicate lines from return data HTML by regex

Time:04-14

I'm using a regex in Apps Script to scrape data from a website.

I tried this code:

const name = /(?<=<span >)(.*?)(?=<\/span>)/gi; // work Great

for (var i = 0; i < 9; i++) {

var names = data[i].match(name)[0];
Logger.log(names)
}

This code works fine, but it gives me duplicate lines:

1:56:22 PM  Notice  Execution started
1:56:35 PM  Info    john
1:56:35 PM  Info    ara
1:56:35 PM  Info    john
1:56:35 PM  Info    anita
1:56:35 PM  Info    ara
1:56:35 PM  Info    fabian
1:56:35 PM  Info    ara
1:56:35 PM  Info    john
1:56:35 PM  Info    fabian
1:56:37 PM  Notice  Execution completed

I want to remove all duplicate names and get a result like this:

1:56:22 PM  Notice  Execution started
1:56:35 PM  Info    john
1:56:35 PM  Info    ara
1:56:35 PM  Info    anita
1:56:35 PM  Info    fabian
1:56:37 PM  Notice  Execution completed

CodePudding user response:

Set

You can use a Set (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set) in order to do that.

names = Array.from(new Set(names));

Your final goal isn't clear from the question (here you only log the data), so you may not even need to convert the Set back to an Array :)
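Note that the one-liner above expects names to be an array, while in the question's loop it holds a single string per iteration. A minimal sketch of collecting into a Set first, assuming data is an array of HTML strings as in the question (the sample values and the seen and uniqueNames variables are made up for illustration):

```javascript
// Same pattern as in the question.
const name = /(?<=<span >)(.*?)(?=<\/span>)/gi;

// Hypothetical sample input standing in for the scraped data.
const data = [
  '<span >john</span>',
  '<span >ara</span>',
  '<span >john</span>',
  '<span >anita</span>',
];

// A Set ignores values it already contains, so duplicates never get in.
const seen = new Set();
for (let i = 0; i < data.length; i++) {
  const found = data[i].match(name); // null when nothing matches
  if (found) {
    seen.add(found[0]);
  }
}

const uniqueNames = Array.from(seen);
console.log(uniqueNames); // [ 'john', 'ara', 'anita' ]
```

Looping up to data.length instead of a fixed 9 is my own choice here, so the sketch does not depend on the exact number of entries.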

Sort

Another solution would be to sort your array and then iterate over it. After sorting, equal values sit next to each other, which makes duplicates easy to remove.

array.sort();

array.filter((el, index) => index < array.length && el !== array[index + 1]);

Test in my browser:

let a = [1,1,2,3,4,4,5,6,7,7];

a.filter((el, index) => index < a.length && el !== a[index + 1]);

Array(7) [ 1, 2, 3, 4, 5, 6, 7 ];

This solution does not preserve the original order, because the array is sorted first. The first one does: a Set iterates its values in insertion order, so each name stays at the position of its first occurrence.
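To make the ordering difference concrete, here is a small sketch that runs both approaches on the names from the question's log. The variable names viaSet, sorted and viaSort are mine, and I copy the array before sorting so the input itself is not reordered:

```javascript
const names = ['john', 'ara', 'john', 'anita', 'ara', 'fabian', 'ara', 'john', 'fabian'];

// Set approach: keeps the first occurrence of each name, in input order.
const viaSet = [...new Set(names)];

// Sort approach: sort a copy, then drop every element that equals
// its right-hand neighbour, so only the last of each run survives.
const sorted = [...names].sort();
const viaSort = sorted.filter((el, index) => el !== sorted[index + 1]);

console.log(viaSet);  // [ 'john', 'ara', 'anita', 'fabian' ]
console.log(viaSort); // [ 'anita', 'ara', 'fabian', 'john' ]
```

Both results contain the same four names; only the Set version matches the order the asker wants in the log.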

CodePudding user response:

Description

First I would collect all the names in an array. Then I would use [...new Set(array)] to create an array of unique names.

Script

function spanTest() {
  try {
    const name = /(?<=<span >)(.*?)(?=<\/span>)/gi; // work Great
    let data = ['<=<span >john</span>',
                '<=<span >ara</span>',
                '<=<span >john</span>',
                '<=<span >anita</span>',
                '<=<span >ara</span>',
                '<=<span >fabian</span>',
                '<=<span >ara</span>',
                '<=<span >john</span>',
                '<=<span >fabian</span>'];

    let names = [...new Set(data.map( span => span.match(name)[0]) )];
    console.log(names);
    
  }
  catch(err) {
    console.log(err);
  }
}

7:39:23 AM  Notice  Execution started
7:39:23 AM  Info    [ 'john', 'ara', 'anita', 'fabian' ]
7:39:23 AM  Notice  Execution completed
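One caveat with the script above: match() returns null when a string contains no match, so match(name)[0] throws and the whole run ends in the catch block. A possible variation, assuming some scraped strings may have no span at all (the sample data is hypothetical, and unlike the script above this keeps every match per string, not just the first):

```javascript
const name = /(?<=<span >)(.*?)(?=<\/span>)/gi;

// Hypothetical input: the second entry has no <span > in it,
// the third one contains two names.
const data = [
  '<span >john</span>',
  '<div>no name here</div>',
  '<span >ara</span><span >john</span>',
];

// "|| []" turns a null result into an empty list, and flatMap()
// flattens the per-string match arrays into one array before dedup.
const names = [...new Set(data.flatMap(html => html.match(name) || []))];
console.log(names); // [ 'john', 'ara' ]
```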
