How to remove all instances of line breaks and tabs using JavaScript

Time:12-15

I am scraping a website and need to remove all the \n and \t characters from my strings.

I have tried the following code:

item.post_category = [];
Array.from($doc.find('h6.link')).forEach(function (link) {
  console.log(link.textContent.replace(/\t \n /gm, ""));
  item.post_category.push(link.textContent);
});
// this removes the linebreaks but not the tabs

Here are several sample arrays I have to iterate over:

["\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"]

["\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t"]

Ideally, I want my arrays to look like this, with the date, the \n, and the \t removed:

["Family,Gender Equality,In the News"]
["In the News"]
["News"]

CodePudding user response:

There are many ways to do this. You could use a regex, split, or a combination of both, depending on your needs.

Here is one possible solution:

let str = "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tFamily,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\tGender Equality,\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t"

// Remove all newlines and tabs with a regex. Add '\r' to the pattern if the text may contain Windows line endings.
str = str.replace(/(\n|\t)/gm, '');

// Here we assume that your string will
// always contain the date followed by this character: •.
// So we split on this character and take the second
// element of the resulting array, which is the text without the date.
let result = str.split('•')[1].trim();

console.log(result) // prints 'Family,Gender Equality,In the News'
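
To apply the same idea to every scraped string, the steps can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: `extractCategories` is a hypothetical name, the sample strings are shortened versions of the ones in the question, and it assumes each string has the form "date • categories". If no • is found, it returns the whole cleaned string instead of throwing.

```javascript
// Hypothetical helper (the name is not from the original post).
// Assumes the input looks like "<date> • <categories>".
function extractCategories(text) {
  // Remove every tab, line feed, and carriage return, wherever it appears.
  const cleaned = text.replace(/[\t\n\r]/g, "");
  const bulletIndex = cleaned.indexOf("•");
  // Keep only what follows the first bullet; fall back to the whole string.
  const afterDate = bulletIndex === -1 ? cleaned : cleaned.slice(bulletIndex + 1);
  return afterDate.trim();
}

// Shortened versions of the sample strings from the question.
const samples = [
  "\n\t\t\t\t\tJune 15, 2021 • \n\t\t\t\n\t\t\tFamily,\n\t\t\t\n\t\t\tGender Equality,\n\t\t\t\n\t\t\tIn the News\n\t\t\t\t",
  "\n\t\t\t\t\tJune 13, 2020 • \n\t\t\t\t\t\t\n\t\t\t\t\tIn the News\n\t\t\t\t",
  "\n\t\t\t\t\tJuly 5, 2021 • \n\t\t\t\t\t\t\n\t\t\t\t\tNews\n\t\t\t\t",
];

const categories = samples.map(extractCategories);
console.log(categories);
// [ 'Family,Gender Equality,In the News', 'In the News', 'News' ]
```

Two things in the question's code explain the original result. The pattern /\t \n / only matches the exact sequence tab, space, newline, space, so it does not match individual tabs or newlines; a character class such as [\t\n\r] matches any one of them. Also, the loop logs the replaced string but pushes the untouched link.textContent, so the array never receives the cleaned value. Inside the loop, item.post_category.push(extractCategories(link.textContent)) would store the cleaned text instead.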