Count Duplicate Lines from File using node.js

Time:12-29

I have to read a large .csv file line by line, take the first column (which contains countries), and count the duplicates. For example, if the file contains:

USA
UK
USA

the output should be:

USA - 2
UK - 1

code:

const fs = require('fs')
const readline = require('readline')

const file = readline.createInterface({
    input: fs.createReadStream('file.csv'),
    output: process.stdout,
    terminal: false
})

file.on('line', line => {
    const country = line.split(",", 1)
    const number = ??? // don't know how to check duplicates
    const result = country + number

    if(lineCount >= 1 && country != `""`) {
        console.log(result)
    }
    lineCount++
})

Answer:

To start, String.prototype.split returns an array, not a string. Since you limit the split to one element, you presumably want the first value from that array. You can read about it here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
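
To illustrate the point about split, here is a minimal sketch. The sample row is made up for illustration and is not taken from your file:

```javascript
// Hypothetical sample row (an assumption, not from the original file)
const line = 'USA,Washington,331'

// With a limit of 1, split still returns an array (of at most one element)
const parts = line.split(',', 1)

// Destructuring pulls the first element out as a plain string
const [country] = parts

console.log(Array.isArray(parts)) // true
console.log(country)
```

Comparing the array itself against a string, as in the question, relies on implicit conversion, so it is usually clearer to extract the string first.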

Next, you can create a map of the countries that stores how many times each one was seen, and then log the results once the file has been fully read:


const countries = {}
let lineCount = 0
file.on('line', line => {
    // Take the first value from the array and trim outer white space,
    // so that " USA" and "USA" are counted under the same key
    const country = line.split(",", 1)[0].trim()
    // Skip line 0 (assumed to be a header row) and empty first columns
    if (lineCount >= 1 && country !== "") {
        // If the country is not in the map, then store it
        if (!countries[country]) {
            countries[country] = 1
        } else {
            countries[country]++
        }
    }
    lineCount++
})

// Add another event listener for when the file has finished being read
// You may access the country data here, since this callback function
// won't be called till the file has been read
// https://nodejs.org/api/readline.html#event-close
file.on('close', () => {
    for (const country in countries) {
        console.log(`${country} - ${countries[country]}`)
    }
})
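
As an alternative, the same counting can be sketched as a self-contained script using a Map and the async iterator that readline interfaces provide. The helper names tallyFirstColumn and countCountries are hypothetical, and the sketch assumes the first line is a header row and that fields contain no quoted commas:

```javascript
const fs = require('fs')
const readline = require('readline')

// Hypothetical helper: adds the first comma-separated field of `line`
// to the running tally in `counts` (a Map of country -> occurrences).
// Assumes simple CSV without quoted commas.
function tallyFirstColumn(counts, line) {
    const country = line.split(',', 1)[0].trim()
    if (country !== '') {
        counts.set(country, (counts.get(country) || 0) + 1)
    }
    return counts
}

// Hypothetical wrapper: streams the file line by line, so the whole
// file never has to fit in memory. Assumes the first line is a header.
async function countCountries(path) {
    const rl = readline.createInterface({
        input: fs.createReadStream(path),
        crlfDelay: Infinity // treat \r\n as a single line break
    })
    const counts = new Map()
    let isHeader = true
    for await (const line of rl) {
        if (isHeader) {
            isHeader = false
            continue
        }
        tallyFirstColumn(counts, line)
    }
    return counts
}
```

You could then call countCountries('file.csv') and, once the promise resolves, loop over the returned Map to log each country and its count. A Map also avoids clashes with property names inherited by plain objects, such as "constructor".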