Is there a way to filter a CSV file for data validation without for loops. (Lumenworks CSVReader)

Time:04-26

I want to filter a CSV file and perform data validation on the filtered data. I imagine this could be done with for loops, but the file has 2 million cells and that would take a long time. I am using the LumenWorks CsvReader to access the file from C#.

I found the method csvfile.Where<>, but I have no idea what to pass as its parameters. Sorry, I am still new to coding.

[EDIT] This is my code for loading the file. Thanks for all the help!

//Creating C# table from CSV data
var csvTable = new DataTable();
var csvReader = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true);
csvTable.Load(csvReader);

//grabs header from the CSV data table
string[] headers = csvReader.GetFieldHeaders(); //this method gets the headers of the CSV file 
string filteredData[] = csvReader.Where // this is where I would want to implement the where method, or some sort of way to filter the data

//I can access the rows and columns with this
csvTable.Rows[0][0]
csvTable.Columns[0][0]

//After filtering (maybe even multiple filters) I want to add up all the filtered data (assuming they are integers)
var dataToValidate = 0;
foreach (var data in filteredData)
{
    dataToValidate += data;
}
if (dataToValidate == 123)
//data is validated


CodePudding user response:

I would read some of the documentation for the package you are using:

https://github.com/phatcher/CsvReader

https://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

To answer the filtering question specifically: if you want the result to contain only the data you are searching for, consider the following:

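// Requires: using System.Collections.Generic; using System.IO; using LumenWorks.Framework.IO.Csv;
var filteredData = new List<List<string>>();
using (CsvReader csv = new CsvReader(new StreamReader(System.IO.File.OpenRead(filePath[0])), true))
{
    string searchTerm = "foo";
    while (csv.ReadNextRecord())
    {
        // Collect only the cells in this record that contain the search term
        var row = new List<string>();
        for (int i = 0; i < csv.FieldCount; i++)
        {
            if (csv[i].Contains(searchTerm))
            {
                row.Add(csv[i]);
            }
        }
        // Note: a record with no matching cells still adds an empty list
        filteredData.Add(row);
    }
}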

This gives you a list of lists of strings that you can enumerate to do your validation:

int dataToValidate = 0;
foreach (var row in filteredData)
{
    foreach (var data in row)
    {
        // do the thing
    }
}
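The question mentions adding up the filtered values and comparing the total against an expected number. A minimal sketch of that step follows, assuming every filtered cell is meant to hold an integer. The helper name SumNumericCells and the sample data are hypothetical, and the expected total of 123 is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: adds up every cell that parses as an integer and
// counts the cells that do not parse, so bad data is not silently ignored.
static (int Sum, int Skipped) SumNumericCells(IEnumerable<IEnumerable<string>> rows)
{
    int runningSum = 0, notNumeric = 0;
    foreach (var cell in rows.SelectMany(row => row))
    {
        if (int.TryParse(cell, out int value)) runningSum += value;
        else notNumeric++;
    }
    return (runningSum, notNumeric);
}

// Sample data standing in for filteredData from the reader loop above.
var sampleRows = new List<List<string>>
{
    new List<string> { "100", "20" },
    new List<string>(),                 // a record with no matching cells
    new List<string> { "3", "n/a" }     // "n/a" is counted as skipped
};

var (total, unparsed) = SumNumericCells(sampleRows);
Console.WriteLine(total == 123 ? "data is validated" : "unexpected total");
Console.WriteLine("cells skipped: " + unparsed);
```

Using int.TryParse rather than int.Parse keeps one malformed cell from aborting the whole pass; whether skipped cells should fail validation depends on your data.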

--- Old Answer ---

Without seeing the code you are using to load the file, it is difficult to give you a full answer, and roughly 2 million cells may be slow no matter what you do. Your .Where comes from System.Linq: https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-6.0

A simple example using .Where:

//Read the file and return a list of records that match the where clause
//Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public List<CsvFileStructure> ReadCSV()
{
    List<CsvFileStructure> data = File.ReadLines(@"C:\Users\Public\Documents\test.csv")
           .Select(line => line.Split(','))
           // tokens[x] where x is the column number, assumes ID is column 0 and there is no header row
           .Select(tokens => new CsvFileStructure { Id = long.Parse(tokens[0]), Value = tokens[1] })
           // Where filters based on whatever you are looking for in the CSV
           .Where(csvFileStructure => csvFileStructure.Id == 1)
           .ToList();

    return data;
}

// Map of your data structure
public class CsvFileStructure
{
    public long Id { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}

Modified from this answer:

https://stackoverflow.com/a/10332737/7366061
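Since the question already loads the CSV into a DataTable, another option is to let the table do the filtering with DataTable.Select and DataTable.Compute. The sketch below uses a small in-memory table as a stand-in; the column names Id and Value and their int types are assumptions. Columns loaded from a CSV reader are typically strings, so SUM may not work on them until they are converted to a numeric type.

```csharp
using System;
using System.Data;

// Sample table standing in for csvTable; column names and types are assumptions.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Value", typeof(int));
table.Rows.Add(1, 100);
table.Rows.Add(2, 50);
table.Rows.Add(1, 23);

// Select returns the rows that match the filter expression.
DataRow[] matches = table.Select("Id = 1");

// Compute evaluates an aggregate over the rows that match the filter.
// (It returns DBNull when no rows match, which this sketch does not handle.)
int filteredTotal = Convert.ToInt32(table.Compute("SUM(Value)", "Id = 1"));

Console.WriteLine(matches.Length + " rows matched, total " + filteredTotal);
```

This still visits every row internally, so it removes the explicit loops from your code rather than the work itself.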

CodePudding user response:

There is no csvreader.Where method; Where is part of LINQ in C#. The link below shows an example of computing column values in a CSV file using LINQ:

https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-compute-column-values-in-a-csv-text-file-linq
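As a rough illustration of the LINQ approach, here is a minimal sketch that filters on one column and sums another. The sample lines stand in for File.ReadLines(path), the column layout (Id, Name, Value) is an assumption, and the plain Split(',') does not handle quoted fields that contain commas.

```csharp
using System;
using System.Linq;

// Sample lines standing in for File.ReadLines(path); the first line is the header.
string[] lines =
{
    "Id,Name,Value",
    "1,alpha,100",
    "2,beta,50",
    "1,gamma,23"
};

// Keep the rows whose Id column is "1" and add up their Value column.
int columnTotal = lines
    .Skip(1)                               // skip the header row
    .Select(line => line.Split(','))       // naive split: no quoted fields
    .Where(fields => fields[0] == "1")
    .Sum(fields => int.Parse(fields[2]));

Console.WriteLine(columnTotal == 123 ? "data is validated" : "unexpected total");
```

LINQ evaluates this lazily, one line at a time, so with File.ReadLines the whole file does not have to be held in memory.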
