C# CSV file with unwanted CRLFs

Time:08-13

I wasn't sure how to phrase this, but I've been stumped for hours and would love some answers. Let's say I have a CSV file and I want to get all the data at index position 1 (the Company Name column in the sample image) and compare the values to each other.

I am currently using this line of code to read in the CSV file line by line:

string[] csvData = System.IO.File.ReadAllLines(@"C:\Path");

Then I split each line on commas and try to grab the wanted data like this:

var comNames = new List<string>();

for (int i = 0; i < csvData.Length; i++) {
    string[] rows = csvData[i].Split(',');  
    comNames.Add(rows[1]);
}

But as you all know, that won't work for lines 4 and 5, even though they still belong to the same column. Is there a way to delete the CRLFs that are causing this issue so that this code works, or is there another workaround?

Sample CSV (image in text format):

Serial Number,Company Name,Employee Markme,Description,Leave
9788189999599,TALES OF SHIVA,Mark,mark,0
9780099578079,1Q84
THE
COMPLETE
TRILOGY,HARUKI MURAKAMI,Mark,0
9780198082897,MY KUMAN,Mark,Mark,0

CodePudding user response:

The code below will work if the following assumptions hold true:

  1. There is always a serial #
  2. There is always a company name
  3. There is always a comma before and after the company name
  4. The serial # is always exactly 13 digits

#1-3 are required for this solution. You can tweak the RegEx pattern to deal with #4.

using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public List<string> GetListOfCompanies() {
    string data = File.ReadAllText(@"C:\Users\adam\Documents\test.csv");
    var companies = new List<string>();
    var pattern = @"\d{13}";

    // replace the line endings with something unique
    data = data.Replace(System.Environment.NewLine, "#thisisreallyunique#");

    // find each serial number, and grab the item after it
    foreach (Match match in Regex.Matches(data, pattern)) {
        var temp = data.Substring(match.Index); // cut off everything before this match
        var temp2 = temp.Substring(temp.IndexOf(",") + 1); // cut off the serial # and the comma following it
        // at this point we have the company name, plus everything after it
        var company = temp2.Substring(0, temp2.IndexOf(",")); // cut off everything after it
        // oh, and put the spaces back into the company
        company = company.Replace("#thisisreallyunique#", " ");

        companies.Add(company);
    }
    return companies;
}
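
If you would rather keep the line-by-line approach from the question, a rough alternative is to glue the broken lines back together before splitting. This is only a sketch under the same assumptions as above (every real record starts with a 13-digit serial number followed by a comma, and no field contains a comma). The names CsvRepair, MergeBrokenLines and GetCompanyNames are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvRepair
{
    // Assumption: a real record starts with exactly 13 digits and a comma.
    // Tweak this pattern if the serial numbers can have other lengths.
    private static readonly Regex RecordStart = new Regex(@"^\d{13},");

    // Appends every continuation line to the record it belongs to.
    public static List<string> MergeBrokenLines(IEnumerable<string> lines)
    {
        var records = new List<string>();
        foreach (var line in lines)
        {
            if (line.Length == 0)
            {
                continue; // ignore blank lines
            }
            if (records.Count == 0 || RecordStart.IsMatch(line))
            {
                records.Add(line); // header line, or the start of a new record
            }
            else
            {
                // no serial number at the start: this line continues the previous record
                records[records.Count - 1] += " " + line;
            }
        }
        return records;
    }

    // Returns the second field (index 1) of every record after the header.
    public static List<string> GetCompanyNames(IEnumerable<string> lines)
    {
        var names = new List<string>();
        var records = MergeBrokenLines(lines);
        for (int i = 1; i < records.Count; i++) // index 0 is the header
        {
            string[] fields = records[i].Split(',');
            if (fields.Length > 1)
            {
                names.Add(fields[1]);
            }
        }
        return names;
    }
}
```

You would call it as CsvRepair.GetCompanyNames(System.IO.File.ReadAllLines(path)). For the sample above this should give "TALES OF SHIVA", "1Q84 THE COMPLETE TRILOGY" and "MY KUMAN". Since File.ReadAllLines already splits on both CRLF and LF, this version does not depend on Environment.NewLine matching the file's line endings. It will still break if a continuation line happens to start with 13 digits and a comma.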