I have this CSV:
Id | Name | Age |
---|---|---|
1 | Alex | 20 |
1 | Maria | 16 |
I want to make a CSV reader that splits a CSV into two elements: the header and the data. The header is just a string array that stores the column names, and that part works well. My problem is saving the data: I want to store the CSV rows, without the headers, in a list of lists. This is how I am doing it:
var predata = file
    .Skip(1)
    .ToList();
List<List<string>> data = new List<List<string>>();
for (int i = 0; i < predata.Count; i++)
{
    List<string> templist = predata[i]
        .Split(';')
        .ToList();
    data.Add(templist);
}
This looks very inefficient, and I would like to know if there is a way to do all of this much more concisely, maybe even in a single LINQ query.
Please don't report this question; I'm doing my best to explain my problem.
CodePudding user response:
A big thing we can do to improve performance and shorten the code is to avoid calling .ToList() more often than needed. In fact, if you can accept IEnumerable<string[]> instead of List<List<string>>, we can get it down to this, which should also run faster and allocate MUCH less memory:
var data = file.Skip(1).Select(line => line.Split(';'));
If you really must have List<List<string>>, we can adjust it to the following:
var data = file.Skip(1).Select(line => line.Split(';').ToList()).ToList();
But again: every call to .ToList() adds more RAM and CPU use to your program. It's best to defer that for as long as possible.
I'm also curious about where the file variable came from. It seems likely it's the result of either File.ReadAllLines() or File.ReadLines(), and I can tell you the latter will again be MUCH more efficient for this than the former. So you want something like this:
var header = File.ReadLines("...").Take(1);
var data = File.ReadLines("...").Skip(1).Select(line => line.Split(';'));
Note that at this point data has not yet read through the file. However, you can use it in a foreach loop or with a LINQ extension, and it will read through the file in a just-in-time way, such that only one line from the file ever needs to be in memory at a time.
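To make the deferred execution concrete, here is a minimal, self-contained sketch. The hypothetical `Lines()` iterator stands in for File.ReadLines(): nothing is read until the foreach loop actually pulls rows from the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyDemo
{
    // Stand-in for File.ReadLines(): yields one line at a time
    // instead of loading everything up front.
    static IEnumerable<string> Lines()
    {
        Console.WriteLine("reading line 1");
        yield return "Id;Name;Age";
        Console.WriteLine("reading line 2");
        yield return "1;Alex;20";
    }

    static void Main()
    {
        // Building the query does not touch the data yet.
        var data = Lines().Skip(1).Select(l => l.Split(';'));
        Console.WriteLine("query built, nothing read yet");

        // Enumeration starts here; lines are produced on demand.
        foreach (var row in data)
            Console.WriteLine(row[1]); // prints "Alex"
    }
}
```

Running this prints "query built, nothing read yet" before either "reading line …" message, which is the just-in-time behavior described above.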
This is more efficient, even when you will eventually show the entire file contents on the screen or otherwise fully load the file, because it allows you to save RAM (and CPU) while translating the raw data from the file into the final structure you want for display or other purposes.
Separate from all of this, something you can do to really improve performance is to get a dedicated CSV parser from NuGet (CsvHelper is a popular one), especially as .Split() is known to be a little slower, as well as to fail in numerous edge cases.
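Here is one such edge case, using a made-up record for illustration: a quoted field that contains the delimiter. A real CSV parser treats the quoted field as a single value, but naive .Split() tears it apart.

```csharp
using System;

class SplitEdgeCase
{
    static void Main()
    {
        // A line whose second field is quoted and contains the delimiter.
        string line = "1;\"Doe; John\";20";

        string[] parts = line.Split(';');

        // Split produces 4 pieces instead of 3: the quoted field
        // "Doe; John" has been cut in half.
        Console.WriteLine(parts.Length); // prints 4
        foreach (var p in parts)
            Console.WriteLine(p);
    }
}
```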
CodePudding user response:
Sure - just use .Select:
var data = predata.Select(p => p.Split(';'));
This will actually give you an IEnumerable<string[]> which you can iterate over. If you need lists, just add ToList at each level:
var data = predata.Select(p => p.Split(';').ToList()).ToList();
And you can skip predata entirely by changing it to file.Skip(1) (there's no need to call ToList after the Skip if all you're doing is iterating).
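Putting those pieces together, here is a minimal runnable sketch, assuming `file` is the sequence of raw lines (in-memory here for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CsvDemo
{
    static void Main()
    {
        // Stand-in for the lines of the CSV from the question.
        IEnumerable<string> file = new[]
        {
            "Id;Name;Age",
            "1;Alex;20",
            "1;Maria;16",
        };

        // Skip the header, split each remaining line into a row.
        List<List<string>> data = file
            .Skip(1)
            .Select(p => p.Split(';').ToList())
            .ToList();

        Console.WriteLine(data.Count);   // prints 2
        Console.WriteLine(data[1][1]);   // prints Maria
    }
}
```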
CodePudding user response:
Try this:
var data = file.Skip(1).Select(x => x.Split(';').ToList()).ToList();