Reading all lines in a file and splitting on multiple strings in C#

Time:10-31

I am trying to read every file in a directory and, for each line whose last ';'-separated field contains a specific string, append that line to an external file.

foreach (string line in File.ReadAllLines(pendingFile).Where(line => line.Split(';').Last().Contains("Test1")))
{
    File.AppendAllText(path, line + Environment.NewLine);
}

How do I specify multiple strings here, such as "Test1", "Test2" and "Test3"? Something like this:

foreach (string line in File.ReadAllLines(pendingFile).Where(line => line.Split(';').Last().Contains("Test1", "Test2", "Test3")))

CodePudding user response:

You do it the other way round: instead of asking "does the last part of the line contain any of these strings?", you ask "is any of these strings contained in the last part of the line?"

var interestrings = new []{"Test1", "Test2", "Test3"};

File.ReadAllLines(pendingFile)
    .Where(line => 
        interestrings.Any(interestring => 
            line.Split(';').Last().Contains(interestring)
        )
    )

It's probably worth pointing out that your code would be a lot more readable if you didn't try to do it all in the foreach header:

var interestrings = new []{"Test1", "Test2", "Test3"};
foreach (string line in File.ReadAllLines(pendingFile))
{

    var lastOne = line.Split(';').Last();
    if(!interestrings.Any(interestring => lastOne.Contains(interestring)))
        continue;

    File.AppendAllText(path, line + Environment.NewLine);
}

It won't perform significantly differently, because LINQ will (behind the scenes) enumerate all the lines, skip those where the condition doesn't match and give you only those that do. This loop does essentially the same thing without the chained enumeration.

You could get a useful performance boost by not using Split (take a substring from the last index of ';' instead) and by collecting your strings into a StringBuilder rather than repeatedly appending them to a file. Also, if you use File.ReadLines rather than File.ReadAllLines, you read the file incrementally rather than buffering it all into memory:

var sb = new StringBuilder(10000); // initial capacity; tune to the expected output size

var interestrings = new []{"Test1", "Test2", "Test3"};
foreach (string line in File.ReadLines(pendingFile))
{
    var lastOne = line;

    var idx = line.LastIndexOf(';');
    if(idx != -1) // only take a substring when a ';' was actually found
        lastOne = line.Substring(idx + 1);

    if(!interestrings.Any(interestring => lastOne.Contains(interestring)))
        continue;

    sb.AppendLine(line);
}

File.AppendAllText(path, sb.ToString());

If the file is huge, consider opening a stream and writing the output line by line too, rather than buffering much of it into a StringBuilder.
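A minimal sketch of that streaming variant, assuming the same "last field after the final ';'" rule as above. The class name LineFilter, the method name AppendMatchingLines and its parameters are made up for illustration; they are not from the question.

```csharp
using System.IO;
using System.Linq;

public static class LineFilter
{
    // Hypothetical helper: appends to destPath every line of sourcePath whose
    // last ';'-separated field contains any of the needles.
    // Returns the number of lines written.
    public static int AppendMatchingLines(string sourcePath, string destPath, string[] needles)
    {
        int written = 0;

        // append: true mirrors File.AppendAllText; 'using' flushes and closes the writer.
        using (var writer = new StreamWriter(destPath, append: true))
        {
            // File.ReadLines reads lazily, so only one line is held at a time.
            foreach (string line in File.ReadLines(sourcePath))
            {
                int idx = line.LastIndexOf(';');

                // No ';' means the whole line is the last field, as with Split(';').Last().
                string lastField = idx == -1 ? line : line.Substring(idx + 1);

                if (needles.Any(needle => lastField.Contains(needle)))
                {
                    writer.WriteLine(line);
                    written++;
                }
            }
        }

        return written;
    }
}
```

It would be called as LineFilter.AppendMatchingLines(pendingFile, path, new[] { "Test1", "Test2", "Test3" }). Neither the input nor the output is buffered in full, so memory use should stay roughly constant regardless of file size.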

CodePudding user response:

Use a regular expression instead:

.Where(line => Regex.IsMatch(line, @"Test\d+$"))

(I haven't tested this exact piece of code; it is just to give the idea.)
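Note that a digit-based pattern matches any "Test" followed by digits at the very end of the line, which is not quite the same as "the last field contains Test1, Test2 or Test3". A sketch of a pattern closer to the original Contains check, assuming exactly those three literals; the class and method names here are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class LastFieldRegex
{
    // One of the three literals, followed only by non-';' characters up to the
    // end of the line, so the match must lie inside the last ';'-separated field.
    private static readonly Regex Pattern =
        new Regex(@"(?:Test1|Test2|Test3)[^;]*$", RegexOptions.Compiled);

    // True when the last field of the line contains Test1, Test2 or Test3.
    public static bool LastFieldMatches(string line)
    {
        return Pattern.IsMatch(line);
    }
}
```

It would slot into the query as .Where(line => LastFieldRegex.LastFieldMatches(line)). If the strings to look for contain regex metacharacters, they would need to go through Regex.Escape before being joined with '|'.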
