I have a server-side logs folder that contains many hundreds of logs, most of which are in subdirectories according to the machine the logs came from. The task is to extract the name of the latest file in each directory containing a particular string (not all files have this string) so that analysis can be done per machine. I have included my attempt below but it seems rather clunky and long-winded and I wonder if there is an easier/better/faster/more efficient way of doing this, maybe with linq?
void Main()
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    List<string> fileswithsearchitem = new List<string>();
    DirectoryInfo directory = new DirectoryInfo(SourcePath);
    IEnumerable<DirectoryInfo> dirs = directory.EnumerateDirectories("*", new EnumerationOptions() { RecurseSubdirectories = true, IgnoreInaccessible = true });
    dirs = dirs.Append(directory); // Append returns a new sequence; it does not modify dirs
    foreach (var dir in dirs)
    {
        var found = false;
        var files = dir.EnumerateFiles(filemask);
        foreach (var file in files.OrderByDescending(f => f.CreationTime).ToList())
        {
            foreach (var line in File.ReadLines(file.FullName))
            {
                if (line.Contains(searchitem))
                {
                    fileswithsearchitem.Add(file.FullName + " : " + line);
                    found = true;
                    break;
                }
            }
            if (found)
            {
                break;
            }
        }
    }
    foreach (string item in fileswithsearchitem)
    {
        Console.WriteLine(item);
    }
}
Answer:
> I wonder if there is an easier/better/faster/more efficient way of doing this
I second the suggestion to post your question on https://codereview.stackexchange.com/. I'm not being snarky or hostile. Asking for "an easier/better/faster/more efficient way" is asking for code review. You'll get better answers over there. With that out of the way...
> ...maybe with linq?
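I have never seen LINQ make anything execute faster. In fact, the only times I've noticed a performance difference, it was for the worse, since LINQ adds iterator and delegate-call overhead. On the other hand, it is nice for making code more expressive, so I view LINQ as a trade-off. In this case, yes, it might be worth it to you: the work here is dominated by reading files from disk, so LINQ's overhead is unlikely to matter.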
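To make the trade-off concrete, here is a rough LINQ sketch of the task in the question. It assumes .NET Core 2.1 or later (for EnumerationOptions), keeps the question's use of CreationTime to decide which file is "latest", and the names LinqLogSearch and LatestHits are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LinqLogSearch
{
    // For the root folder and every accessible subfolder, finds the newest file
    // (by CreationTime) matching fileMask that contains searchItem, and returns
    // "full path : first matching line". Folders without a hit are skipped.
    public static IEnumerable<string> LatestHits(string sourcePath, string fileMask, string searchItem)
    {
        var root = new DirectoryInfo(sourcePath);
        var options = new EnumerationOptions { RecurseSubdirectories = true, IgnoreInaccessible = true };

        return root.EnumerateDirectories("*", options)
            .Prepend(root) // search the root folder too
            .Select(dir => dir.EnumerateFiles(fileMask)
                .OrderByDescending(f => f.CreationTime)
                // Lazy: files are read only until the first one with a hit,
                // and ReadLines stops at the first matching line.
                .Select(f => (Info: f, Line: File.ReadLines(f.FullName).FirstOrDefault(l => l.Contains(searchItem))))
                .FirstOrDefault(hit => hit.Line != null))
            .Where(hit => hit.Line != null)
            .Select(hit => hit.Info.FullName + " : " + hit.Line);
    }
}
```

Usage would be something like `foreach (var hit in LinqLogSearch.LatestHits(@"L:\machinelogs", "*.log", "cannot access server data")) Console.WriteLine(hit);`. It is shorter than the nested loops, but not faster: it performs the same disk reads.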
> The task is to extract the name of the latest file in each directory containing a particular string (not all files have this string) so that analysis can be done per machine.

> I have included my attempt below but it seems rather clunky and long-winded
The code you've written isn't reusable; instead it requires:
- A drive letter called "L" (not common)
- A drive letter and paths separated with "\" (which only happens on Windows)
- Log file names to have the extension ".log"
- The search text to be "cannot access server data"
- Text to appear in STDOUT
- A real file system
But are these really problems for you? Is it worth the time and effort to introduce abstractions? If what you have works, why fix it? I can't answer these questions for you; you'll have to do some soul-searching.
Abstractions
Here are some possible abstractions, and some possible ways to use them.
An abstract file system
Having an abstract file system makes it easier to write automated tests so that you can be sure your code will continue working through changes over the coming years.
These are the file-system methods I see you using:
- EnumerateDirectories
- EnumerateFiles
- File.ReadLines
With a little hand-waving, your code could look like this:
interface IDirectory
{
    /// <summary>
    /// Recursively yields all accessible nested directories
    /// </summary>
    IEnumerable<IDirectory> EnumerateDirectories();
    /// <summary>
    /// Yields all file paths that match the given mask. Yields them in order of
    /// newest first.
    /// </summary>
    IEnumerable<string> EnumerateFiles(string mask);
}
interface IFileSystem
{
    IDirectory OpenDirectory(string path);
    Stream OpenFile(string path);
}
class DirectoryInfoAdapter : IDirectory
{
    readonly DirectoryInfo _info;
    public DirectoryInfoAdapter(DirectoryInfo info) => _info = info;
    public IEnumerable<IDirectory> EnumerateDirectories() => _info
        .EnumerateDirectories("*", new EnumerationOptions() { RecurseSubdirectories = true, IgnoreInaccessible = true })
        .Select(x => new DirectoryInfoAdapter(x));
    // Newest first, as the IDirectory contract promises
    public IEnumerable<string> EnumerateFiles(string mask) => _info
        .EnumerateFiles(mask)
        .OrderByDescending(x => x.CreationTime)
        .Select(x => x.FullName);
}
class RealFileSystem : IFileSystem
{
    public IDirectory OpenDirectory(string path) => new DirectoryInfoAdapter(new DirectoryInfo(path));
    // File.Open has no single-argument overload; OpenRead opens the file read-only
    public Stream OpenFile(string path) => File.OpenRead(path);
}
void DoStuff(IFileSystem fileSystem)
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    List<string> fileswithsearchitem = new List<string>();
    IDirectory directory = fileSystem.OpenDirectory(SourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    dirs = dirs.Append(directory); // Append returns a new sequence; it does not modify dirs
    foreach (var dir in dirs)
    {
        var found = false;
        var files = dir.EnumerateFiles(filemask); // newest first
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchitem))
                {
                    fileswithsearchitem.Add(file + " : " + line);
                    found = true;
                    break;
                }
            }
            if (found)
            {
                break;
            }
        }
    }
    foreach (string item in fileswithsearchitem)
    {
        Console.WriteLine(item);
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    DoStuff(fileSystem);
}
Then you could write an automated test like this:
class DictionaryBackedDirectory : IDirectory
{
    readonly IReadOnlyCollection<IDirectory> _directories;
    readonly IReadOnlyCollection<string> _files;
    public DictionaryBackedDirectory(
        IReadOnlyCollection<IDirectory> directories,
        IReadOnlyCollection<string> files)
    {
        _directories = directories;
        _files = files;
    }
    public IEnumerable<IDirectory> EnumerateDirectories() => _directories;
    public IEnumerable<string> EnumerateFiles(string mask) => _files; // TODO: implement masking
}
class DictionaryBackedFileSystem : IFileSystem
{
    readonly IReadOnlyDictionary<string, IDirectory> _directories;
    readonly IReadOnlyDictionary<string, Func<Stream>> _files;
    public DictionaryBackedFileSystem(
        IReadOnlyDictionary<string, IDirectory> directories,
        IReadOnlyDictionary<string, Func<Stream>> files)
    {
        _directories = directories;
        _files = files;
    }
    public IDirectory OpenDirectory(string path) => _directories[path];
    public Stream OpenFile(string path) => _files[path]();
}
void AutomatedTest()
{
    var mockFileSystem = new DictionaryBackedFileSystem(
        new Dictionary<string, IDirectory>()
        {
            [@"L:\machinelogs"] = new DictionaryBackedDirectory(
                Array.Empty<IDirectory>(), // no subdirectories
                new string[]
                {
                    @"L:\machinelogs\log1.log"
                }
            )
        },
        new Dictionary<string, Func<Stream>>()
        {
            [@"L:\machinelogs\log1.log"] = () => new MemoryStream() // TODO: populate the memory stream with data for the test
        }
    );
    DoStuff(mockFileSystem);
}
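One way to fill in the TODO is to build the MemoryStream from a string of test text. The sketch below is hypothetical (StreamSearch, FirstMatchingLine and FakeFile are illustrative names, not part of any library); it pulls the line search out into a method that takes a Stream, which is the same loop DoStuff runs over fileSystem.OpenFile(file).

```csharp
using System;
using System.IO;
using System.Text;

static class StreamSearch
{
    // Returns the first line of the stream that contains searchItem, or null if none does.
    // Disposing the reader also disposes the stream.
    public static string? FirstMatchingLine(Stream stream, string searchItem)
    {
        using var reader = new StreamReader(stream);
        while (reader.ReadLine() is { } line)
        {
            if (line.Contains(searchItem))
                return line;
        }
        return null;
    }

    // Builds an in-memory "file" from test text: the kind of factory the
    // Func<Stream> values in DictionaryBackedFileSystem could hold.
    public static Func<Stream> FakeFile(string contents) =>
        () => new MemoryStream(Encoding.UTF8.GetBytes(contents));
}
```

Using a Func<Stream> rather than a single Stream matters here: each call produces a fresh stream positioned at the start, so the same fake file can be "opened" more than once.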
Advantages of doing this:
- Increased reusability
  - You could implement a remote file system if you wanted
- More testable code
  - There are a lot of advantages to having "unit-testable code", and having abstractions that can be "mocked" gets you closer to that golden city
Output results more abstractly
Your code doesn't have to be tied to Console.WriteLine() or to a particular output encoding.
For example:
readonly struct Result
{
    public readonly string Path;
    public readonly string Line;
    public Result(string path, string line)
    {
        Path = path;
        Line = line;
    }
}
IEnumerable<Result> DoStuff(IFileSystem fileSystem)
{
    string SourcePath = @"L:\machinelogs";
    string filemask = "*.log";
    string searchitem = @"cannot access server data";
    IDirectory directory = fileSystem.OpenDirectory(SourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    dirs = dirs.Append(directory); // Append returns a new sequence; it does not modify dirs
    foreach (var dir in dirs)
    {
        var files = dir.EnumerateFiles(filemask);
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchitem))
                {
                    yield return new Result(file, line);
                }
            }
        }
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(fileSystem))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break; // Stops at the first hit overall; remove this to keep searching
    }
}
See how this moves the console interaction out of your code, makes the output format someone else's problem, and also lets the consumer of your code decide if they want to continue searching once you have a search hit?
This will also get your code one step closer to being unit-testable. Feel free to ask if it's not clear why.
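Note that this DoStuff yields every matching line in every file, while the question wanted only the newest hit per machine folder. A consumer can get that back by filtering the results. Here is a minimal sketch; ResultFilters and FirstHitPerDirectory are hypothetical names, and the Result struct is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Same shape as the Result struct above, repeated so this sketch is self-contained.
readonly struct Result
{
    public readonly string Path;
    public readonly string Line;
    public Result(string path, string line)
    {
        Path = path;
        Line = line;
    }
}

static class ResultFilters
{
    // Passes through only the first Result seen for each directory.
    // Input order decides which hit wins, so a newest-first producer keeps the newest file.
    public static IEnumerable<Result> FirstHitPerDirectory(IEnumerable<Result> results)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        foreach (var result in results)
        {
            // HashSet.Add returns false when this directory already produced a hit
            if (seen.Add(Path.GetDirectoryName(result.Path) ?? string.Empty))
                yield return result;
        }
    }
}
```

On .NET 6 or later, `results.DistinctBy(r => Path.GetDirectoryName(r.Path))` does the same with a built-in LINQ operator. Either way the producer keeps reading files after a folder's first hit, so filtering on the consumer side trades some extra I/O for flexibility.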
Inject parameters
The source path, file mask, and search item don't have to be hardcoded constants.
For example:
IEnumerable<Result> DoStuff(
    IFileSystem fileSystem,
    string sourcePath,
    string fileMask,
    string searchItem)
{
    IDirectory directory = fileSystem.OpenDirectory(sourcePath);
    IEnumerable<IDirectory> dirs = directory.EnumerateDirectories();
    dirs = dirs.Append(directory); // Append returns a new sequence; it does not modify dirs
    foreach (var dir in dirs)
    {
        var files = dir.EnumerateFiles(fileMask);
        foreach (var file in files)
        {
            using var stream = fileSystem.OpenFile(file);
            using var reader = new StreamReader(stream);
            while (reader.ReadLine() is {} line)
            {
                if (line.Contains(searchItem))
                {
                    yield return new Result(file, line);
                }
            }
        }
    }
}
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(
        fileSystem,
        @"L:\machinelogs",
        "*.log",
        @"cannot access server data"
    ))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break;
    }
}
See how this makes it possible to search for other things?
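Taking this one step further, the values could come from the command line instead of being typed into Main. A rough sketch, assuming positional arguments in the order source path, file mask, search text (SearchOptions and FromArgs are hypothetical names), with the previously hard-coded values as fallbacks:

```csharp
using System;

static class SearchOptions
{
    // Hypothetical helper: reads sourcePath, fileMask and searchItem from positional
    // command-line arguments, falling back to the old hard-coded values when missing or blank.
    public static (string SourcePath, string FileMask, string SearchItem) FromArgs(string[] args)
    {
        string Arg(int index, string fallback) =>
            index < args.Length && !string.IsNullOrWhiteSpace(args[index]) ? args[index] : fallback;

        return (
            Arg(0, @"L:\machinelogs"),
            Arg(1, "*.log"),
            Arg(2, "cannot access server data"));
    }
}
```

In a console app, Main would then take `string[] args`, do `var (sourcePath, fileMask, searchItem) = SearchOptions.FromArgs(args);`, and pass those three values to DoStuff.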
Use Path.Combine
This removes one dependency on Windows: the backslash path separator. The "L:" drive letter itself is still Windows-specific.
void Main()
{
    IFileSystem fileSystem = new RealFileSystem();
    foreach (var result in DoStuff(
        fileSystem,
        // A bare "L:" would make Path.Combine return the drive-relative "L:machinelogs"
        Path.Combine("L:" + Path.DirectorySeparatorChar, "machinelogs"),
        "*.log",
        @"cannot access server data"
    ))
    {
        Console.WriteLine(result.Path + " : " + result.Line);
        break;
    }
}
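A quick way to check what Path.Combine produces on each platform is a pair of small helpers like the ones below (PathDemo, MachineLogDir and RootedOnDrive are illustrative names). Path.Combine inserts Path.DirectorySeparatorChar between parts only when the first part does not already end in a separator, and on Windows it also treats a bare drive reference such as "L:" as needing none.

```csharp
using System;
using System.IO;

static class PathDemo
{
    // Joins segments with the current OS's separator ('\' on Windows, '/' on Linux and macOS).
    public static string MachineLogDir(string root, string machine) =>
        Path.Combine(root, "machinelogs", machine);

    // Appends the separator to the drive so the combined path is rooted ("L:\folder" on Windows).
    public static string RootedOnDrive(char driveLetter, string folder) =>
        Path.Combine(driveLetter + ":" + Path.DirectorySeparatorChar, folder);
}
```

Because both helpers build on Path.DirectorySeparatorChar, the same code yields the native form on each operating system.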
I wouldn't be surprised if none of my code above compiles. This was written off the cuff.