How can I write files into an array based on the first part of the name and sort them by that?

Time:08-24

The files are named as follows: 5000023_abc_2000045.pdf, 5000023_def_2000045.pdf.

All files are in the same directory.

I want to write all these files, sorted, into an array and then pass them to a third-party program to merge them. There are about 60000 files.

I tried with GetFiles, but that didn't work. Sorry, I'm a total beginner.

Many thanks in advance.

Here's my code:

using System.Diagnostics;
using System.IO;
using bin_vMergePdfNeu.Properties;

namespace System.Configuration
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Title = "bin_vMergePDF";

            // Folder the PDFs are read from
            string altDirIn = Settings.Default.AlternativeInputDir;
            // Folder where the merged PDF should be stored
            string altDirOut = Settings.Default.AlternativeOutputDir;

         ////////////////////////////////////////////////////////////////////
        //var paths = new Collections.Generic.List<string>();
        //paths.Add(altDirIn);
        //var cmd = String.Join(" ", altDirIn) + " cat output " + altDirOut;
        ////////////////////////////////////////////////////////////////////

        string[] rchg = Directory.GetFiles(altDirIn, "*.pdf".Split('_')[0]);

        string cmd = string.Join(" ", rchg) + " cat output " + altDirOut;

        Process p = new Process();
        p.StartInfo.WorkingDirectory = Environment.CurrentDirectory;
        p.StartInfo.FileName = "pdftk.exe";
        p.StartInfo.Arguments = cmd;
        p.StartInfo.UseShellExecute = false;
        p.StartInfo.RedirectStandardOutput = true;
        p.StartInfo.RedirectStandardError = true;
        p.Start();

        Console.WriteLine(p.StandardError.ReadToEnd());

        Console.WriteLine();

        Console.ReadKey();

        p.WaitForExit();
        }
    }
}

CodePudding user response:

This would do the job:

var folder = new DirectoryInfo(altDirIn);
var files = folder.EnumerateFiles("*.pdf").OrderBy(fi => Convert.ToInt32(fi.Name.Split('_')[0]));
var cmd = $"{string.Join(" ", files.Select(fi => fi.FullName))} cat output {altDirOut}";

That assumes that every file starts with a number and an underscore.
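If that assumption might not hold, a more defensive variant can skip files without a numeric prefix instead of letting Convert.ToInt32 throw. The following is only a sketch: the class and method names (PrefixSort, TryGetPrefix, SortByNumericPrefix) are made up for illustration, and it uses long rather than int on the assumption that prefixes could exceed the int range.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class PrefixSort
{
    // Returns the number before the first '_' in the file name,
    // or null when the name has no all-digit prefix.
    public static long? TryGetPrefix(string path)
    {
        string name = Path.GetFileName(path);
        int underscore = name.IndexOf('_');
        if (underscore <= 0)
        {
            return null;
        }

        long value;
        bool ok = long.TryParse(
            name.Substring(0, underscore),
            NumberStyles.None,              // digits only, no sign or whitespace
            CultureInfo.InvariantCulture,
            out value);
        return ok ? value : (long?)null;
    }

    // Keeps only paths with a numeric prefix and sorts them by that number;
    // ties are broken by ordinal comparison of the full path.
    public static List<string> SortByNumericPrefix(IEnumerable<string> paths)
    {
        return paths
            .Select(p => new { Path = p, Prefix = TryGetPrefix(p) })
            .Where(x => x.Prefix.HasValue)
            .OrderBy(x => x.Prefix.Value)
            .ThenBy(x => x.Path, StringComparer.Ordinal)
            .Select(x => x.Path)
            .ToList();
    }
}
```

It could be called as PrefixSort.SortByNumericPrefix(Directory.EnumerateFiles(altDirIn, "*.pdf")). Whether silently skipping non-matching files is the right behaviour depends on your data; logging them may be safer.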

It's also worth noting that, if you want to sort file names the same way File Explorer does, you can use the same Windows API that File Explorer uses. You can create a comparer that incorporates that API:

using System.Collections;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public class LogicalStringComparer : IComparer, IComparer<string>
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    private static extern int StrCmpLogicalW(string x, string y);
 
    int IComparer.Compare(object x, object y)
    {
        return Compare((string) x, (string) y);
    }
 
    public int Compare(string x, string y)
    {
        return StrCmpLogicalW(x, y);
    }
}

You can then use that class to sort your file paths:

var filePaths = Directory.EnumerateFiles(altDirIn, "*.pdf").OrderBy(s => s, new LogicalStringComparer());
var cmd = $"{string.Join(" ", filePaths)} cat output {altDirOut}";

It's also worth noting that that code, which is based on what you wrote yourself, is probably going to fail if the folder path or any file name has a space in it. You could quote each path to handle that:

var quotedPaths = string.Join(" ", filePaths.Select(path => "\"" + path + "\""));
var cmd = quotedPaths + " cat output " + altDirOut;
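The quoting can also be pulled into a small helper so the output path is quoted as well. This is a minimal sketch, assuming plain file paths (no embedded quotes, no trailing backslash) and the pdftk "cat output" syntax used in the question; PdftkArguments, Quote and Build are illustrative names, not an existing API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for building the argument string.
public static class PdftkArguments
{
    // Wraps a path in double quotes so that spaces do not split it
    // into several arguments.
    public static string Quote(string path)
    {
        return "\"" + path + "\"";
    }

    // Produces: "in1" "in2" ... cat output "out"
    public static string Build(IEnumerable<string> inputPaths, string outputPath)
    {
        string inputs = string.Join(" ", inputPaths.Select(p => Quote(p)));
        return inputs + " cat output " + Quote(outputPath);
    }
}
```

With that in place, p.StartInfo.Arguments = PdftkArguments.Build(filePaths, altDirOut); would replace the manual string concatenation.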

CodePudding user response:

So, it seems that you want to query the files within a directory. The steps are:

  1. Enumerate all *.pdf files within the directory
  2. Check that the file name matches the number_name pattern
  3. Extract the number part to sort by
  4. Since the number is in fact a string of arbitrary length, we can't safely convert it to int; so we sort first by length and then by value
  5. Drop the number part once sorting is done; we only want the file name
  6. Materialize the results as an array

Code:

using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

...

string[] rchg = Directory
  .EnumerateFiles(altDirIn, "*.pdf")
  .Select(file => Path.GetFileName(file))
  .Where(file => Regex.IsMatch(file, "^[0-9]+_"))
  .Select(file => (file : file, number : file.Substring(0, file.IndexOf('_'))))
  .OrderBy(pair => pair.number.TrimStart('0').Length)
  .ThenBy(pair => pair.number.TrimStart('0'), System.StringComparer.Ordinal)
  .Select(pair => pair.file)
  .ToArray();
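The "sort by length, then by value" idea can also be packaged as a comparer. The sketch below assumes its inputs contain only the digits 0-9; DigitStringComparer is a name invented for this example. Leading zeros are trimmed before both comparisons so that "0008" sorts as 8.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer for strings made up only of digits.
// It orders them numerically without converting to a numeric type,
// so the length of the number is not limited.
public sealed class DigitStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // Leading zeros do not change the value, so ignore them.
        string a = (x ?? "").TrimStart('0');
        string b = (y ?? "").TrimStart('0');

        // A longer digit string (without leading zeros) is the larger number.
        int byLength = a.Length.CompareTo(b.Length);
        if (byLength != 0)
        {
            return byLength;
        }

        // Same length: ordinal comparison matches numeric order for digits.
        return string.CompareOrdinal(a, b);
    }
}
```

It can be passed to OrderBy as the second argument, for example .OrderBy(pair => pair.number, new DigitStringComparer()).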

Be careful, though: with 60000 files the argument list will be huge, and you are unlikely to be able to pass it on the command line. You can try saving the results to a file and passing its name to the exe.
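Another option is to split the sorted paths into batches that each fit on one command line. On Windows the command line of a new process is limited to roughly 32,767 characters, so 60000 full paths will not fit in one call. The sketch below only does the splitting; ArgumentBatcher is a hypothetical name, and merging each batch into an intermediate PDF and then merging the intermediates is an assumed workflow that the answers here did not test.

```csharp
using System.Collections.Generic;

// Hypothetical helper that groups paths into command-line-sized batches.
public static class ArgumentBatcher
{
    // Splits paths into consecutive batches, preserving order, so that the
    // quoted, space-separated paths of each batch need at most maxChars
    // characters. A single path longer than maxChars gets its own batch.
    public static List<List<string>> Split(IEnumerable<string> paths, int maxChars)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int length = 0;

        foreach (string path in paths)
        {
            // Two quote characters plus one separating space per path.
            int cost = path.Length + 3;

            if (current.Count > 0 && length + cost > maxChars)
            {
                batches.Add(current);
                current = new List<string>();
                length = 0;
            }

            current.Add(path);
            length += cost;
        }

        if (current.Count > 0)
        {
            batches.Add(current);
        }

        return batches;
    }
}
```

When choosing maxChars, leave room for the rest of the command (the executable name, " cat output " and the output path), for example by using a value well below the limit.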
