Efficient way to Sort CSV raw string data

Time:03-28

I have raw CSV data as shown below:

  James,Mary,Patricia,Anthony,Donald\n
  145,10,100,39,101\n
  21,212,313,28,1

In the string above, columns are separated by commas (,). The first line holds the column headers (the names), and each \n starts a new row with the data for each person. What I am trying to achieve is to sort the columns by name, as shown below:

  Anthony,Donald,James,Mary,Patricia\n
  39,101,145,10,100\n
  28,1,21,212,313

What I have tried so far is splitting on \n and then splitting each row on commas, but after that there is no proper reference to sort the values by.

The part I am struggling with:

        string data = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";

        var rows = data.Split('\n');
        var unorderedNames = rows[0].Split(',');

Splitting on \n gives an array of three rows.


Splitting the first row on commas gives the names.


Now, if I sort, I believe I will lose all the references: the names will be sorted, but the payments in rows 2 and 3 will not.

In my code above, the first line splits the data into three rows based on \n. When I then sort the first row, I no longer have a reference to the other values at the same position.

I would appreciate help finding an efficient way to sort this raw data alphabetically by name while keeping each person's values with them.

Answer:

using System.Collections.Generic;

public class StackDemo
    {
        private string source = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";

        public string ProcessString()
        {

            var rows = source.Split('\n');

            var row1Values = rows[0].Split(',');
            var row2Values = rows[1].Split(',');
            var row3Values = rows[2].Split(',');

            List<Person> people = new List<Person>();
            for (int index = 0; index < row1Values.Length; index++)
            {
                people.Add(new Person()
                {
                    Name = row1Values[index],
                    SomeValue = row2Values[index],
                    OtherValue = row3Values[index]
                });
            }

            people.Sort((x, y) => x.Name.CompareTo(y.Name));

            List<string> names = new List<string>();
            List<string> someValues = new List<string>();
            List<string> otherValues = new List<string>();

            foreach (Person p in people)
            {
                names.Add(p.Name);
                someValues.Add(p.SomeValue);
                otherValues.Add(p.OtherValue);
            }


            string result = "";
            result = BuildString(names, result);
            result = BuildString(someValues, result);
            result = BuildString(otherValues, result);

            result = result.Remove(result.Length - 1, 1);

            return result;
        }

        private static string BuildString(List<string> names, string result)
        {
            foreach (string s in names)
            {
                result += s + ",";
            }

            result = result.Remove(result.Length - 1, 1);
            result += "\n";
            return result;
        }
    }

    public class Person
    {
        public string Name { get; set; }
        public string SomeValue { get; set; }
        public string OtherValue { get; set; }
    }

This code is extremely basic (crude), but I think it does what you want.

Also it returns the string in the same format as it was received.
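Here is a minimal sketch of the same idea that does not hard-code five columns and three rows. The helper name SortPeopleByName is hypothetical, and a value tuple stands in for the Person class. It assumes every row has the same number of fields and that no field contains a quoted comma or newline; names are compared ordinally so the result does not depend on the current culture.

```csharp
using System;
using System.Linq;

// Hypothetical helper: a generalized version of the Person-per-column approach.
// Assumes every row has the same number of fields and no field contains a
// quoted comma or newline (plain Split(',') is used).
static string SortPeopleByName(string source)
{
    string[][] rows = source.Split('\n').Select(r => r.Split(',')).ToArray();

    // One tuple per column: the name plus all the values underneath it.
    var people = Enumerable.Range(0, rows[0].Length)
        .Select(col => (Name: rows[0][col],
                        Values: rows.Skip(1).Select(r => r[col]).ToArray()))
        .OrderBy(p => p.Name, StringComparer.Ordinal)
        .ToList();

    // Header line first, then one line per original data row.
    var lines = new[] { string.Join(",", people.Select(p => p.Name)) }
        .Concat(Enumerable.Range(0, rows.Length - 1)
            .Select(r => string.Join(",", people.Select(p => p.Values[r]))));

    return string.Join("\n", lines);
}

Console.WriteLine(SortPeopleByName(
    "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1"));
// Anthony,Donald,James,Mary,Patricia
// 39,101,145,10,100
// 28,1,21,212,313
```

Because each tuple carries its whole column, any number of data rows is handled the same way.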

EDIT: Expanded in response to a question in the comments.

I added some unit tests to check that I understood your question correctly:

using Xunit;

public class UnitTest1
    {
        [Fact]
        public void TestWith5()
        {
            string input = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";
            string expected = "Anthony,Donald,James,Mary,Patricia\n39,101,145,10,100\n28,1,21,212,313";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }

        [Fact]
        public void TestWith4()
        {
            string input = "James,Mary,Patricia,Anthony,\n145,10,100,39,\n21,212,313,28,";
            string expected = ",Anthony,James,Mary,Patricia\n,39,145,10,100\n,28,21,212,313";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }

        [Fact]
        public void TestWith3()
        {
            string input = "James,Mary,Patricia,,\n145,10,100,,\n21,212,313,,";
            string expected = ",,James,Mary,Patricia\n,,145,10,100\n,,21,212,313";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }

        [Fact]
        public void TestWith2()
        {
            string input = ",,James,Mary,\n,,145,10,\n,,21,212,";
            string expected = ",,,James,Mary\n,,,145,10\n,,,21,212";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }

        [Fact]
        public void TestWith1()
        {
            string input = "James,,,,\n145,,,,\n21,,,,";
            string expected = "James,,,,\n145,,,,\n21,,,,";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }

        [Fact]
        public void TestWith0()
        {
            string input = ",,,,\n,,,,\n,,,,";
            string expected = ",,,,\n,,,,\n,,,,";

            // arrange
            StackDemo3 subject = new StackDemo3();

            // act
            string actualResult = subject.ProcessString(input);

            // assert
            Assert.Equal(expected, actualResult);
        }
    }

Here is the actual implementation:

using System.Collections.Generic;
using System.Linq;

public interface IStringPeopleParser
{
    List<Person> ConvertToPeople(string input);
}

public interface IPeopleStringParser
{
    string ConvertPeopleToString(List<Person> people);
}

public class PeopleStringParser : IPeopleStringParser
    {
        public string ConvertPeopleToString(List<Person> people)
        {
            List<string> names = new List<string>();
            List<string> someValues = new List<string>();
            List<string> otherValues = new List<string>();

            foreach (Person p in people)
            {
                names.Add(p.Name);
                someValues.Add(p.SomeValue);
                otherValues.Add(p.OtherValue);
            }

            string output = "";
            output += string.Join(",", names);
            output += "\n";
            output += string.Join(",", someValues);
            output += "\n";
            output += string.Join(",", otherValues);

            return output;
        }
    }

public class StringPeopleParser : IStringPeopleParser
    {
        public List<Person> ConvertToPeople(string source)
        {
            var rows = source.Split('\n');

            string[] row1Values = rows[0].Split(',');
            string[] row2Values = rows[1].Split(',');
            string[] row3Values = rows[2].Split(',');

            List<Person> people = new List<Person>();
            for (int index = 0; index < row1Values.Length; index++)
            {
                people.Add(new Person()
                {
                    Name = row1Values[index],
                    SomeValue = row2Values[index],
                    OtherValue = row3Values[index]
                });
            }

            return people;
        }
    }

public class StackDemo3
    {
        IStringPeopleParser stringPeopleParser = new StringPeopleParser();
        IPeopleStringParser peopleStringParser = new PeopleStringParser();

        public string ProcessString(string s) {
            List<Person> people = stringPeopleParser.ConvertToPeople(s);
            int validCount = people.Count(x => x.IsValid());

            // With fewer than two complete columns there is nothing meaningful to sort,
            // so return the columns in their original order.
            if (validCount < 2)
            {
                return peopleStringParser.ConvertPeopleToString(people);
            }

            // Sort the columns by name; each Person carries its values along with it.
            // (The previous switch returned "" for more than five columns.)
            people = people.OrderBy(x => x.Name).ToList();
            return peopleStringParser.ConvertPeopleToString(people);
        }

    }

public class Person
    {
        public string Name { get; set; }
        public string SomeValue { get; set; }
        public string OtherValue { get; set; }

        public bool IsValid() {
            if (string.IsNullOrWhiteSpace(Name) || string.IsNullOrWhiteSpace(SomeValue) || string.IsNullOrWhiteSpace(OtherValue))
            {
                return false;
            }
            return true;
        }
    }

Also, I'm not sure why you don't want the Person class. You need something that links the three values belonging to each column (the column index is the key). By creating a Person for each column, that Person instance becomes the link.
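To illustrate that the index is the key, here is a minimal sketch that keeps the same link without any class: sort a copy of the names and carry each column's original index along with it, using the .NET Array.Sort(keys, items, comparer) overload. SortedColumnOrder is a hypothetical helper name. Note that Array.Sort is not a stable sort, so duplicate names could come out in either order.

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns the original column indices in name order.
// Array.Sort(keys, items, comparer) reorders 'items' in step with 'keys',
// so each index stays attached to its name.
static int[] SortedColumnOrder(string[] names)
{
    string[] keys = (string[])names.Clone();   // leave the caller's header untouched
    int[] indices = Enumerable.Range(0, names.Length).ToArray();
    Array.Sort(keys, indices, StringComparer.Ordinal);
    return indices;
}

string[] header = "James,Mary,Patricia,Anthony,Donald".Split(',');
string[] firstRow = "145,10,100,39,101".Split(',');

int[] columnOrder = SortedColumnOrder(header);
Console.WriteLine(string.Join(",", columnOrder));                          // 3,4,0,1,2
Console.WriteLine(string.Join(",", columnOrder.Select(i => firstRow[i]))); // 39,101,145,10,100
```

The same index array can be applied to every data row, so the rows themselves never need to be sorted.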

Answer:

I think the problem is that you want to sort the headers of your CSV into some "whatever" order and have the data "go with it".

Come up with some way to represent your data as a 2D array:

var lines = File.ReadAllLines("path");

var data = lines.Skip(1).Select(line => line.Split(',')).ToArray(); //naive way of parsing a CSV, but it's incidental to this discussion

var head = lines[0]
             .Split(',')
             .Select((s,i) => new { Name = s, Index = i })
             .OrderBy(at => at.Name)
             .ToArray();

head is now the sorted headers, but each entry has an additional property that tells you which column in data holds that person's data. Anthony is first in head, but their Index is 3, so we get Anthony's data from column 3 of every row in data (line[3] for each line).
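A self-contained version of the header mapping above, as a sketch. It reads the question's sample string instead of a file (an assumption so it runs without any input) and passes StringComparer.Ordinal to OrderBy so the order does not depend on the current culture. The printed lines show the Index each name carries:

```csharp
using System;
using System.Linq;

// The question's sample string stands in for File.ReadAllLines("path") (assumption).
string raw = "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1";
string[] lines = raw.Split('\n');

// Pair every header name with the column it came from, then sort by name.
var head = lines[0]
    .Split(',')
    .Select((s, i) => new { Name = s, Index = i })
    .OrderBy(at => at.Name, StringComparer.Ordinal)
    .ToArray();

foreach (var person in head)
{
    Console.WriteLine($"{person.Name} -> column {person.Index}");
}
// Anthony -> column 3
// Donald -> column 4
// James -> column 0
// Mary -> column 1
// Patricia -> column 2
```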

foreach(var person in head){
  Console.WriteLine($"Now printing {person.Name} data from column {person.Index}");

  foreach(var line in data){
    Console.WriteLine(line[person.Index]);
  }

}

We don't bother sorting the data (it's more efficient not to); we just store which column each person's data is in as part of the object that does get sorted, and then, whatever the sort order, we access the data via that column. Sorting head is very fast because it's just a few names, and it always keeps its map of "where the data is" because Index doesn't change, regardless of how head is sorted.
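To get back the exact output format from the question without moving any data, the same map can drive the rebuild. This is a minimal sketch; ReorderColumns is a hypothetical name, the input is the raw string rather than a file, and the plain Split(',') still assumes no quoted fields:

```csharp
using System;
using System.Linq;

// Hypothetical helper: rebuilds the CSV with columns in name order.
// The data rows are never reordered; each row is read through the Index map.
static string ReorderColumns(string raw)
{
    string[] lines = raw.Split('\n');
    string[][] data = lines.Skip(1).Select(line => line.Split(',')).ToArray();

    var head = lines[0]
        .Split(',')
        .Select((s, i) => new { Name = s, Index = i })
        .OrderBy(at => at.Name, StringComparer.Ordinal)
        .ToArray();

    // Header line, then each data row read in head's order.
    var output = new[] { string.Join(",", head.Select(h => h.Name)) }
        .Concat(data.Select(row => string.Join(",", head.Select(h => row[h.Index]))));
    return string.Join("\n", output);
}

Console.WriteLine(ReorderColumns(
    "James,Mary,Patricia,Anthony,Donald\n145,10,100,39,101\n21,212,313,28,1"));
// Anthony,Donald,James,Mary,Patricia
// 39,101,145,10,100
// 28,1,21,212,313
```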
