Really bad performance for large Lists of strings and .txt files

Time:05-12

What you need to know:

My application uses a food database that is stored in a .txt file. Each food has about 170 data values (2-3 digit numbers) separated by tab stops, and foods are separated by \n, so each line of the .txt file holds the data for one food.

The application's target platform is Android, it needs to work offline, and I use Unity with C# for coding.

My 2 Problems are:

  1. Getting access to the .txt file

As it is not possible for Android applications to access a .txt file via

$"{Application.dataPath}/textFileName.txt"

I assigned the .txt file as a TextAsset (name: txtFile) in the Inspector. When the app is started for the first time, I load all the data of the TextAsset into a JSON file (name: jsonStringList), which contains a List of strings:

for (int i = 0; i < amountOfLinesInTextFile; i++) { jsonStringList.Add(txtFile.text.Split('\n')[i]); }

Technically that does work, but unfortunately the txtFile has a total of about 15000 lines, which makes it really slow (Stopwatch time for the for-loop: ≈750000 ms, i.e. about 12.5 minutes...)

Obviously it is not an option to let the user wait for that long when opening the app for the first time...

  2. Searching in that jsonList
  • In the app it is possible to create your own food by combining multiple foods. To do that, the user has to search for a food and can then press the result to add it.

  • Currently, I check in a for-loop whether the input of the search bar InputField (name: searchBar) matches a food in the jsonStringList and whether that food is not already displayed.

  • If both are true, I add the name of the food to a List&lt;string&gt; (name: results), which is what I use to display the matching foods. (As the data values of each food, including the name, are separated by tab stops, I use .Split('\t') to extract the name of the food.)

      for (int i = 0; i < amountOfLinesInTextFile; i++)
      {
          string name = jsonStringList[i].Split('\t')[nameIndex].ToLower();
          if (name.Equals(searchBar.text.ToLower()) && !results.Contains(name))
          {
              results.Add(name);
          }
      }
    

Again: that technically works, but it is also too slow (even though it's much faster than problem 1).

(Stopwatch for the for-loop: ≈1600 ms)

I'd be very happy for any help to improve the time of those two actions! Maybe there is a whole different approach for handling such large .txt files, but every bit of decreasing the time would be helpful!

CodePudding user response:

15000 lines is not a big file, really. You just do too many unnecessary readings/transformations. You need to do it once, cache it (save it in a variable, in your case), and reuse it.

var foodIndex = txtFile
  .text
  .Split('\n')                 // get rows
  .Select(x => x.Split('\t'))  // get columns for each row
  .ToDictionary(x => x[nameIndex], StringComparer.OrdinalIgnoreCase);  // build a case-insensitive search index

var myFood = foodIndex["aPpLe"];

This produces a Dictionary&lt;string, string[]&gt;.
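For illustration, a lookup against that index could look like the sketch below. TryGetValue returns false instead of throwing on a miss, and string-to-number conversion happens only for the one row you actually need; the column index amountIndex is hypothetical.

```csharp
// Assumes foodIndex is the Dictionary<string, string[]> built above,
// with a case-insensitive comparer. amountIndex is a hypothetical column.
if (foodIndex.TryGetValue(searchBar.text, out string[] row))
{
    string name = row[nameIndex];              // raw string column
    int amount = int.Parse(row[amountIndex]);  // convert on demand, not up front
}
```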

Better approach

Deserialize the CSV format (your file is effectively a CSV/TSV table) into a POCO row:

public class Food
{
   [DataMember(Order=1)] //here is your nameIndex
   public string Name {get;set;}
   [DataMember(Order=2)]
   public int Amount {get;set;}
   //...
}

var foodIndex = SomeCSVParse<Food>(txtFile.text)
  .ToDictionary(x=> x.Name, StringComparer.OrdinalIgnoreCase);

var myFood = foodIndex["aPpLe"];

This produces a Dictionary&lt;string, Food&gt; search index, which looks better and is easier to use.
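A sketch of using that typed index (SomeCSVParse is the placeholder from above, not a real API; Debug.Log assumes the Unity context from the question):

```csharp
// Safe lookup against the Dictionary<string, Food> index:
// TryGetValue avoids a KeyNotFoundException when nothing matches.
if (foodIndex.TryGetValue(searchBar.text, out Food food))
{
    Debug.Log($"{food.Name}: {food.Amount}");  // typed access, no Split/Parse here
}
```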

This way, all the conversion from string to int/double/DateTime/etc., column order, separators (comma, tab, whitespace), cultures (in case there are float/double values), efficient reading, headers, etc. can simply be delegated to a 3rd-party framework. Someone did this here - Parsing CSV files in C#, with header
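The culture point matters even if you parse by hand. For example, "12.5" only reliably parses as twelve and a half under an explicit invariant culture; under a culture that uses ',' as the decimal separator, the default Parse can fail or silently mean something else. A minimal sketch:

```csharp
using System.Globalization;

// Pin the culture when parsing numeric columns from the file, so the
// result does not depend on the device's regional settings.
double value = double.Parse("12.5", CultureInfo.InvariantCulture);
```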

There is also a plethora of frameworks on NuGet; just pick whatever is small/popular, or copy-paste from the sources - https://www.nuget.org/packages?q=CSV

And read more about data structures in C# - https://docs.microsoft.com/en-us/dotnet/standard/collections/
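One concrete data-structure example from the question itself: results.Contains(name) on a List&lt;string&gt; is a linear scan on every call, while a HashSet&lt;string&gt; does the duplicate check in O(1). A sketch (variable names taken from the question's loop):

```csharp
// HashSet<string>.Add returns false if the item is already present,
// replacing the O(n) List.Contains check with an O(1) lookup.
var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
if (seen.Add(name))
{
    results.Add(name);
}
```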
