I have two var of code:
first:
struct pair_fiodat {string fio; string dat;}
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// get FullName and Birthday from file. Omitted.
var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat); // create map
var dates = usersLookUp[FullName];
if (dates.Count() > 0)
{
foreach (var dt in dates)
{
if (dt == BirthDate) return true;
}
}
}
and second:
struct pair_fiodat {string fio; string dat;}
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// get FullName and Birthday from file. Omitted.
var members = from s in list_fiodat where s.fio == FullName & s.dat == Birthdate select s;
if (members.Count() > 0 return true;
}
They make the same job - searching user by name and birthday. The first one work very quick. The second is very slowly (10x-50x) Tell me please if it possible accelerate the second one? I mean may be the list need in special preparing? I tried sorting: list_fiodat_sorted = list_fiodat.OrderBy(x => x.fio).ToList();, but...
CodePudding user response:
I skip your first test and change Count() to Any() (count iterate all list while any stop when there are an element)
public bool Test1(List<pair_fiodat> list_fiodat)
{
foreach (string fname in XML_files)
{
var members = from s in list_fiodat
where s.fio == fname & s.dat == BirthDate
select s;
if (members.Any())
return true;
}
return false;
}
If you want optimize something, you must leave comfortable things that offer the language to you because usually this things are not free, they have a cost. For example, for is faster than foreach. Is a bit more ugly, you need two sentences to get the variable, but is faster. If you iterate a very big collection, each iteration sum. LINQ is very powerfull and it's wonder work with it, but has a cost. If you change it for another "for", you save time.
public bool Test2(List<pair_fiodat> list_fiodat)
{
for (int i = 0; i < XML_files.Count; i )
{
string fname = XML_files[i];
for (int j = 0; j < list_fiodat.Count; j )
{
var s = list_fiodat[j];
if (s.fio == fname & s.dat == BirthDate)
{
return true;
}
}
}
return false;
}
With normal collections there aren't difference and usually you use foeach, LINQ... but in extreme cases, you must go to low level.
In your first test, ToLookup is the key. It takes a long time. Think about this: you are iterating all your list, creating and filling the map. It's bad in any case but think about the case in which the item you are looking for is at the start of the list: you only need a few iterations to found it but you spend time in each of the items of your list creating the map. Only in the worst case, the time is similar and always worse with the map creation due to the creation itself.
The map is interesting if you need, for example, all the items that match some condition, get a list instead found a ingle item. You spend time creating the map once, but you use the map many times and, in each time, you save time (map is "direct access" against the for that is "sequencial").