I am new to C# (has experience in other languages before, C , SQL AutoIT). I have a datatable with 10 columns
Name, MemberNoA, MemberNoB, DriverLicense, MobileNo, Address1, Address2, Address3, ProgramJoinned, Remark
The datatable has around 17,000 rows, what I want to do is, if the same person's records appear more than 2 times in the datatable, put a description in a remark field. 4 criteria to define "same person", any one criteria match will treat as "same person"
i Name MemberNoA
ii Name MemberNoB
iii Name DriverLicense
iv Name MobileNo
i.e. if there are 3 records with same Name and same MemberNoA, need to put description into remark field of these 3 records.
I work out result set from the above 4 criteria like this:
var resultCriteria1 = from gpr in dt.AsEnumerable()
group gpr by new {
Fld1 = gpr.Field < string > ("Name"),
Fld2 = gpr.Field < string > ("MemberNoA")
}
into grpp
where grpp.Count() > 2
select new {
Name = grpp.Key.Fld1,
MemA = grpp.Key.Fld2,
Cnt = grpp.Count()
};
after that, I loop thru all rows in dt and for each row, loop thru all result set in 4 criteria:
for (int i = 1; i < dt.Rows.Count; i ) {
foreach(var item in resultCriteria1) {
if ((item.Nam == s trName) && (item.MemA == M emberNoA)) {
dt.Rows[i].ItemArray[9] = d t.Rows[i].ItemArray[9] "Criteria 1 match\r\n";
}
}
}
The program work but run very slow! Is there any method like simple sql statement:
update table1 where
table1.name = table2.name and
table1.MemberNoA = table2.MemberNo2
set table1.Remark = "Criteria 1 match\r\n"
Is there any way to do this in C# or any way to optimize it ? Thanks.
Regds LAM Chi-fung
CodePudding user response:
What you can do is to use hashtable, order your data, and then iterate comparing current row data with previous using a cursor. This should give you Log(n) time complexity.
CodePudding user response:
The problem is that you are making a cartesian join between the grouped results and the original datatable, without using any performant data structures such as a dictionary or hashset.
But you don't actually need to join it at all, the grouped results can actually hold the relevant data rows directly.
The following code should be performant enough
var grouped =
from gpr in dt.Rows.Cast<DataRow>()
group gpr by (
Fld1: (string)gpr["Name"],
Fld2: (string)gpr["MemberNoA"]
)
into grpp
where grpp.Count() > 2
select grpp;
foreach (var grp in grouped)
foreach (var row in grp)
row["Remark"] = "Criteria 1 match\r\n";