I have transaction data in a CSV with the columns Customer ID, Transaction Amount, Transaction Date. I have a function that accepts transactions_csv_file_path as a string and N as an integer. I want to return the best N customers from the transaction data. NOTE: the best customer is the one with the longest period of consecutive daily payments. I read the CSV as below:
public static string[,] ProcessCSV(string file_path, int n)
{
    List<string> transData = new List<string>();
    using (StreamReader sr = new StreamReader(file_path))
    {
        string strResult = sr.ReadToEnd();
        var values = strResult.Split(',');
        transData.Add(values[0]);
        transData.Add(values[1]);
    }
    return transData.ToArray();
}
When debugging, I only get the column headers without data. I want to find the consecutive daily payments by date and return the customer IDs. For example, if N=1, I expect the output to be ['K20008']; if N=3, the output should be ['K20987', 'K20008', 'K20233'].
How do I get the array data from the CSV and get the best N customer IDs with the longest period of consecutive daily payments?
To consider: define consecutive daily payments as at least 1 transaction per calendar day. If there are any ties, use ascending order to break them. For example, K20003 comes before K20005.
CodePudding user response:
I'd perhaps make a method that calculated the longest run length:
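int MaxRun(IEnumerable<DateTime> ds){
    int max = 0;
    int current = 0;
    var prev = DateTime.MinValue;
    // Reduce to distinct calendar days so time of day and same-day repeats are ignored
    foreach(var d in ds.Select(x => x.Date).Distinct().OrderBy(x => x)){
        if((d - prev).Days == 1)
            current++;      // the day after the previous one: extend the run
        else
            current = 1;    // gap (or first date): start a new run of one day
        prev = d;
        if(current > max)
            max = current;
    }
    return max;
}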
And then use a bit of LINQ to group the transactions by customer, calculate each customer's max run, order by the max run (longest first, ties broken by customer ID ascending), and output the top n customers:
transactions
    .GroupBy(t => t.Customer, t => t.TransactionDate)
    .Select(g => new { g.Key, MR = MaxRun(g) })
    .OrderByDescending(at => at.MR)
    .ThenBy(at => at.Key)
    .Take(n)
    .Select(at => at.Key)
    .ToArray()
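As for why only the headers show up: the code in the question reads the whole file as one string, splits it on commas, and keeps only the first two pieces, which are the first two header names. Below is a minimal end-to-end sketch of one way to read the rows and rank the customers. It rests on several assumptions that are not confirmed by the question: the file has a single header row, the columns are in the order Customer ID, Transaction Amount, Transaction Date, no field contains a quoted comma, and the dates are in a format DateTime.Parse accepts under the invariant culture. The class name BestCustomers is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

public static class BestCustomers
{
    // Longest run of consecutive calendar days in a set of dates.
    public static int MaxRun(IEnumerable<DateTime> dates)
    {
        int max = 0, current = 0;
        var prev = DateTime.MinValue;
        foreach (var d in dates.Select(x => x.Date).Distinct().OrderBy(x => x))
        {
            // Extend the run if d is the day after prev, otherwise restart at 1.
            current = (d - prev).Days == 1 ? current + 1 : 1;
            prev = d;
            max = Math.Max(max, current);
        }
        return max;
    }

    // Assumption: header row, plain comma-separated fields, date in the third column.
    public static string[] ProcessCSV(string filePath, int n)
    {
        return File.ReadLines(filePath)
            .Skip(1)                                          // skip the header row
            .Where(line => !string.IsNullOrWhiteSpace(line))  // ignore blank lines
            .Select(line => line.Split(','))
            .Select(f => new
            {
                Customer = f[0].Trim(),
                Date = DateTime.Parse(f[2].Trim(), CultureInfo.InvariantCulture)
            })
            .GroupBy(t => t.Customer, t => t.Date)
            .Select(g => new { g.Key, MR = MaxRun(g) })
            .OrderByDescending(x => x.MR)                     // longest run first
            .ThenBy(x => x.Key, StringComparer.Ordinal)       // ties: ascending ID
            .Take(n)
            .Select(x => x.Key)
            .ToArray();
    }
}
```

This returns a string[] rather than the string[,] declared in the question, since the result is a flat list of customer IDs. If the real file has quoted fields or a locale-specific date format, a CSV library and DateTime.ParseExact would likely be a safer choice than Split and Parse.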