I'm working on a project where various files are to be uploaded to a platform. The platform has over 100 DocumentTypeId
s. The files have no set naming convention. So in order to determine a file's DoumentTypeId
I'm currently doing this in a method that returns a string
if (fileName.Contains("401k") || fileName.Contains("401(k)") || fileName.Contains("457b") || fileName.Contains("457(b)") || fileName.Contains("retire"))
{
return "401k-and-retirement";
}
else if (fileName.Contains("aflac"))
{
return "aflac";
}
else if ( fileName.Contains("beneficiary form") || fileName.Contains("beneficiaries") || fileName.Contains("beneficiary")
)
{
return "beneficiary-forms";
}
else if (fileName.Contains("benefit enrollment") || fileName.Contains("benefits enrollment") || fileName.Contains("benefit form") || fileName.Contains("benefits form") || fileName.Contains("benefit paperwork") || fileName.Contains("qualifying event") || fileName.Contains("enrollment") || fileName.Contains("b enroll") || fileName.Contains("benefit enrollnent")) //yes, the typo is on purpose. there are typos in some of the file names to import
{
return "benefits-election";
}
//etc
As you can imagine, this method is disgustingly ugly and long (~300 lines). I want to refactor this and make use of a database. I'm thinking of having a table with two fields DocumentTypeId
and FileNameContains
where FileNameContains
is a comma-separated list of the OR
case strings. This would allow for adding any cases without doing any code changes.
What I'm unsure about is how to do a string.Contains()
comparison on a database. I'm aware of LIKE
but that's not quite the same as string.Contains()
. I've also thought about querying the database to convert FileNameContains
field into an array or List for each record and doing an extension method(something like this) that loops through and does a string.Contains()
. But that doesn't seem very efficient and fast.
Am I approaching this wrong? I just know there has to be a better way than a bunch of else if
statements with OR
cases. And I really think having a database would make this more elegant and scalable without any code changes and purely SQL UPDATE
statements. Some help and input would be greatly appreciated.
CodePudding user response:
I'd use a dictionary, or list of keyvaluepair.. the key being "find this", the value being "the file type"
var d = new Dictionary<string, string>{
{ "401k", "401k-and-retirement" },
{ "401(k)", "401k-and-retirement" },
{ "457b", "401k-and-retirement" },
{ "457(b)", "401k-and-retirement" },
{ "retire", "401k-and-retirement" },
{ "aflacs", "aflacs" },
...
};
foreach(var kvp in d)
if(filename.Contains(kvp.Key)) return kvp.Value;
Add more entries to your list/dict, or even fill it from a db
What I'm unsure about is how to do a string.Contains() comparison on a database
Well, you could transport this same concept into the db and store your values like this in your table:
Find, Ret
@1k%, 401k-and-retirement
@1(k)%, 401k-and-retirement
And query like:
SELECT ret FROM table WHERE @pFilename LIKE Find
With a c# side parameter of
//adjust type and size to match your column
command.Parameters.Add("@pFilename", SqlDbType.VarChar, 50).Value = "my401k.txt";
Or whatever equivalent you'll use in Dapper, EF etc..
context.FindRets.FirstOrDefault(fr => EF.Functions.Like(filename, fr.Find))
For the love of a relevant deity, please don't store a CSV in a table column. It will bite you sooner than later
CodePudding user response:
I would usually do something like this:
var contains = new []
{
new
{
find = new [] { "401k", "401(k)", "457b", "457(b)", "retire" },
result = "401k-and-retirement"
},
new { find = new [] { "aflac" }, result = "aflac" },
new
{
find = new [] { "beneficiary form", "beneficiaries", "beneficiary" },
result = "beneficiary-forms"
},
new
{
find = new []
{
"benefit enrollment", "benefits enrollment", "benefit form", "benefits form", "benefit paperwork",
"qualifying event", "enrollment", "b enroll", "benefit enrollnent"
},
result = "benefits-election"
},
};
return
contains
.Where(x => x.find.Any(f => fileName.Contains(f)))
.Select(x => x.result)
.FirstOrDefault();
The advantage is that it's easier to add and maintain the items you're looking for. It's all on one portion of the screen.
You could go one step further and save this away in a text file that looks like this:
401k-and-retirement
401k
401(k)
457b
457(b)
retire
aflac
aflac
beneficiary-forms
beneficiary form
beneficiaries
beneficiary
benefits-election
benefit enrollment
benefits enrollment
benefit form
benefits form
benefit paperwork
qualifying event
enrollment
b enroll
benefit enrollnent
Then you can do this:
var contains =
File
.ReadLines("config.txt")
.Aggregate(
new[] { new { find = new List<string>(), result = "" } }.ToList(),
(a, x) =>
{
if (x.StartsWith(' '))
{
a.Last().find.Add(x.Substring(1));
}
else
{
a.Add(new { find = new List<string>(), result = x });
}
return a;
}, a => a.Skip(1).ToArray());
contains.Dump();
return
contains
.Where(x => x.find.Any(f => fileName.Contains(f)))
.Select(x => x.result)
.FirstOrDefault();
Now you can just add more items to the config file as you need.