I'm a SAS/Python person and I have to back-fill for a C# programmer...
I need to pre-process a file before it is read into my company's software to make sure it is the correct file type. The problem is that the correct file's extension (*.prj) is also used by the projection files that accompany ESRI shapefiles (SHP) in GIS mapping. To make matters worse, the software my company works on actually uses SHP files for mapping. So, as you can imagine, people sometimes get them confused.
So, when I read in a *.prj file, I need to make sure it is not a SHP prj file. The easiest way to reject ESRI SHP *.prj files would be to read the beginning of the file to determine if the first few bytes are one of the following:
1. "GEOGCS["
2. "PROJCS["
3. "GEOCCS["
For the files I have access to, the first one (GEOGCS[) seems to be the most common, but there could be others that I haven't run across. It seems these are called WKT (well-known text) files and could possibly have other leading bytes (see Coordinate System here).
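Because the WKT keyword sits at the very start of an ESRI .prj file, the check can read just the first few characters instead of scanning every line with Contains. A minimal C# sketch, assuming the three prefixes listed above are the ones to reject; the class name `WktSniffer`, its method names, and the 64-character header length are illustrative choices, not part of any existing API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WktSniffer
{
    // Known WKT root keywords (from the list above); add more here as they turn up.
    public static readonly string[] EsriPrefixes = { "GEOGCS[", "PROJCS[", "GEOCCS[" };

    // Reads at most 'headerLength' characters from the start of the file.
    // StreamReader detects a UTF-8/UTF-16 byte-order mark, so it is not returned as text.
    public static string ReadHeader(string path, int headerLength = 64)
    {
        using var reader = new StreamReader(path);
        var buffer = new char[headerLength];
        int read = reader.ReadBlock(buffer, 0, headerLength);
        return new string(buffer, 0, read);
    }

    // True if the header, ignoring leading whitespace, starts with any of the prefixes.
    public static bool StartsWithAny(string header, IEnumerable<string> prefixes)
    {
        string trimmed = header.TrimStart();
        return prefixes.Any(p => trimmed.StartsWith(p, StringComparison.Ordinal));
    }

    // Convenience wrapper: does this file look like an ESRI shapefile .prj?
    public static bool IsEsriPrj(string path) =>
        StartsWithAny(ReadHeader(path), EsriPrefixes);
}
```

Since only the beginning of the file is read, the cost does not grow with file size; the trade-off is that a keyword appearing later in the file is deliberately ignored.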
Currently, my software correctly throws an exception when these ESRI SHP files are loaded. The problem, however, is that the message is vague and generic. I want to add a bit of code so that, if one of these ESRI SHP files is chosen, the user is told that the file is a mapping file and that they shouldn't delete or overwrite it.
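One way to give the user that specific message is to run the check before the file reaches the loader and throw an exception whose text explains the situation. A hedged sketch, assuming the check can run ahead of the existing loader; `PrjGuard`, `EnsureNotEsriPrj`, and the choice of `InvalidDataException` are hypothetical, and the message wording is adapted from the question's console app:

```csharp
using System;
using System.IO;
using System.Linq;

public static class PrjGuard
{
    // Hypothetical list of WKT root keywords to reject.
    private static readonly string[] EsriPrefixes = { "GEOGCS[", "PROJCS[", "GEOCCS[" };

    // Throws a descriptive exception if 'path' looks like an ESRI shapefile .prj (WKT) file.
    public static void EnsureNotEsriPrj(string path)
    {
        // Only the first line matters: ESRI .prj files start with the WKT root keyword.
        // FirstOrDefault disposes the ReadLines enumerator, which closes the file.
        string firstLine = (File.ReadLines(path).FirstOrDefault() ?? string.Empty).TrimStart();

        if (EsriPrefixes.Any(p => firstLine.StartsWith(p, StringComparison.Ordinal)))
        {
            throw new InvalidDataException(
                $"'{Path.GetFileName(path)}' is the projection (.prj) component of a GIS shapefile, " +
                "not a file for this program. It is not corrupted; do not delete or overwrite it.");
        }
    }
}
```

The caller can then catch `InvalidDataException` separately from other load errors and show its `Message` as-is.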
I have successfully written a console app to test this out and it "works," but only for the main WKT type. I want to be able to add more search terms if necessary, and, more importantly, I like the simplicity of the LINQ code and would like to keep it.
So far, though, I haven't figured out a way to have multiple search terms using the same approach. At this point this is more of a learning exercise for me.
I've tried several different options, for example using lists, but I can't get LINQ to use them with the File.ReadLines statement.
Any help would be appreciated.
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
namespace TestPgm
{
    public class Check_PRJ
    {
        // check to see if PRJ file is the correct file
        public static void Main()
        {
            Console.Write("Please enter file name and path:");
            string fname = Console.ReadLine();
            string prj_flag = "GEOGCS[";
            string dir = new FileInfo(fname).DirectoryName.ToString();
            if (IsPrjFile(fname, prj_flag))
                Console.WriteLine("PRJ file is a component of a GIS SHP file. " +
                    "It is not a corrupted file--do not delete.");
            else
                Console.WriteLine("File is the correct PRJ file.");

            // returns true if any line of the file contains the search string
            static bool IsPrjFile(string input, string search)
            {
                try
                {
                    return File.ReadLines(input).Any(x => x.Contains(search));
                }
                catch (Exception ex)
                {
                    // print the error first; code after 'return' would never run
                    Console.WriteLine(ex);
                    return false;
                }
            }
        }
    }
}
An example ESRI SHP .prj file looks like this:
GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]
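Every WKT string like the one above begins with a root keyword followed by `[`, so another way to make the list of search terms easy to extend is to pull out that keyword and look it up in a set. A minimal sketch under that assumption; `WktKeyword`, `RootKeyword`, and `KnownRoots` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

public static class WktKeyword
{
    // Root keywords to reject; extend this set instead of adding more search strings.
    public static readonly HashSet<string> KnownRoots =
        new HashSet<string>(StringComparer.Ordinal) { "GEOGCS", "PROJCS", "GEOCCS" };

    // Returns the text before the first '[' (e.g. "GEOGCS"), or null if there is none.
    public static string RootKeyword(string wkt)
    {
        string trimmed = wkt.TrimStart();
        int bracket = trimmed.IndexOf('[');
        return bracket > 0 ? trimmed.Substring(0, bracket).TrimEnd() : null;
    }

    // True if the string's root keyword is one of the known WKT roots.
    public static bool IsKnownWkt(string wkt)
    {
        string root = RootKeyword(wkt);
        return root != null && KnownRoots.Contains(root);
    }
}
```

Because only the root keyword is compared, nested keywords such as `DATUM[` or an inner `GEOGCS[` inside a `PROJCS[` definition do not affect the result.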
Answer:
Try this:
// instead of a string, the search parameter is now an IEnumerable<string>
static bool IsPrjFile(string input, IEnumerable<string> searchItems)
{
    try
    {
        // check whether any line x contains any entry of searchItems
        return File.ReadLines(input).Any(x => searchItems.Any(y => x.Contains(y)));
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex);
        return false;
    }
}
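For completeness, a sketch of how the `IsPrjFile` overload above might be called from the question's `Main` logic. Any `IEnumerable<string>` works, so an array or a `List<string>` of prefixes can be passed directly. The `PrjCheck` class and its `PrjFlags` and `Describe` members are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PrjCheck
{
    // Search terms; any IEnumerable<string> (array, List<string>, ...) can be passed.
    public static readonly List<string> PrjFlags = new List<string> { "GEOGCS[", "PROJCS[", "GEOCCS[" };

    // True if any line of the file contains any of the search terms.
    public static bool IsPrjFile(string input, IEnumerable<string> searchItems)
    {
        try
        {
            return File.ReadLines(input).Any(line => searchItems.Any(term => line.Contains(term)));
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
            return false;
        }
    }

    // Produces the same two messages as the question's console app.
    public static string Describe(string fname) =>
        IsPrjFile(fname, PrjFlags)
            ? "PRJ file is a component of a GIS SHP file. It is not a corrupted file--do not delete."
            : "File is the correct PRJ file.";
}
```

Adding a new search term is then a one-line change to `PrjFlags`.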
And FYI, instead of
string dir = new FileInfo(fname).DirectoryName.ToString();
use this:
string dir = Path.GetDirectoryName(fname);
https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getdirectoryname?view=net-7.0
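One practical difference worth knowing: `Path.GetDirectoryName` works purely on the string it is given, whereas `FileInfo` first expands the path to a full path against the current directory. A small sketch illustrating this with a relative path; `DirDemo`, `Compare`, and the path `data/survey.prj` are made up for the example, and neither call requires the file to exist:

```csharp
using System.IO;

public static class DirDemo
{
    // Returns both results for the same input so they can be compared.
    public static (string FromPath, string FromFileInfo) Compare(string fname)
    {
        // String-only: returns the directory part exactly as written (here "data").
        string fromPath = Path.GetDirectoryName(fname);

        // FileInfo expands fname to a full path against the current directory first,
        // so DirectoryName is absolute. It is already a string; .ToString() is redundant.
        string fromFileInfo = new FileInfo(fname).DirectoryName;

        return (fromPath, fromFileInfo);
    }
}
```

For example, `DirDemo.Compare(Path.Combine("data", "survey.prj"))` yields `"data"` for the first value and an absolute path ending in `data` for the second.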
Answer:
If these files are reasonably small, I would consider reading the whole contents into a single string rather than iterating over them line by line:
static bool IsPrjFile(string inputFile, IEnumerable<string> searchItems)
{
String contents = File.ReadAllText(inputFile);
return searchItems.Any(item => contents.IndexOf(item, StringComparison.Ordinal) > -1);
}