Home > Software engineering >  C#: Reading a special formatted CSV
C#: Reading a special formatted CSV

Time:09-19

My problem is that my csv file has data stored in form of a json file. I now need to extract that data in the most efficient way to create objects that store the data.

My csv-file looks like this:

1. 2022-09-19,"{
2.   "timestamp": 41202503,
3.   "machineId": 3567,
4.   "status": 16,
5.   "isActive": false,
6.   "scanWidth": 5.0,
7. }"
8. 2022-09-19,"{
9.   "timestamp": 41202505,
10.  "machineId": 3568,
11.   "status": 5,
12.   "isActive": true,
13.   "scanWidth": 1.4,
14. }"
15. 2022-09-19,"{
16.   "timestamp": 41202507,
17.   "machineId": 3569,
18.   "status": 12,
19.   "isActive": false,
20.   "scanWidth": 6.2,
21. }"

In my project I would have class called "MachineData" with all the relevant properties. My question is now, how can I extract the data stored in this csv file?

Thanks again for helping!

CodePudding user response:

Create a type to represent this data:

class ResultItem
{
    public string Date { get; set; }
    public string Timestamp { get; set; }
    public string MachineId { get; set; }
    public string Status { get; set; }
    public string IsActive { get; set; }
    public string ScanWidth { get; set; }
}

Use Regex and Newtonsoft.Json to extract the data:

//Remove the line number
csvText = Regex.Replace(csvText, @"\d \. ", "");

//Match items with separating the date and the json in different groups
var matches = Regex.Matches(csvText, @"(?<date>\d{4}-\d{2}-\d{2}),""(?<json>(. \n){6}})", RegexOptions.CultureInvariant | RegexOptions.Multiline);

var results = new List<ResultItem>();
foreach (Match match in matches)
{
    //Getting values from json group
    var item = JsonConvert.DeserializeObject<ResultItem>(match.Groups["json"].Value);
    //Getting value from date group
    item.Date = match.Groups["date"].Value;
    results.Add(item);
}

CodePudding user response:

I wouldn't normally recommend parsing json with Regular Expressions, but this is a special case, so here is a solution entirely based on RegEx:

string pattern = @"(?<date>[0-9-] ),""{\s ""timestamp"":\s (?<timestamp>[0-9] ),\s ""machineId"":\s (?<machineId>[0-9] ),\s ""status"":\s (?<status>[0-9] ),\s ""isActive"":\s (?<isActive>(true|false)),\s ""scanWidth"":\s (?<scanWidth>[0-9\.] ),\s }""";  
Regex rg = new Regex(pattern, RegexOptions.Multiline);  
  
foreach (Match match in rg.Matches(File.ReadAlltext("nameoffile.csv")))  
{
    Console.WriteLine(match.Groups["date"].Value); 
    Console.WriteLine(match.Groups["timestamp"].Value); 
    Console.WriteLine(match.Groups["machineId"].Value); 
    Console.WriteLine(match.Groups["status"].Value); 
    Console.WriteLine(match.Groups["isActive"].Value); 
    Console.WriteLine(match.Groups["scanWidth"].Value); 
}

Note that if the input differs the slightest this will fail (negative values, additional or missing white space, etc.).

EDIT If the line numbers are part of the input, you need to add [0-9] \.\s in the beginning of the Regex to swallow the line number, the dot, and the white space, giving:

string pattern = @"[0-9] \.\s (?<date>[0-9-] ),""{\s ""timestamp"":\s (?<timestamp>[0-9] ),\s ""machineId"":\s (?<machineId>[0-9] ),\s ""status"":\s (?<status>[0-9] ),\s ""isActive"":\s (?<isActive>(true|false)),\s ""scanWidth"":\s (?<scanWidth>[0-9\.] ),\s }""";  
  • Related