C# Extract json object from mixed data text/js file

Time:09-02

I need to parse a React (ReactJS) bundle, main.451e57c9.js, to retrieve its version number with C#. The file contains mixed content; here is a small part of it:

.....inally{if(s)throw i}}return a}}(e,t)||xe(e,t)||we()}var Se=JSON.parse('{"shortVersion":"v3.1.56"}'),Ne="AASAAAAAqCAYAAAATb4ZSAAAACXBIWXMAAAsTAAALEw.....

I need to extract the JSON object {"shortVersion":"v3.1.56"}.

So far I have tried simply finding the string shortVersion and returning what follows it, but it feels like I'm reinventing the wheel. Is there a proper way to identify and extract JSON from mixed text?

public static void findVersion()
{
    var partialName = "main.*.js";
    string[] filesInDir = Directory.GetFiles(@pathToFile, partialName);

    foreach (var line in File.ReadLines(filesInDir[0]))
    {
        string keyword = "shortVersion";
        int indx = line.IndexOf(keyword);

        if (indx != -1)
        {
            // Everything after the keyword, up to the end of the line
            string code = line.Substring(indx + keyword.Length);
            Console.WriteLine(code);
        }
    }
}

RESULT

":"v3.1.56"}'),Ne=".....

Answer:

string findJson(string input, string keyword)
{
    int keywordIndex = input.IndexOf(keyword);
    if (keywordIndex < 2) return null; // keyword missing, or no room for the {" before it

    int startIndex = keywordIndex - 2; // step back over the {" (or {') that precedes the key
    input = input.Substring(startIndex); // grab everything from the { onward

    int endIndex = input.IndexOf('}'); // first closing brace in the trimmed string
    if (endIndex == -1) return null;

    return input.Remove(endIndex + 1); // drop everything after that brace
}
Console.WriteLine(findJson("fwekjfwkejwe{'shortVersion':'v3.1.56'}wekjrlklkj23klj23jkl234kjlk", "shortVersion"));

This prints {'shortVersion':'v3.1.56'}.

Note: you may have to normalize the quotes with line.Replace('"', '\'');
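Stopping at the first } works for the flat {"shortVersion":"v3.1.56"} object, but it would cut a nested object short. Below is a minimal sketch of a depth-counting alternative. JsonSlice and FindEnclosingObject are hypothetical names, and the sketch assumes double-quoted strings: single-quoted values are not recognized as strings, so a brace inside one would throw the count off.

```csharp
using System;

public static class JsonSlice
{
    // Hypothetical helper: returns the balanced {...} object that encloses the
    // first occurrence of `keyword`, or null if no such object is found.
    // Assumes double-quoted strings; braces inside them are ignored.
    public static string FindEnclosingObject(string input, string keyword)
    {
        int keywordIndex = input.IndexOf(keyword, StringComparison.Ordinal);
        if (keywordIndex < 0) return null;

        // Walk back to the nearest '{' before the keyword.
        int start = input.LastIndexOf('{', keywordIndex);
        if (start < 0) return null;

        int depth = 0;
        bool inString = false;
        for (int i = start; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                if (c == '\\') i++;              // skip the escaped character
                else if (c == '"') inString = false;
                continue;
            }

            if (c == '"') inString = true;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return input.Substring(start, i - start + 1); // braces balanced
        }

        return null; // ran off the end with unbalanced braces
    }
}
```

Because it tracks nesting depth instead of the first }, an object such as {"a":{"b":1},"c":"}"} is returned whole.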

Answer:

Try the method below:

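// requires: using Newtonsoft.Json;
public static object ExtractJsonFromText(string mixedStrng)
{
    // Try every '{' from left to right as the start...
    for (var i = mixedStrng.IndexOf('{'); i > -1; i = mixedStrng.IndexOf('{', i + 1))
    {
        // ...and every '}' after it, from right to left, as the end
        for (var j = mixedStrng.LastIndexOf('}'); j > i; j = mixedStrng.LastIndexOf('}', j - 1))
        {
            var jsonProbe = mixedStrng.Substring(i, j - i + 1);
            try
            {
                // The first (longest) candidate that parses wins
                return JsonConvert.DeserializeObject(jsonProbe);
            }
            catch (JsonException)
            {
                // Not valid JSON; try the next, shorter candidate
            }
        }
    }
    return null;
}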

Fiddle https://dotnetfiddle.net/N1jiWH
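If you would rather not depend on Newtonsoft.Json, the same probing idea can be sketched with System.Text.Json, which ships with .NET Core 3.0 and later. JsonProbe is a hypothetical name. Two differences are worth keeping in mind: System.Text.Json is strict, so unlike Json.NET it rejects single-quoted strings such as {'shortVersion':'v3.1.56'}, and every candidate is a full parse attempt, so on a large minified bundle the nested loops can be slow.

```csharp
using System.Text.Json;

public static class JsonProbe
{
    // Hypothetical port of the probing approach: returns the first {...}
    // span (longest end first) that parses as strict JSON, or null.
    public static JsonDocument ExtractJsonFromText(string mixed)
    {
        for (int i = mixed.IndexOf('{'); i > -1; i = mixed.IndexOf('{', i + 1))
        {
            // j > i keeps the Substring length positive
            for (int j = mixed.LastIndexOf('}'); j > i; j = mixed.LastIndexOf('}', j - 1))
            {
                try
                {
                    return JsonDocument.Parse(mixed.Substring(i, j - i + 1));
                }
                catch (JsonException)
                {
                    // Not valid JSON; try a shorter candidate
                }
            }
        }
        return null;
    }
}
```

The returned JsonDocument is IDisposable, so wrap it in a using block and read the value with doc.RootElement.GetProperty("shortVersion").GetString().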

Answer:

Use this instead; it should be faster, since it makes a single regex pass over the text instead of attempting many parses: https://dotnetfiddle.net/sYFvYj

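// requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public static object ExtractJsonFromText(string mixedStrng)
{
    // Matches ('{...}'); the lazy .*? stops at the first }') instead of the last one on the line
    string pattern = @"\('(\{.*?\})'\)";
    var matches = new List<string>();
    foreach (Match match in Regex.Matches(mixedStrng, pattern))
    {
        // Group 1 is the object literal without the surrounding (' and ')
        matches.Add(match.Groups[1].Value);
    }

    // One object per line, or null if nothing matched
    return matches.Count > 0 ? string.Join(Environment.NewLine, matches) : null;
}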
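Going one step further, the sketch below pulls out the shortVersion value itself. It assumes, as in the question, that the object is embedded as JSON.parse('{...}') with no escaped quotes or backslashes inside the literal, and it parses the capture with System.Text.Json. VersionReader and ReadShortVersion are hypothetical names.

```csharp
using System.Text.Json;
using System.Text.RegularExpressions;

public static class VersionReader
{
    // Assumption: the version object is passed to JSON.parse as a
    // single-quoted JS string literal with no escapes inside it.
    // The lazy .*? expands only until the first }') that follows.
    private static readonly Regex JsonParseCall =
        new Regex(@"JSON\.parse\('(\{.*?\})'\)");

    public static string ReadShortVersion(string jsText)
    {
        foreach (Match m in JsonParseCall.Matches(jsText))
        {
            JsonDocument doc;
            try
            {
                doc = JsonDocument.Parse(m.Groups[1].Value);
            }
            catch (JsonException)
            {
                continue; // literal was not valid JSON; keep looking
            }

            using (doc)
            {
                JsonElement root = doc.RootElement;
                if (root.ValueKind == JsonValueKind.Object &&
                    root.TryGetProperty("shortVersion", out JsonElement v) &&
                    v.ValueKind == JsonValueKind.String)
                {
                    return v.GetString();
                }
            }
        }
        return null; // no JSON.parse('...') call carried a shortVersion
    }
}
```

Anchoring the pattern on JSON.parse(' keeps it from matching unrelated ('{...}') text elsewhere in the bundle, and checking for the shortVersion property skips other JSON.parse calls that happen to come first.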

Answer:

You should not use GetFiles() here: you only need one file, but GetFiles() builds the complete array of matches before you can use any of them, whereas EnumerateFiles() yields them lazily. The code below reads the file line by line with File.ReadLines and stops at the first matching line, so it should hold up well with big files and/or many files in a folder (to be fair, I have not tested it on a file system or file that large).

using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json;

public class Program
{
    public static void Main()
    {
        var path = @"c:\SomePath";
        var jsonString = GetFileVersion(path);
        if (!string.IsNullOrWhiteSpace(jsonString))
        {
            // The JSON is a single object, so deserialize to one VersionInfo (not a List)
            var result = JsonConvert.DeserializeObject<VersionInfo>(jsonString);
            Console.WriteLine(result.shortVersion);
        }
    }

    private static string GetFileVersion(string path)
    {
        var partialName = "main.*.js";
        // JSON string fragment to find: doubled-up braces and quotes for the $@ string
        string matchString = $@"{{""shortVersion"":";
        // The object ends with "} followed by the closing ' of the JS string literal
        string matchEndString = "\"}'";

        DirectoryInfo dir = new DirectoryInfo(path);
        if (!dir.Exists)
        {
            throw new DirectoryNotFoundException("The directory does not exist.");
        }

        // EnumerateFiles is lazy, so FirstOrDefault stops at the first matching file
        FileInfo info = dir.EnumerateFiles(partialName).FirstOrDefault();
        if (info == null)
        {
            return string.Empty;
        }

        // Stream the file and stop at the first line containing the fragment
        var line = File.ReadLines(info.FullName).FirstOrDefault(l => l.Contains(matchString));
        if (line == null)
        {
            return string.Empty;
        }

        var indexStart = line.IndexOf(matchString, StringComparison.Ordinal);
        var indexEnd = line.IndexOf(matchEndString, indexStart, StringComparison.Ordinal);
        if (indexEnd == -1)
        {
            return string.Empty;
        }

        // Substring takes a length: run through the closing } but exclude the trailing '
        var jsonString = line.Substring(indexStart, indexEnd + matchEndString.Length - 1 - indexStart);
        return jsonString;
    }

    // Named VersionInfo to avoid confusion with System.Version
    public class VersionInfo
    {
        public string shortVersion { get; set; }
    }
}