How to extract specific string from a text?

Time:08-21

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using Newtonsoft.Json;

namespace Rename_Files
{
    public partial class Form1 : Form
    {
        string[] files;
        public Form1()
        {
            InitializeComponent();

            files = Directory.GetFiles(@"C:\Program Files (x86)\Steam\steamapps\common\King's Quest\Binaries\Win\Saved Games", "*.*", SearchOption.AllDirectories);

            for (int i = 2; i < files.Length; i++)
            {
                string text = File.ReadAllText(files[i]);
                int startPos = text.IndexOf("currentLevelName");
                int length = text.IndexOf("currentLevelEntryDirection") - 3;
                string sub = text.Substring(startPos, length);
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }
    }
}

The part I want to extract is:

currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection"

This is part of the file content:

m_ItemsEncodedJsons    ArrayProperty               None !   m_WhatLevelPlayerIsAtEncodedJson    ArrayProperty O          G   {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} &   m_WhatCheckPointPlay

With my current approach I get this exception:

System.ArgumentOutOfRangeException: 'Index and length must refer to a location within the string. Parameter name: length'

The value of startPos is 1613 and the value of length is 1653.

So the exception makes sense, but I'm not sure yet how to extract the specific string from the text.

Update:

This almost works:

int startPos = text.IndexOf("currentLevelName");
int length = text.IndexOf("currentLevelEntryDirection");
string sub = text.Substring(startPos, length - startPos);

The result in sub is:

"currentLevelName\":\"E1_WL1_HangingBedsA_M\",\""

But I want sub to contain this:

currentLevelName"E1_WL1_HangingBedsA_M\"

Optionally without the two quotes as well, and maybe with an added underscore:

currentLevelName_"E1_WL1_HangingBedsA_M\"

or

currentLevelName_E1_WL1_HangingBedsA_M\

CodePudding user response:

The problem you are facing is indeed this one:

How to extract the content with specific pattern from a String?

In this case, you can use a regular expression to extract the content you want.

Given the following text:

m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8} & m_WhatCheckPointPlay

By using this regex pattern:

string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";

You will be able to extract the following content:

"currentLevelName":"E1_WL1_FindBow_M","currentLevelEntryDirection":8

Here is the code snippet in C#:

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // this is the original text
        string input = @"m_ItemsEncodedJsons ArrayProperty None ! m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {""currentLevelName"":""E1_WL1_FindBow_M"",""currentLevelEntryDirection"":8} & m_WhatCheckPointPlay";

        // this is the pattern you are looking for
        string pattern = @"""currentLevelName"":"".*"",""currentLevelEntryDirection"":\d+";
        
        RegexOptions options = RegexOptions.Multiline;
        
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

One reason to use a regex here is that if the value of currentLevelEntryDirection has more than one digit, e.g. 8123, the code snippet above still extracts the correct value.

You can also find the above example and edit it here: Regex101
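
If only the level name is needed, a capture group may be simpler than trimming the full match afterwards. The following is a minimal sketch under that assumption; the class name `LevelNameExtractor`, the method `TryBuildLabel`, and the `currentLevelName_` prefix format are hypothetical names chosen to match the output asked for in the question, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LevelNameExtractor
{
    // [^"]* stops at the first closing quote, so the match cannot run past the value.
    private static readonly Regex LevelNamePattern =
        new Regex(@"""currentLevelName"":""(?<name>[^""]*)""");

    // Builds "currentLevelName_<value>"; returns false when the key is not found.
    public static bool TryBuildLabel(string text, out string label)
    {
        Match m = LevelNamePattern.Match(text);
        label = m.Success ? "currentLevelName_" + m.Groups["name"].Value : string.Empty;
        return m.Success;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample content taken from the question.
        string input = @"m_WhatLevelPlayerIsAtEncodedJson ArrayProperty O G {""currentLevelName"":""E1_WL1_FindBow_M"",""currentLevelEntryDirection"":8} & m_WhatCheckPointPlay";

        if (LevelNameExtractor.TryBuildLabel(input, out string label))
        {
            Console.WriteLine(label); // currentLevelName_E1_WL1_FindBow_M
        }
    }
}
```

Using `[^"]*` instead of `.*` keeps the match from spanning several JSON fragments if the key appears more than once in the file.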

CodePudding user response:

It seems the content is separated by spaces and the positions are fixed. If so, you could do something like:

var splitted = text.Split(' ');
var json = splitted[8]; // this is the json part in the content;

However, since we don't know whether the content might change, you can use this instead:

var startPos = text.IndexOf('{');
var endPos = text.IndexOf('}') + 1;
var json = text.Substring(startPos, endPos - startPos);

This extracts the JSON part of the file. Now you can implement a model class to deserialize that JSON into, like this:

using System.Text.Json;
using System.Text.Json.Serialization;

public class JsonModel
{
    [JsonPropertyName("currentLevelName")]
    public string? CurrentLevelName { get; set; }
    
    [JsonPropertyName("currentLevelEntryDirection")]
    public int CurrentLevelEntryDirection { get; set; }
}

With that we can do:

var result = JsonSerializer.Deserialize<JsonModel>(json);
var levelName = result.CurrentLevelName;
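
Because the save file contains other data around the JSON, the first `{` or `}` in the text is not guaranteed to belong to the fragment you want. The sketch below anchors the search on the `"currentLevelName"` key and looks outward for the enclosing braces. `LevelInfo` and `SaveFileParser` are hypothetical names, and the code assumes the fragment is a flat object with no nested braces, as in the sample from the question.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class LevelInfo
{
    [JsonPropertyName("currentLevelName")]
    public string CurrentLevelName { get; set; } = "";

    [JsonPropertyName("currentLevelEntryDirection")]
    public int CurrentLevelEntryDirection { get; set; }
}

public static class SaveFileParser
{
    // Returns null when the key or its enclosing braces cannot be found.
    // JsonSerializer.Deserialize can still throw JsonException on malformed JSON.
    public static LevelInfo? ParseLevelInfo(string text)
    {
        int key = text.IndexOf("\"currentLevelName\"", StringComparison.Ordinal);
        if (key < 0) return null;

        int start = text.LastIndexOf('{', key); // nearest '{' before the key
        int end = text.IndexOf('}', key);       // nearest '}' after the key
        if (start < 0 || end < 0) return null;

        string json = text.Substring(start, end - start + 1);
        return JsonSerializer.Deserialize<LevelInfo>(json);
    }
}
```

For the sample content in the question, `SaveFileParser.ParseLevelInfo(text)?.CurrentLevelName` would give `E1_WL1_FindBow_M`.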

CodePudding user response:

To get a specific string pattern out of data that is not pure JSON:

I think using a regex to get the string and then working on the match is a good approach.

By using the regex: "currentLevelName":"\w+" In your example code, you will get: "currentLevelName":"E1_WL1_HangingBedsA_M"

Then use the result to create or replace your file name.

The code below reads the content of savedGame001.txt, extracts the currentLevelName block, and then creates a new file whose name has this format: [filename]_[theCurrentLevelName]

using System;
using System.IO;
using System.Text.RegularExpressions;

// your file path
string filePath = @"C:\Users\a0204\Downloads";

// your file name
string fileName = @"savedGame001.txt";

// read file content
string stringContent = string.Empty;
stringContent = System.IO.File.ReadAllText(filePath + "\\" + fileName);

// Get the matched string by regex => "currentLevelName":"\w+"
var regex = new Regex("\"currentLevelName\":\"\\w+\"");
Match matched = regex.Match(stringContent);
string matchedString = matched.Value;

// Get the string after the colon and strip the quotes
int colonPosition = matchedString.IndexOf(":");
string value = matchedString.Substring(colonPosition + 1);
value = value.Replace("\"", string.Empty);

// remove the .txt and add the matched string to file name
fileName = fileName.Remove(fileName.Length - 4, 4);
string newFileName = fileName + "_" + value;

// check the new file name
Console.WriteLine(newFileName);

// write content to new file name 
FileStream fileStream = File.Create(filePath + "\\" + newFileName);
fileStream.Dispose();
File.WriteAllText(filePath + "\\" + newFileName, stringContent);

Console.ReadLine();
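
The file name handling above relies on the name ending in a four-character extension and on manual `"\\"` concatenation. As a hedged alternative, the `System.IO.Path` helpers can build the new name without those assumptions. `SaveFileNaming` and `BuildNewPath` are hypothetical names, and unlike the code above this sketch keeps the original extension, which is a design choice rather than something the answer requires.

```csharp
using System;
using System.IO;

public static class SaveFileNaming
{
    // Turns ".../savedGame001.txt" + "E1_WL1_FindBow_M"
    // into ".../savedGame001_E1_WL1_FindBow_M.txt".
    public static string BuildNewPath(string originalPath, string levelName)
    {
        string directory = Path.GetDirectoryName(originalPath) ?? string.Empty;
        string baseName = Path.GetFileNameWithoutExtension(originalPath);
        string extension = Path.GetExtension(originalPath); // includes the dot, or "" if none

        return Path.Combine(directory, baseName + "_" + levelName + extension);
    }
}
```

The result can then be passed to `File.Copy` or `File.Move`, depending on whether the original save file should be kept.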