Convert unicode to characters in a file using Ruby

Time:07-24

I have this string in a code.txt file.

"class Solution {\u000Apublic:\u000A    vector\u003Cvector\u003Cint\u003E\u003E insert(vector\u003Cvector\u003Cint\u003E\u003E\u0026 intervals, vector\u003Cint\u003E\u0026 newInterval) {\u000A        int len \u003D intervals.size()\u003B\u000A        int index \u003D 0\u003B\u000A        vector\u003Cvector\u003Cint\u003E \u003E ans\u003B\u000A        \u000A\u000A        while(index \u003C len \u0026\u0026 intervals[index][1] \u003C newInterval[0]) ans.push_back(intervals[index++])\u003B\u000A        \u000A        while(index \u003C len \u0026\u0026 intervals[index][0] \u003C\u003D newInterval[1]) {\u000A            newInterval[0] \u003D min(intervals[index][0], newInterval[0])\u003B\u000A            newInterval[1] \u003D max(intervals[index][1], newInterval[1])\u003B\u000A            index++\u003B\u000A        }\u000A        \u000A        ans.push_back(newInterval)\u003B\u000A        \u000A        while(index \u003C len) ans.push_back(intervals[index++])\u003B\u000A\u000A        return ans\u003B \u000A    }\u000A}\u003B                         "

I would like to convert this string to C++ syntax and write it to a solution.cpp file.

The content of solution.cpp should look like this:

class Solution {
public:
    vector<vector<int>> insert(vector<vector<int>>& intervals, vector<int>& newInterval) {
        int len = intervals.size();
        int index = 0;
        vector<vector<int> > ans;
        

        while(index < len && intervals[index][1] < newInterval[0]) ans.push_back(intervals[index++]);
        
        while(index < len && intervals[index][0] <= newInterval[1]) {
            newInterval[0] = min(intervals[index][0], newInterval[0]);
            newInterval[1] = max(intervals[index][1], newInterval[1]);
            index++;
        }
        
        ans.push_back(newInterval);
        
        while(index < len) ans.push_back(intervals[index++]);

        return ans; 
    }
};       

I have tried forcing/converting the encoding to UTF-8, but the string stays the same.

code = File.read('code.txt')
code = code.encode('UTF-8')
file = File.open('solution.cpp', "w:UTF-8")
file.write(code)

How can I do this? Thank you.

CodePudding user response:

I tried to reproduce your problem and got the same result you describe when running your code.
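Why encode has no effect can be seen in a small sketch. The sample string below is a made-up stand-in for the file's contents, not the real file: what is stored is the literal ASCII text backslash, u, 0, 0, 3, B rather than a semicolon, so there is nothing for a re-encoding to convert.

```ruby
# Stand-in for what File.read('code.txt') returns (illustrative only).
# Single quotes keep the backslash literal, so this is six ASCII
# characters, not one semicolon.
raw = '\u003B'

reencoded = raw.encode('UTF-8')

puts raw.length         # number of characters actually stored
puts reencoded == raw   # re-encoding leaves the text unchanged
puts raw.ascii_only?    # every character is plain ASCII
```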

I noticed that \u003B, for example, is the Unicode escape for the semicolon character (U+003B). So I scanned the string for each "U+" notation with the regex /\\u(.{4})/, which captures the four hexadecimal digits that give the Unicode code point. I then used gsub! and Array#pack to convert and substitute each of the escaped characters.

[$1.to_i(16)].pack('U') # => "\n", "\n", "<", "&", "\n", "=" ...etc.

And finally wrote the result to a file. So, my final approach looks like this:

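code = File.read('code.txt')

# Replace every \uXXXX escape with the character for that code point.
code.gsub!(/\\u(.{4})/) do |match|
  [$1.to_i(16)].pack('U')
end

# Use gsub, not gsub!: gsub! returns nil when nothing is replaced,
# which would make puts write an empty line instead of the code.
File.open('solution.cpp', 'w') { |f| f.puts code.gsub(/\A"|"\Z/, '') }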

Also note that I used gsub again at the end, to find the leading and trailing double quote and replace each with an empty string when writing to the file.
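Put together, the steps can be wrapped in a small helper. This is only a sketch under a few assumptions: the method name unescape_unicode is made up for illustration, every escape is assumed to be exactly four hex digits as in the question's file, and the pattern uses \h{4} (four hex digits) rather than .{4} so that a stray \u followed by non-hex text is left alone.

```ruby
# Hypothetical helper (the name is illustrative, not from the question).
# Turns literal \uXXXX sequences into the characters they stand for,
# then strips one leading and one trailing double quote.
def unescape_unicode(text)
  text
    .gsub(/\\u(\h{4})/) { [$1.hex].pack('U') }
    .gsub(/\A"|"\Z/, '')
end

# Made-up sample in the same style as code.txt; single quotes keep
# the backslashes literal.
sample = '"int x \u003D 1\u003B\u000A"'
puts unescape_unicode(sample)
```

With a helper like this, the script reduces to reading code.txt, passing the contents through the method, and writing the result to solution.cpp.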
