What is a general approach for transpiling one language to another?

Time:11-19

I would like to transpile JavaScript into LinkScript. I have started like this:

const acorn = require('acorn')
const fs = require('fs')

const input = fs.readFileSync('./tmp/parse.in.js', 'utf-8')

const jst = acorn.parse(input, {
  ecmaVersion: 2021,
  sourceType: 'module'
})

fs.writeFileSync('tmp/parse.out.js.json', JSON.stringify(jst, null, 2))

const linkScriptText = generateLinkScriptText(convertToLinkScriptAst(jst))

fs.writeFileSync('tmp/parse.out.link', linkScriptText)

function convertToLinkScriptAst(jst) {
  const lst = {}
  switch (jst.type) {
    case 'Program':
      convertProgram(jst, lst)
      break
  }
  return lst
}

function convertProgram(jst, lst) {
  lst.zones = []
  jst.body.forEach(node => {
    switch (node.type) {
      case 'VariableDeclaration':
        convertVariableDeclaration(node).forEach(vnode => {
          lst.zones.push(vnode)
        })
        break
      case 'ExpressionStatement':
        // Not handled yet.
        break
      default: throw new Error(`Unhandled node: ${JSON.stringify(node)}`)
    }
  })
}

function convertVariableDeclaration(jst) {
  return jst.declarations.map(dec => {
    switch (dec.type) {
      case 'VariableDeclarator':
        return convertVariableDeclarator(jst.kind, dec)
      default: throw new Error(`Unhandled declarator: ${JSON.stringify(dec)}`)
    }
  })
}

function convertVariableDeclarator(kind, jst) {
  return {
    type: 'host',
    immutable: kind === 'const',
    name: jst.id.name,
    value: convertVariableValue(jst.init)
  }
}

function convertVariableValue(jst) {
  if (!jst) return

  switch (jst.type) {
    case 'Literal':
      return convertLiteral(jst)
  }
}

function convertLiteral(jst) {
  switch (typeof jst.value) {
    case 'string':
      return {
        type: 'string',
        value: jst.value
      }
    case 'number':
      return {
        type: 'number',
        value: jst.value
      }
    default: throw new Error(`Unhandled literal: ${JSON.stringify(jst)}`)
  }
}

function generateLinkScriptText(lst) {
  const text = []
  lst.zones.forEach(zone => {
    switch (zone.type) {
      case 'host':
        generateHost(zone).forEach(line => {
          text.push(line)
        })
        break
    }
  })
  return text.join('\n')
}

function generateHost(lst) {
  const text = []
  if (lst.value) {
    switch (lst.value.type) {
      case 'string':
        text.push(`host ${lst.name}, text <${lst.value.value}>`)
        break
      case 'number':
        text.push(`host ${lst.name}, size ${lst.value.value}`)
        break
    }
  } else {
    text.push(`host ${lst.name}`)
  }
  return text
}

Basically, you parse the JS into an AST, then convert this AST somehow into the AST of the target language (LinkScript in this case). Then convert the output AST into text. The question is, what is a general strategy for doing this? It seems quite hard.

In more detail, I need to know all the types of structures that you can create in JavaScript, and all the types of structures you can create in LinkScript, and how one maps to another. In my head, looking at JS I can manually figure out how the corresponding LinkScript should look. But it's a different story trying to programmatically do it, and I am a bit lost on the general approach I should be taking to do this.

First of all, even though I have been doing JavaScript for over 10 years, I don't know the JS AST that well. I am planning on writing some example snippets of code and seeing how the AST looks using acorn. Second, it seems like there are so many combinations of things it is overwhelming.
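One way to get a feel for the AST is to inventory which node types a sample snippet actually produces. Below is a minimal sketch, assuming ESTree-shaped nodes (objects with a string `type`), which is the shape acorn returns. The helper name `collectNodeTypes` is hypothetical, and the sample tree is written by hand so the snippet runs without acorn installed; in practice you would pass in the result of `acorn.parse`.

```javascript
// Hypothetical helper: counts how often each node type occurs in an
// ESTree-shaped AST. Any object with a string `type` is treated as a node.
function collectNodeTypes(node, counts = {}) {
  if (Array.isArray(node)) {
    node.forEach(child => collectNodeTypes(child, counts))
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') {
      counts[node.type] = (counts[node.type] || 0) + 1
    }
    // Recurse into every property; non-object values are ignored.
    Object.values(node).forEach(child => collectNodeTypes(child, counts))
  }
  return counts
}

// Hand-written AST for `const a = 1`, in the shape acorn produces
// (position fields such as start/end are omitted).
const sample = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'Literal', value: 1, raw: '1' }
    }]
  }]
}

const typeCounts = collectNodeTypes(sample)
console.log(typeCounts)
```

Running this over a handful of representative input files gives a concrete list of the node types that need a converter first, which is usually much shorter than the full list of types the parser can emit.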

Do I just keep going down this road I've started on above? Or is there a more structured or disciplined approach? How do I better break the problem down into more manageable chunks?

Also, it is not always as easy as doing a simple one-to-one mapping. Sometimes the order of things changes. For example, in JS you might have:

a = x + y

But in LinkScript, that would be:

call add
  bind a, link x
  bind b, link y
  save a

So the assignment expression is sort of reversed. It gets more complicated in other cases.
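A reordering like this does not need special machinery: the converter builds a target node whose fields are in whatever order the target language wants, and the generator prints them in that order. Here is a rough sketch for the `a = x + y` case only. It assumes the parameter names `a` and `b` of `add` from the example; the names `OPERATOR_CALLS`, `convertAssignment`, `linkName`, and `generateCall` are hypothetical, and the input node is hand-written in the shape acorn produces.

```javascript
// Only `+` -> `add` appears in the example; other operators would need
// their own entries once their LinkScript equivalents are decided.
const OPERATOR_CALLS = { '+': 'add' }

// Returns the name of an identifier operand; anything else is unsupported
// in this sketch.
function linkName(jst) {
  if (jst.type !== 'Identifier') {
    throw new Error(`Unsupported operand: ${jst.type}`)
  }
  return jst.name
}

// Converts `target = left <op> right` into a LinkScript call node.
function convertAssignment(jst) {
  const { left, right } = jst
  if (jst.operator !== '=' || left.type !== 'Identifier' ||
      right.type !== 'BinaryExpression') {
    throw new Error(`Unsupported assignment: ${JSON.stringify(jst)}`)
  }
  const callee = OPERATOR_CALLS[right.operator]
  if (!callee) throw new Error(`Unsupported operator: ${right.operator}`)
  return {
    type: 'call',
    name: callee,
    binds: [
      { name: 'a', link: linkName(right.left) },  // assumed parameter name
      { name: 'b', link: linkName(right.right) }  // assumed parameter name
    ],
    save: left.name  // the assignment target moves to the end
  }
}

// Serializes a call node into lines of LinkScript text.
function generateCall(lst) {
  return [
    `call ${lst.name}`,
    ...lst.binds.map(bind => `  bind ${bind.name}, link ${bind.link}`),
    `  save ${lst.save}`
  ]
}

// Hand-written AST node for `a = x + y`.
const assignment = {
  type: 'AssignmentExpression',
  operator: '=',
  left: { type: 'Identifier', name: 'a' },
  right: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'x' },
    right: { type: 'Identifier', name: 'y' }
  }
}

const callText = generateCall(convertAssignment(assignment)).join('\n')
console.log(callText)
```

The point is that the "reversal" lives entirely in `convertAssignment`; the generator never has to know what the JS looked like.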

So it's as if I need to study each individual type of mapping, and come up with a detailed plan or algorithm on how to do that one mapping. Then it seems like there will be THOUSANDS of possible transformation/mapping types I need to study. So in that sense it seems like an extremely time-intensive problem to solve, mentally.

Is there an easier way?

For a long time (years?) I have wanted to do this, but it's always seemed like an extremely arduous task like I'm hinting at. I think it's because I don't clearly see in my head all the different ways/angles I can receive the AST, and I don't know how to boil it down to something I can see.

In addition to just figuring out how to do each type of mapping/transformation, I also should have somewhat decent code that I am able to extend. That is usually my strong suit (coming up with clean code with a simple API), but here I am struggling because yeah I don't see the full picture yet.

CodePudding user response:

Writing a transpiler is a very big job... For a variety of reasons, though, JavaScript workflows are already full of transpilers, so there are many tools to help.

If your target language looks like JavaScript, then you would write your transpiler as a plug-in for Babel: https://babeljs.io/

Otherwise, maybe start with jscodeshift, which will provide you with an easily accessible AST.

Many open-source JavaScript tools, like ESLint, also include JavaScript parsers that you could extract with a bit of effort.

Also see the AST Explorer

Once you have an AST, you would typically process it recursively, perhaps following the visitor pattern, to convert each AST node into the equivalent target structure. You might then run a peephole optimization pass to simplify the resulting AST, and finally serialize it. jscodeshift comes with a JavaScript serializer that you could replace with your own.
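As a rough illustration of the recursive, visitor-style processing, here is a sketch that keeps one handler per node type in a lookup table, for both the conversion and the serialization step. It is not how Babel or jscodeshift do it internally; it only reuses the `host` node shape and output format from the question. The names `converters`, `convert`, `generators`, and `generate` are hypothetical, the peephole step is left out, and the input tree is hand-written in the shape acorn produces so the snippet runs on its own.

```javascript
// One converter per JS node type. Each returns an array of LinkScript nodes,
// so a single JS node may expand into zero, one, or several target nodes.
const converters = {
  Program: node => [{ type: 'program', zones: node.body.flatMap(convert) }],
  VariableDeclaration: node => node.declarations.map(dec => {
    if (dec.id.type !== 'Identifier') {
      throw new Error(`Unsupported declaration target: ${dec.id.type}`)
    }
    return {
      type: 'host',
      immutable: node.kind === 'const',
      name: dec.id.name,
      value: dec.init ? convert(dec.init)[0] : undefined
    }
  }),
  Literal: node => {
    const kind = typeof node.value
    if (kind !== 'string' && kind !== 'number') {
      throw new Error(`Unsupported literal: ${node.raw}`)
    }
    return [{ type: kind, value: node.value }]
  }
}

// Single dispatch point: fails loudly on node types without a handler,
// which shows exactly which handler to write next.
function convert(node) {
  const handler = converters[node.type]
  if (!handler) throw new Error(`No converter for node type: ${node.type}`)
  return handler(node)
}

// One generator per LinkScript node type. Each returns lines of text.
const generators = {
  program: node => node.zones.flatMap(generate),
  host: node => {
    if (!node.value) return [`host ${node.name}`]
    return node.value.type === 'string'
      ? [`host ${node.name}, text <${node.value.value}>`]
      : [`host ${node.name}, size ${node.value.value}`]
  }
}

function generate(node) {
  const handler = generators[node.type]
  if (!handler) throw new Error(`No generator for node type: ${node.type}`)
  return handler(node)
}

// Hand-written AST for: const greeting = 'hi'; let size = 3; let empty
const declare = (kind, name, init) => ({
  type: 'VariableDeclaration',
  kind,
  declarations: [{ type: 'VariableDeclarator', id: { type: 'Identifier', name }, init }]
})
const sampleProgram = {
  type: 'Program',
  body: [
    declare('const', 'greeting', { type: 'Literal', value: 'hi', raw: "'hi'" }),
    declare('let', 'size', { type: 'Literal', value: 3, raw: '3' }),
    declare('let', 'empty', null)
  ]
}

const linkScriptText = generate(convert(sampleProgram)[0]).join('\n')
console.log(linkScriptText)
```

With this layout, supporting a new construct means adding one entry to `converters` (and, if it introduces a new target node, one entry to `generators`), so the work can be done one node type at a time, driven by whichever "No converter" error shows up next.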
