Does a multi programming language parsing / function extraction toolkit exist?

Time:12-31

I'm looking for a way to extract function names and their definitions from multiple different programming languages. I would like to avoid writing extractors by hand as I want to support about 15 programming languages.

Is there a library / program that could be used to achieve this? Searching didn't give me any useful results.

I'm currently using Go for my application, but I don't mind handling this in a different language.

The app itself will be open-source so proprietary solutions are not desired.

CodePudding user response:

This isn't easy to do, because each language has different rules about legal syntax and what constitutes a "function".

I can offer my company's DMS Software Reengineering Toolkit as a way to do this. We've fought the battle of parsing multiple languages (maybe all of your 15, see list of languages supported by DMS) and building various kinds of fact-extraction machinery. You'd have to customize it for the specific facts you want to extract.

[Yes, it's proprietary. OP added a non-proprietary requirement after I answered this question. Other folks might not have this constraint.]

CodePudding user response:

If you just want to extract functions rather than fully parse the source files, the traditional way to do this is with ctags.

Most Unix-like OSes either come with ctags already installed or have it available as a package. However, ctags is not a single program. Like other Unix utilities, it may have started as a single program, but by now there are several implementations of ctags.

The most widely used implementation is probably Exuberant Ctags. It has fairly good language coverage, but it does not handle many of the more modern languages (for example, it does not natively handle Go). It currently supports around 40 languages: http://ctags.sourceforge.net/languages.html

Universal Ctags is a more recent project that, I believe, started as a fork of Exuberant Ctags. It supports many more languages (including Go): https://github.com/universal-ctags/ctags/tree/master/parsers

Ctags generates a tags file describing every object it finds. The exact format of the tags file depends on the implementation, but an entry generally records what type of object was found (variable, class, function, etc.), the file it was found in, the line number, and, for Exuberant Ctags, the search term you need to find the object (sometimes a string literal, sometimes a regexp).
