How to get a list of unique python modules imported from different scripts located in different fold

Time:11-04

I have a project contained in a folder (src). It is divided into several subfolders, each containing some .py scripts. An example of the project structure is the following:

├── src                <- Source code for use in this project.
│   │
│   ├── data           <- Scripts to download or generate data.
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling.
│   │
│   ├── models         <- Scripts to train models and then use trained models to make predictions.

As mentioned above, each folder under src contains several Python scripts. For example:

├── data               
   │
   ├── create_dataframe.py
   ├── imputation.py
.
.
.

Is there a way to get a list of all the unique Python modules imported across these scripts?

For example, if I have in create_dataframe.py

import pandas as pd

and in imputation.py I have

import pandas as pd
import numpy  as np
from   tqdm import tqdm

The desired output would be

Output: ['pandas', 'numpy', 'tqdm']

I am not interested in the module's version. I just want the name of the module.

CodePudding user response:

Run this script from the src folder:

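import pathlib

path = pathlib.Path.cwd()
modules = set()

# Walk every .py file under the current directory (run from src).
for file in path.glob('**/*.py'):
    text = file.read_text(encoding='utf-8')
    for line in text.splitlines():
        words = line.split()
        # Only "import x" and "from x import y" statements are considered;
        # splitting on whitespace also catches indented imports.
        if len(words) >= 2 and words[0] in ('import', 'from'):
            # Keep the top-level package name, e.g. "os.path" -> "os".
            name = words[1].replace(';', '').replace(',', '').split('.')[0]
            if name:  # empty for relative imports such as "from . import x"
                modules.add(name)

for module in sorted(modules):
    print(module)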
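
The line-based check above only sees the first name in a statement like "import os, sys", and it can be fooled by text inside strings or docstrings that happens to start with "import" or "from". As an alternative, here is a minimal sketch that parses each file with the standard-library ast module instead. The function names imported_modules and project_modules are made up for illustration, and skipping relative imports is an assumption about what counts as a "module" here:

```python
import ast
import pathlib


def imported_modules(source):
    """Return the top-level names of modules imported in a source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c as d" contributes "a" and "c"
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import ("from . import x"); skip it
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return names


def project_modules(root):
    """Collect imported module names from every .py file under root."""
    modules = set()
    for file in pathlib.Path(root).glob('**/*.py'):
        modules |= imported_modules(file.read_text(encoding='utf-8'))
    return sorted(modules)
```

Calling project_modules('.') from the src folder returns a sorted list, so for the two example files in the question it would give ['numpy', 'pandas', 'tqdm']. Note that ast.parse raises SyntaxError for a file it cannot parse, so wrap the call in try/except if some scripts may be broken.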