I have a project contained in a folder (src). It is divided into several folders, and each one contains some .py scripts. An example of the project structure is the following:
├── src <- Source code for use in this project.
│ │
│ ├── data <- Scripts to download or generate data.
│ │
│ ├── features <- Scripts to turn raw data into features for modeling.
│ │
│ ├── models <- Scripts to train models and then use trained models to make predictions.
As mentioned above, each folder under src contains different Python scripts. For example:
├── data
│
├── create_dataframe.py
├── imputation.py
.
.
.
Is there a way to get a list of all the unique Python modules used across the separate scripts?
For example, if create_dataframe.py contains
import pandas as pd
and imputation.py contains
import pandas as pd
import numpy as np
from tqdm import tqdm
The desired output would be
Output: ['pandas', 'numpy', 'tqdm']
I am not interested in the module's version. I just want the name of the module.
CodePudding user response:
Run this script from the src folder:
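import pathlib

path = pathlib.Path.cwd()
modules = set()
# Walk every .py file under the current directory, recursively.
for file in path.glob('**/*.py'):
    lines = file.read_text(encoding='utf-8')
    for line in lines.split('\n'):
        # Only lines that begin with an import statement are considered.
        if line.startswith('import ') or line.startswith('from '):
            # The second token is the module name; drop trailing ',' or ';'.
            modules.add(line.split()[1].replace(';', '').replace(',', ''))
for module in sorted(modules):
    print(module)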
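The line-based matching above skips indented imports, sees only the first name in a statement like import a, b, and can pick up lines of ordinary text that happen to start with "from ". As an alternative, here is a minimal sketch that parses each file with the standard-library ast module instead. The function names imported_modules and collect_imports are made up for illustration, and it assumes every file parses under the interpreter you run it with. It keeps only the top-level package name (os for os.path) and ignores relative imports.

```python
import ast
import pathlib


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" yields one alias per comma-separated name.
            for alias in node.names:
                names.add(alias.name.split('.')[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import such as "from . import x"; skip it.
            if node.level == 0 and node.module:
                names.add(node.module.split('.')[0])
    return names


def collect_imports(root):
    """Collect imported module names from every .py file under root, sorted."""
    modules = set()
    for file in pathlib.Path(root).glob('**/*.py'):
        modules |= imported_modules(file.read_text(encoding='utf-8'))
    return sorted(modules)
```

Calling collect_imports('.') from the src folder returns a sorted list, so for the two example files in the question it would give ['numpy', 'pandas', 'tqdm'].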