Remove background from a directory of JPEG images


I wrote code to remove the background from 8,000 images, but it takes approximately 8 hours to produce the results.

  • How can I improve its running time, since I will have to work on larger datasets in the future?
  • Or do I have to rewrite the code from scratch? If so, please suggest some sample code.
from rembg import remove
import cv2
import glob

for img in glob.glob('../images/*.jpg'):
    # Extract the bare file name (without directory or extension).
    a = img.split('../images/')
    a1 = a[1].split('.jpg')
    try:
        cv_img = cv2.imread(img)
        output = remove(cv_img)
    except Exception:
        continue
    cv2.imwrite('../output image/' + str(a1[0]) + '.png', output)
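As a side note, the manual string splits are brittle (they break as soon as the directory prefix changes); `os.path` derives the file stem directly. A minimal sketch, not part of the original question:

```python
import os

def output_path(img_path, out_dir='../output image'):
    # Turn '.../name.jpg' into '<out_dir>/name.png' without manual splits.
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return os.path.join(out_dir, stem + '.png')

print(output_path('../images/cat.jpg'))  # ../output image/cat.png
```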

CodePudding user response:

Check out the U^2-Net repository. As in u2net_test.py, writing your own remove function and using dataloaders can speed up the process. If alpha matting is not necessary, skip it; otherwise you can add the alpha-matting code from rembg.

# Adapted from u2net_test.py in the U^2-Net repository; SalObjDataset,
# RescaleT, ToTensorLab, normPRED, save_output and net come from there.
import os
import glob

import torch
from torch.autograd import Variable
from torch.utils.data import DataLoader
from torchvision import transforms

def main():

    # --------- 1. get image path and name ---------
    model_name = 'u2net'  # or 'u2netp'

    image_dir = os.path.join(os.getcwd(), 'test_data', 'test_images')
    prediction_dir = os.path.join(os.getcwd(), 'test_data', model_name + '_results' + os.sep)
    model_dir = os.path.join(os.getcwd(), 'saved_models', model_name, model_name + '.pth')

    img_name_list = glob.glob(image_dir + os.sep + '*')
    print(img_name_list)

    # --------- 2. dataloader ---------
    test_salobj_dataset = SalObjDataset(img_name_list=img_name_list,
                                        lbl_name_list=[],
                                        transform=transforms.Compose([RescaleT(320),
                                                                      ToTensorLab(flag=0)]))
    test_salobj_dataloader = DataLoader(test_salobj_dataset,
                                        batch_size=1,
                                        shuffle=False,
                                        num_workers=1)

    for i_test, data_test in enumerate(test_salobj_dataloader):

        print("inferencing:", img_name_list[i_test].split(os.sep)[-1])

        inputs_test = data_test['image']
        inputs_test = inputs_test.type(torch.FloatTensor)

        if torch.cuda.is_available():
            inputs_test = Variable(inputs_test.cuda())
        else:
            inputs_test = Variable(inputs_test)

        d1, d2, d3, d4, d5, d6, d7 = net(inputs_test)

        # normalization
        pred = d1[:, 0, :, :]
        pred = normPRED(pred)

        # save results to the results folder
        os.makedirs(prediction_dir, exist_ok=True)
        save_output(img_name_list[i_test], pred, prediction_dir)

        del d1, d2, d3, d4, d5, d6, d7

CodePudding user response:

Ah, you used the example from https://github.com/danielgatis/rembg#usage-as-a-library as a template for your code. Maybe try the other example with a PIL image instead of OpenCV. The latter is often the slower of the two, but who knows. Try it with maybe 10 images and compare execution times.
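For that comparison, a small timing helper from the standard library is enough (a sketch, not from the original answer):

```python
import time

def time_it(fn, *args):
    # Return (result, elapsed_seconds) for a single call to fn.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Example with a cheap stand-in function; wrap your per-batch
# background-removal loop the same way.
_, elapsed = time_it(sum, range(1000))
print(f"{elapsed:.6f}s")
```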

Here is your code using PIL instead of OpenCV. Not tested.

import glob

from PIL import Image
from rembg import remove

for img in glob.glob("../images/*.jpg"):

    a = img.split("../images/")
    a1 = a[1].split(".jpg")
    try:
        pil_img = Image.open(img)
        output = remove(pil_img)
    except Exception:
        continue
    output.save("../output image/" + str(a1[0]) + ".png")
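Since each image is processed independently, another option worth trying is a process pool, which spreads the per-image work across CPU cores. A runnable sketch where `process_one` is only a placeholder for the real open/remove/save step (the names and paths here are illustrative, not from the original code):

```python
from multiprocessing import Pool
import os

def process_one(img_path):
    # Placeholder for the real per-image work: open the image, call
    # rembg's remove(), and save the result as a PNG. Here it only
    # computes the output file name so the sketch runs without rembg.
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return stem + ".png"

if __name__ == "__main__":
    paths = ["../images/a.jpg", "../images/b.jpg"]
    with Pool(processes=os.cpu_count()) as pool:
        print(pool.map(process_one, paths))  # ['a.png', 'b.png']
```

Note that each worker process loads its own copy of the model, so memory use grows with the pool size; a handful of workers is usually the sweet spot.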