Shopware 6 import performance

Time:08-10

I need to sync products from an ERP every night. The ERP delivers the full product data each time, without a delta. So I wrote an import script that works correctly but takes ages. There are only around 2000 products, so it shouldn't take that long, but it takes around 7 hours! I do have to delete the product property associations and also delete the products that are missing from the import, but 7 hours is still far too much. The server is a scalable cloud server with up to 8 GB RAM.

Here is the code (simplified):

    /*
     * Amount of entities to execute per round.
     * */
    public const BATCH = 200;

    public function execute()
    {
        // read products from import xml file
        $importProducts = $this->loadProducts();
        $csvBatch = array_chunk($importProducts, self::BATCH);
        $productNumbers = [];

        foreach ($csvBatch as $products) {
            $productNumbers[] = $this->processImportProducts($products);
        }

        $this->deleteProducts(array_merge(...$productNumbers));

        return 0;
    }

    private function processImportProducts($productsData)
    {
        $products = [];
        $productNumbers = [];

        foreach ($productsData as $product) {
            $products[$product['ProductNo']] = $this->importProducts($product);
            $productNumbers[] = $product['ProductNo'];
        }

        // upsert product
        try {
            $this->cleanProductProperties($products, $this->context);
            $this->productRepository->upsert(array_values($products), $this->context);
        } catch (WriteException $exception) {
            $this->logger->info(' ');
            $this->logger->info('<error>Products could not be imported. Message: '. $exception->getMessage() .'</error>');
        }
        unset($products);

        return $productNumbers;
    }


    private function cleanProductProperties($products, $context)
    {
        $productIds = array_values(array_map(static function($product){
            return $product['id'];
        }, $products));

        $productProperties = $this->productPropertyRepository->searchIds(
            (new Criteria())->addFilter(new EqualsAnyFilter('productId', $productIds)),
            $context
        );

        $productRelationsToDelete = [];
        foreach($productProperties->getIds() as $productProperty) {
            $productRelationsToDelete[] = [
                'productId' => $productProperty['product_id'],
                'optionId' => $productProperty['property_group_option_id']
            ];
        }

        $this->productPropertyRepository->delete($productRelationsToDelete, $context);

        unset($productIds, $productProperties, $productRelationsToDelete);
    }

    private function importProducts($product)
    {
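        // 'ProductNo' is the import key also used in processImportProducts()
        $productNumber = $product['ProductNo'];

        // search product by productNumber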
        $productSearch = $this->productRepository->search(
            (new Criteria())->addFilter(new EqualsFilter('productNumber', $productNumber)),
            $this->context
        );

        $existingProduct = $productSearch->getEntities()->first();
        if ($existingProduct) {
            $productId = $existingProduct->getId();
        } else {
            $productId = Uuid::randomHex();
        }

        $productData = [
            'id' => $productId,
            'productNumber' => $productNumber,
            'price' => [
                [
                    'currencyId' => Defaults::CURRENCY,
                    'gross' => 0,
                    'net' => 0,
                    'linked' => true
                ]
            ],
            'stock' => 99999,
            'taxId' => $this->taxId,
            'name' => $productNames,
            'description' => $productDescriptions
        ];

        return $productData;
    }

Elasticsearch is also running, which certainly has an influence. Does anyone have an idea how to improve this? Is there, for instance, a way to deactivate Elasticsearch indexing on repository upsert? Are there better ways to reduce memory usage?

Thanks for any advice!

CodePudding user response:

Are you running this in a dev or prod environment? It is highly recommended to use a prod environment for large-scale data operations like this.

You could speed up the process a lot by indexing data asynchronously. Indexing is then handed to the message queue instead of running during the write. Entities will generally be persisted faster that way, but they might still be missing some essential data until the messages in the queue have been processed. You can enable queue-based indexing by adding a state to the context:

    $context->addState(EntityIndexerRegistry::USE_INDEXING_QUEUE);
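As a rough sketch of where this could go in the import from the question: the state is added once to the context the class already holds in `$this->context`, so every later write with that context defers its indexing. The property and method names are taken from the question. The sketch also assumes a message consumer is running (for example `bin/console messenger:consume`, or the admin worker), since otherwise the queued indexing would never be processed.

```php
use Shopware\Core\Framework\DataAbstractionLayer\Indexing\EntityIndexerRegistry;

// Inside the import class from the question.
public function execute()
{
    // Assumption: $this->context is the same Context instance that is
    // later passed to upsert() and delete(). Adding the state once here
    // defers indexing for all of those writes to the message queue.
    $this->context->addState(EntityIndexerRegistry::USE_INDEXING_QUEUE);

    $importProducts = $this->loadProducts();
    $productNumbers = [];

    foreach (array_chunk($importProducts, self::BATCH) as $products) {
        $productNumbers[] = $this->processImportProducts($products);
    }

    $this->deleteProducts(array_merge(...$productNumbers));

    return 0;
}
```

Until the queue has been worked off, the storefront and search may show stale or incomplete product data, so a nightly import would need to leave enough time for the consumer to catch up.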

You could also deactivate indexing completely, but that is not recommended:

    $context->addState(EntityIndexerRegistry::DISABLE_INDEXING);
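If indexing is switched off anyway, it may help to limit the state to the import writes and remove it again afterwards, so that other code using the same context is not affected. The helper below is hypothetical (`withoutIndexing` is not part of Shopware) and assumes the `$this->context` and `$this->productRepository` properties from the question.

```php
use Shopware\Core\Framework\Context;
use Shopware\Core\Framework\DataAbstractionLayer\Indexing\EntityIndexerRegistry;

// Hypothetical helper inside the import class from the question:
// runs $write with indexing disabled and always restores the context.
private function withoutIndexing(callable $write): void
{
    $this->context->addState(EntityIndexerRegistry::DISABLE_INDEXING);

    try {
        $write($this->context);
    } finally {
        // Remove the state even if the write throws.
        $this->context->removeState(EntityIndexerRegistry::DISABLE_INDEXING);
    }
}

// Usage inside processImportProducts():
$this->withoutIndexing(function (Context $context) use ($products): void {
    $this->productRepository->upsert(array_values($products), $context);
});
```

With indexing disabled, the indexed data is not updated by these writes at all, so a rebuild afterwards (for example with `bin/console dal:refresh:index`) would most likely be needed before the shop shows correct data.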

Another alternative would be to just skip specific indexers. If you know for certain you don't need to index specific data, you can set it to skip either an entire entity indexer or specific updaters within those.

    $context->addExtension(EntityIndexerRegistry::EXTENSION_INDEXER_SKIP, new ArrayEntity(['category.indexer']));