Home > Software design >  Find all external relationships in an OOXML/Excel spreadsheet
Find all external relationships in an OOXML/Excel spreadsheet

Time:08-04

I am trying to register all external relationships in a OOXML spreadsheet. The code below provides no results. What am I doing wrong? I suspect I am looking the wrong place, I am looking in "WorkSheetParts". I am using Open XML SDK.

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var spreadsheet = SpreadsheetDocument.Open(file, false))
{
    var list = spreadsheet.WorkbookPart.WorksheetParts.ToList();
    foreach (var item in list)
    {
        int count = item.ExternalRelationships.Count(); // Register the number of embedded objects
        if (count == 0) // If no embedded objects, inform user
        {
            Console.WriteLine($"--> {count} external relationships");
        }
        else
        {
            Console.WriteLine($"--> {count} external relationships");
            var parts = item.ExternalRelationships.ToList(); // Register each object to a list

            foreach (var part in parts) // Inform user of each object
            {
                Console.WriteLine(part.Uri.ToString());
                Console.WriteLine(part.RelationshipType.ToString());
                Console.WriteLine(part.IsExternal.ToString());
                Console.WriteLine(part.Container.ToString());
            }
        }
    }
}

CodePudding user response:

TL;DR: You're traversing the OOXML Spreadsheet document structure incorrectly.

The quick-fix is to use the built-in GetAllParts() extension method, then for each OpenXmlPart entry inspect its ExternalRelationships property (it's a lazily-evaluated enumerable, not an in-memory collection), and looking for non-empty collections of ExternalRelationship objects... or a single linq expression, like so:

static List<ExternalRelationship> GetExternalReferences( SpreadsheetDocument spreadsheet )
{
    return spreadsheet
        .GetAllParts()
        .SelectMany( p => p.ExternalRelationships )
        .ToList();
}

As a more complete example:

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void Main()
{
    DirectoryInfo root = new DirectoryInfo( @"Test_Data" );

    foreach( FileInfo xlsx in root.GetFiles( "*.xlsx", SearchOption.AllDirectories ) )
    {
        SpreadsheetDocument spreadsheet;
        try
        {
            spreadsheet = SpreadsheetDocument.Open( path: xlsx.FullName, isEditable: false );
        }
        catch( Exception ex )
        {
            Console.WriteLine( "File \"{0}\" cannot be opened: {1}", xlsx.FullName, ex.Message );
            continue;
        }

        using( spreadsheet )
        {
            List<ExternalRelationship> extRels = spreadsheet
                .GetAllParts()
                .SelectMany( p => p.ExternalRelationships )
                .ToList();

            if( extRels.Count > 0 )
            {
                Console.WriteLine( "File \"{0}\" has {1:N0} external references:", xlsx.FullName, extRels.Count );
                foreach( ExternalRelationship r in extRels )
                {
                    Console.WriteLine( "\tUri             : \"{0}\"", r.Uri );
                    Console.WriteLine( "\tRelationshipType: \"{0}\"", r.RelationshipType );
                    Console.WriteLine( "\tIsExternal      : {0}", r.IsExternal );
                    Console.WriteLine( "\tContainer       : {0}", r.Container );
                    Console.WriteLine();
                }
            }
            else
            {
                Console.WriteLine( "File \"{0}\" has no external references.", xlsx.FullName );
            }
        }
    }

}

Output when using files from the Test_Data.zip you posted:

File "Test_Data\Workbook_strict.xlsx" has no external references.
File "Test_Data\Workbook_transitional.xlsx" has no external references.
File "Test_Data\Another folder\123.xlsx" has no external references.
File "Test_Data\Another folder\Created as Strict_Now Transitional.xlsx" has no external references.
File "Test_Data\Another folder\Created as Transitional_Now Strict.xlsx" has no external references.
File "Test_Data\Another folder\Password protected.xlsx" cannot be opened: End of Central Directory record could not be found.
File "Test_Data\Another folder\With chains to cells in another spreadsheet.xlsx" has 1 external references:
    Uri             : "123.xlsx"
    RelationshipType: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/externalLinkPath"
    IsExternal      : True
    Container       : DocumentFormat.OpenXml.Packaging.ExternalWorkbookPart

File "Test_Data\Another folder\With data connection.xlsx" has no external references.
File "Test_Data\Another folder\With embedded objects.xlsx" has 1 external references:
    Uri             : "file:///C:/Users/Sepideh/Desktop/Test_Data/Another folder/123.xlsx"
    RelationshipType: "http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject"
    IsExternal      : True
    Container       : DocumentFormat.OpenXml.Packaging.ExternalWorkbookPart

File "Test_Data\Another folder\Another folder\1234.xlsx" has no external references.
File "Test_Data\Another folder\Another folder\random1.xlsx" has no external references.
File "Test_Data\Another folder\Another folder 2\New.xlsx" has no external references.
File "Test_Data\Another folder\Another folder\Last folder\some_filename.xlsx" has no external references.
  • Related