Home > Net >  Loading data from an XML document (not a file)
Loading data from an XML document (not a file)

Time:01-20

I have a lot of logfiles that I want to load into a SQL Server database via a Talend job so that I can preserve the logs without keeping the files in the file system (we generate over 3k log files every day). I do not want to change the jobs to write to the database directly as I don't want the jobs to be dependent on the DB server being available.

I have created a Talend job that can read a log file and convert it to an XML document which I can then send to a SQL Server (2016) stored procedure. The stored procedure is executed, however the contents of the log file are NOT being stored in the database.

I have ruled out the following possible causes:

  • connectivity/permissions - I have logged in to SSMS with the account used by the Talend job and executed the procedure (minus the XML header since I have not found a way to have SSMS allow UTF-8)

The stored procedure is:

ALTER   procedure [myschema].[myProc]
  @logXML as xml,
  @logFile as varchar(max),
  @result as varchar(max) OUT

as

begin
  declare @myAction varchar(max);

  begin try
    begin transaction
      set @myAction = 'inserting log file '   @logFile;
      insert into [myDB].[mySchema].[myTable]
      (
        JobName,
        LogName,
        RunDate,
        LogLineNum,
        LogLine
      )
      (select logs.value('JobName[1]', 'varchar(500)') as JobName,
              logs.value('LogName[1]', 'varchar(500)') as LogName,
          logs.value('runDate[1]', 'varchar(20)') as runDate,
          lines.value('Number[1]', 'integer') as LogLineNum,
          lines.value('Content[1]', 'varchar(max)') as LogLine
         from @logXML.nodes('/LogFileContents') as l1(logs),
              @logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines)
      );
    commit transaction;
    set @result = 'SUCCESS';
  end try
  begin catch
  if @@TRANCOUNT > 0
    begin
      rollback
      set @result = 'Error '   @myAction   '.  Error code: '   @@ERROR;
    end
  end catch
end

The XML document generated by the Talend job looks like the following:

<?xml version="1.0" encoding="UTF-8"?> // I do not have control of this - I get this "free" from Talend
<LogFileContents>
  <JobName>myTask</JobName>
  <LogName>myLogFile</LogName>
  <runDate>YYYYMMDDHHmiss</runDate>
  <LogLines>
    <Line>
      <Number>1</Number>
      <Content>1st Log Message</Content>
    </Line>
    <Line>
      <Number>2</Number>
      <Content>2nd Log Message</Content>
    </Line>
    ...
    <Line>
      <Number>Last</Number>
      <Content>Last Log Message</Content>
    </Line>
  </LogLines>
</LogFileContents>

In Talend I have tried passing the XML document to the tDBSP as a document (which results in a Talend Error). It is currently configured to pass the XML document as a String - I had to add the "sendStringParametersAsUnicode=false" to my connection string (additional parameters) to eliminate the "unable to switch encoding" error. When I check the result from the call to the stored proc I get "SUCCESS". I have even tried having the stored proc pass back a non-sense value for the result instead of "SUCCESS".

In the stored proc I have tried changing the input parameter to a varchar(max) and then converting that to an XML variable via

@logXML as varchar(max),
...
declare @myXML xml = convert(xml, @logXML)
...
  from @myXML.nodes(...) as l1(logs)
       @myXML.nodes(...) as l2(lines)

which also did not load the document into the database.

However if I execute the stored procedure via SSMS using the XML Document (only difference is the missing <?xml...> document header):

<LogFileContents>
  <JobName>myTask</JobName>
  <LogName>myLogFile</LogName>
  <runDate>YYYYMMDDHHmiss</runDate>
  <LogLines>
    <Line>
      <Number>1</Number>
      <Content>1st Log Message</Content>
    </Line>
    <Line>
      <Number>2</Number>
      <Content>2nd Log Message</Content>
    </Line>
    ...
    <Line>
      <Number>Last</Number>
      <Content>Last Log Message</Content>
    </Line>
  </LogLines>
</LogFileContents>

The entire contents loads as expected (The log file I am testing with has 40 lines and all 40 show up in the database, exactly as I expect).

I have also reviewed the following links - but they are in regards to processing XML files, not XML strings.

Parsing XML Data Into SQL Server

Importing XML file into SQL Server 2000 using OPENROWSET

I feel like there's something not quite configured right in Talend but I'm not sure where that would be. Any input/guidance would be greatly appreciated.

CodePudding user response:

Your XML shredding has no issues outside of performance.

Please see below.

So, it seems that the issue is some kind of environment related (Talend ?!).

SQL

DECLARE @logXML XML = 
'<?xml version="1.0" encoding="UTF-8"?>
<LogFileContents>
    <JobName>myTask</JobName>
    <LogName>myLogFile</LogName>
    <runDate>YYYYMMDDHHmiss</runDate>
    <LogLines>
        <Line>
            <Number>1</Number>
            <Content>1st Log Message</Content>
        </Line>
        <Line>
            <Number>2</Number>
            <Content>2nd Log Message</Content>
        </Line>...
        <Line>
            <Number>10000</Number>
            <Content>Last Log Message</Content>
        </Line>
    </LogLines>
</LogFileContents>';

SELECT logs.value('(JobName/text())[1]', 'varchar(500)') as JobName
    , logs.value('(LogName/text())[1]', 'varchar(500)') as LogName
    , logs.value('(runDate/text())[1]', 'varchar(20)') as runDate
    , lines.value('(Number/text())[1]', 'integer') as LogLineNum
    , lines.value('(Content/text())[1]', 'varchar(max)') as LogLine
FROM @logXML.nodes('/LogFileContents') as l1(logs)
    , @logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines);

Output

JobName LogName runDate LogLineNum LogLine
myTask myLogFile YYYYMMDDHHmiss 1 1st Log Message
myTask myLogFile YYYYMMDDHHmiss 2 2nd Log Message
myTask myLogFile YYYYMMDDHHmiss 10000 Last Log Message

CodePudding user response:

So it seems that, even though I have a commit in the stored procedure, I still need an external (in Talend) commit (tDBCommit). From what I've been able to gather it has to do with the fact that I use a tDBConnection to establish the DB connection instead of configuring the tDBSP to make a connection to the DB.

Although I haven't tried it, I suspect that I could also set the "Auto Commit" property of the tDBConnection and that would work as well.

  • Related