I have a lot of logfiles that I want to load into a SQL Server database via a Talend job so that I can preserve the logs without keeping the files in the file system (we generate over 3k log files every day). I do not want to change the jobs to write to the database directly as I don't want the jobs to be dependent on the DB server being available.
I have created a Talend job that can read a log file and convert it to an XML document which I can then send to a SQL Server (2016) stored procedure. The stored procedure is executed, however the contents of the log file are NOT being stored in the database.
I have ruled out the following possible causes:
- connectivity/permissions - I have logged in to SSMS with the account used by the Talend job and executed the procedure (minus the XML header since I have not found a way to have SSMS allow UTF-8)
The stored procedure is:
ALTER procedure [myschema].[myProc]
@logXML as xml,
@logFile as varchar(max),
@result as varchar(max) OUT
as
begin
declare @myAction varchar(max);
begin try
begin transaction
set @myAction = 'inserting log file ' @logFile;
insert into [myDB].[mySchema].[myTable]
(
JobName,
LogName,
RunDate,
LogLineNum,
LogLine
)
(select logs.value('JobName[1]', 'varchar(500)') as JobName,
logs.value('LogName[1]', 'varchar(500)') as LogName,
logs.value('runDate[1]', 'varchar(20)') as runDate,
lines.value('Number[1]', 'integer') as LogLineNum,
lines.value('Content[1]', 'varchar(max)') as LogLine
from @logXML.nodes('/LogFileContents') as l1(logs),
@logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines)
);
commit transaction;
set @result = 'SUCCESS';
end try
begin catch
if @@TRANCOUNT > 0
begin
rollback
set @result = 'Error ' @myAction '. Error code: ' @@ERROR;
end
end catch
end
The XML document generated by the Talend job looks like the following:
<?xml version="1.0" encoding="UTF-8"?> // I do not have control of this - I get this "free" from Talend
<LogFileContents>
<JobName>myTask</JobName>
<LogName>myLogFile</LogName>
<runDate>YYYYMMDDHHmiss</runDate>
<LogLines>
<Line>
<Number>1</Number>
<Content>1st Log Message</Content>
</Line>
<Line>
<Number>2</Number>
<Content>2nd Log Message</Content>
</Line>
...
<Line>
<Number>Last</Number>
<Content>Last Log Message</Content>
</Line>
</LogLines>
</LogFileContents>
In Talend I have tried passing the XML document to the tDBSP as a document (which results in a Talend Error). It is currently configured to pass the XML document as a String - I had to add the "sendStringParametersAsUnicode=false" to my connection string (additional parameters) to eliminate the "unable to switch encoding" error. When I check the result from the call to the stored proc I get "SUCCESS". I have even tried having the stored proc pass back a non-sense value for the result instead of "SUCCESS".
In the stored proc I have tried changing the input parameter to a varchar(max) and then converting that to an XML variable via
@logXML as varchar(max),
...
declare @myXML xml = convert(xml, @logXML)
...
from @myXML.nodes(...) as l1(logs)
@myXML.nodes(...) as l2(lines)
which also did not load the document into the database.
However if I execute the stored procedure via SSMS using the XML Document (only difference is the missing <?xml...> document header):
<LogFileContents>
<JobName>myTask</JobName>
<LogName>myLogFile</LogName>
<runDate>YYYYMMDDHHmiss</runDate>
<LogLines>
<Line>
<Number>1</Number>
<Content>1st Log Message</Content>
</Line>
<Line>
<Number>2</Number>
<Content>2nd Log Message</Content>
</Line>
...
<Line>
<Number>Last</Number>
<Content>Last Log Message</Content>
</Line>
</LogLines>
</LogFileContents>
The entire contents loads as expected (The log file I am testing with has 40 lines and all 40 show up in the database, exactly as I expect).
I have also reviewed the following links - but they are in regards to processing XML files, not XML strings.
Parsing XML Data Into SQL Server
Importing XML file into SQL Server 2000 using OPENROWSET
I feel like there's something not quite configured right in Talend but I'm not sure where that would be. Any input/guidance would be greatly appreciated.
CodePudding user response:
Your XML shredding has no issues outside of performance.
Please see below.
So, it seems that the issue is some kind of environment related (Talend ?!).
SQL
DECLARE @logXML XML =
'<?xml version="1.0" encoding="UTF-8"?>
<LogFileContents>
<JobName>myTask</JobName>
<LogName>myLogFile</LogName>
<runDate>YYYYMMDDHHmiss</runDate>
<LogLines>
<Line>
<Number>1</Number>
<Content>1st Log Message</Content>
</Line>
<Line>
<Number>2</Number>
<Content>2nd Log Message</Content>
</Line>...
<Line>
<Number>10000</Number>
<Content>Last Log Message</Content>
</Line>
</LogLines>
</LogFileContents>';
SELECT logs.value('(JobName/text())[1]', 'varchar(500)') as JobName
, logs.value('(LogName/text())[1]', 'varchar(500)') as LogName
, logs.value('(runDate/text())[1]', 'varchar(20)') as runDate
, lines.value('(Number/text())[1]', 'integer') as LogLineNum
, lines.value('(Content/text())[1]', 'varchar(max)') as LogLine
FROM @logXML.nodes('/LogFileContents') as l1(logs)
, @logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines);
Output
JobName | LogName | runDate | LogLineNum | LogLine |
---|---|---|---|---|
myTask | myLogFile | YYYYMMDDHHmiss | 1 | 1st Log Message |
myTask | myLogFile | YYYYMMDDHHmiss | 2 | 2nd Log Message |
myTask | myLogFile | YYYYMMDDHHmiss | 10000 | Last Log Message |
CodePudding user response:
So it seems that, even though I have a commit in the stored procedure, I still need an external (in Talend) commit (tDBCommit). From what I've been able to gather it has to do with the fact that I use a tDBConnection to establish the DB connection instead of configuring the tDBSP to make a connection to the DB.
Although I haven't tried it, I suspect that I could also set the "Auto Commit" property of the tDBConnection and that would work as well.