Kettle parameter data

Time:10-09

Kettle parameters
Passing parameters is something every user of an ETL tool has to understand. Kettle currently offers two ways to do it: Arguments and Parameters.

1, Arguments
Passing values as arguments is the original mechanism, used in version 3.2 and earlier. The values are given on the command line and received inside the flow with the Get System Info component. At most ten arguments can be passed, and this method is rarely used.



(figure 1)

2, Parameters
With named parameters you only need to know the parameter name and assign a value through that name. Parameter names are case sensitive.

Kitchen.bat (sh) -file=test.kjb -param:STARTDATE=VALUE
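
As a minimal sketch of how such a command line could be assembled from a script, the Python helper below builds the argument list in the same -file= / -param:NAME=VALUE form shown above. The helper name kitchen_command and the default script name kitchen.sh are assumptions made for illustration; the helper only builds the command and does not run Kettle.

```python
import shlex


def kitchen_command(job_file, params, script="kitchen.sh"):
    """Build the argument list for running a Job with named parameters.

    `script` is an assumed default; pass "Kitchen.bat" on Windows.
    """
    args = [script, f"-file={job_file}"]
    for name, value in params.items():
        # Parameter names are case sensitive, so they are passed unchanged.
        args.append(f"-param:{name}={value}")
    return args


cmd = kitchen_command("test.kjb", {"STARTDATE": "2011-04-19"})
# shlex.join quotes any argument that contains spaces or shell characters.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the real path of the script is known.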



3, How to use parameters
A parameter can be referenced in two ways: %%VAR%% and ${VAR}.

Which form to use depends on the environment where the ETL program is deployed: %%VAR%% is the Windows style and ${VAR} is the Linux style. For portability ${VAR} is generally used, because ${VAR} is also recognized in a Windows environment.
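
To make the two reference forms concrete, here is a small Python sketch that replaces both %%VAR%% and ${VAR} in a string. It illustrates the idea only and is not Kettle's own substitution code. The function name substitute, the sample SQL and the table name orders are hypothetical, and leaving unknown names untouched is an assumption of this sketch.

```python
import re

# Matches either ${NAME} or %%NAME%%.
_VAR = re.compile(r"\$\{([^}]+)\}|%%([^%]+)%%")


def substitute(text, variables):
    """Replace ${NAME} and %%NAME%% with values from `variables`."""
    def repl(match):
        name = match.group(1) or match.group(2)
        # Lookup is case sensitive; unknown names are left as written
        # (an assumption of this sketch).
        return variables.get(name, match.group(0))
    return _VAR.sub(repl, text)


sql = "SELECT * FROM orders WHERE d >= '${STARTDATE}' AND d < '%%ENDDATE%%'"
print(substitute(sql, {"STARTDATE": "2011-04-19", "ENDDATE": "2011-04-20"}))
```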

4, Setting parameters dynamically
Dynamic parameters can be set in both a Job and a Trans. A parameter set in a Trans can only be used by what runs next; using it inside the same Trans that sets it will go wrong.



(figure 2)

5, Setting global parameters
Global parameters are configured in the kettle.properties file, one key=value pair per line.

The file is located in C:\Documents and Settings\[UserName]\.kettle\
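
The key=value layout can be read with a few lines of Python, shown here as a simplified sketch. The function name parse_properties and the sample keys are hypothetical. The real Java properties format also supports escapes and other separators, which this sketch ignores.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


sample = """
# hypothetical global parameters
STARTDATE=2011-04-19
DB_HOST = localhost
"""
print(parse_properties(sample))
```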

Syntax for reading a parameter in a script: parent_job.getVariable("param_name");

6, Parameter priority
Parameter priority also needs attention, especially when the same parameter has a global value, is set dynamically, and is defined in the Job as well. In that case you need to know which value wins. The priority order is:

Parameters of the Job itself > dynamic parameters > global parameters

One thing to be aware of when setting parameters: if a parameter is defined both on the Job itself and globally, and the Job's value is empty while the global value is not, the resulting value is empty.
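
The ordering described in this section can be modelled with a short Python sketch. It mirrors the article's description only and makes no claim about Kettle's internals. The function name resolve and the three dictionaries are hypothetical.

```python
def resolve(name, job_params, dynamic_vars, global_props):
    """Return the value from the highest-priority scope that defines `name`."""
    for scope in (job_params, dynamic_vars, global_props):
        if name in scope:
            # An empty value still counts as defined, so it hides
            # any value in a lower-priority scope.
            return scope[name]
    return None


# The Job defines STARTDATE as empty, so the global value is never reached.
value = resolve("STARTDATE", {"STARTDATE": ""}, {}, {"STARTDATE": "2011-04-19"})
print(repr(value))
```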



Scripts in the Kettle directory
After unpacking Kettle you will find many script files in the directory, both bat and sh files. The following describes these scripts and what each one is used for.

Script name: Spoon.bat
Description: One way to start Kettle; Kettle.exe can also be used.
Usage: Double-click the script to run it. If memory becomes a problem while running a Job or Trans, the JVM memory size can be changed in this script.

Script name: Pan.bat
Description: Runs test.ktr files, that is, Trans files.
Usage: Write a bat file that calls this script to run a Trans file. For details see the Pan section below.

Script name: Kitchen.bat
Description: Runs test.kjb files, that is, Job files.
Usage: Write a bat file that calls this script to run a Job file. For details see the Kitchen section below.

Script name: Carte.bat
Description: Used for Kettle clustering. When the data volume reaches a certain level, clustering should be considered to share the load on the server and improve performance.
Usage: See the Carte section below.

Script name: Encr.bat
Description: Encrypts the passwords used for database connections and for a cluster.
Usage: See the Encr section below.

Script name: Run_kettle_cluster_example.bat
Description: An example of a Kettle cluster.



Note: the list above covers only the bat scripts. The sh scripts are used in basically the same way, so they are not described separately.



1, Pan usage
Running Pan from CMD shows which options it accepts:



(figure 3)

Pan example:

Pan.bat /file=test.ktr /logfile=test.txt /param=STARTDATE=2011-04-19



2, Kitchen usage
Running Kitchen from CMD shows which options it accepts:



(figure 4)

Kitchen example:

Kitchen.bat /file=test.kjb /logfile=test.txt /param=STARTDATE=2011-04-19



3, Carte usage


(figure 5)

The figure shows how Carte is used.

Carte example:

Carte.bat 127.0.0.1 8080

Because a cluster has a master server, start the master first and then start the other slave servers.



4, Encr usage


(figure 6)

The figure shows how to use Encr. It supports two kinds of encryption: one for database connection passwords and one for cluster passwords.

Encr example:

Encr.bat -kettle test

Encr.bat -carte test



(figure 7)



Kettle component description
Kettle has too many components to describe in detail here. You can learn about them at the following URL:

http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+Steps

Alternatively, refer to the Kettle user manual (PDF).



Kettle optimization
Optimization is the question everyone likes to ask and the one people care about most. The main techniques are:

1, Set the commit size according to the data volume

2, Set the fetch size of the result set according to the data volume



(figure 8)

3, Make the JVM memory setting as large as possible

4, Enable the database connection pool



(figure 9)

5, Optimize the SQL itself

6, Pay attention to details. For example, when using the Insert/Update, Update or Delete components on a table with a primary key, watch the order of the index columns. If the order is wrong, the index will not be used.



Kettle concurrency
Concurrency is an indispensable part of an ETL tool, because running work in parallel improves efficiency.



(figure 10)

How to enable concurrency:

At the point where the following steps should run concurrently, right-click the Job or Trans and tick the "Launch next ... in parallel" option.



(figure 11)

The thing to watch with concurrency is the database connection: when running concurrently, the database connection pool must be enabled. For details, refer to the Kettle connection pool configuration.

Manually specifying the .kettle directory
Add the environment variable KETTLE_HOME=[the directory to use for .kettle]. The test environment here was Windows XP, where the computer had to be restarted for the setting to take effect.
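
The effect of the variable can be sketched in Python as follows. The sketch assumes that Kettle looks for a .kettle folder underneath KETTLE_HOME and falls back to the user's home directory when the variable is not set. The function name kettle_properties_path and the sample path /opt/etl are hypothetical.

```python
import os
from pathlib import Path


def kettle_properties_path(env=None):
    """Return the assumed location of kettle.properties.

    `env` defaults to the process environment; a plain dict can be
    passed instead, which makes the lookup easy to try out.
    """
    env = os.environ if env is None else env
    # Assumption: .kettle sits directly under KETTLE_HOME, or under the
    # user's home directory when KETTLE_HOME is not set.
    base = env.get("KETTLE_HOME") or str(Path.home())
    return Path(base) / ".kettle" / "kettle.properties"


print(kettle_properties_path({"KETTLE_HOME": "/opt/etl"}))
print(kettle_properties_path({}))
```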



Reprinted source: http://blog.sina.com.cn/s/blog_b82e70870101f4nq.html
