Solving ETL Challenges: 5 Command-Line Tools You Need to Know
Author: Gajanan Gaidhane | | May 14, 2020
ETL (extract, transform, load) can be as essential to your business as it can be painful. Without the necessary experience and technical skill set, dealing with ETL issues such as inefficient queries and poor system design can occupy far too much of your time and effort.
The good news: in many cases, you can diagnose and solve some of the most pressing ETL challenges using nothing more than the command-line tools that you already have at your disposal. Below, we’ll go over 5 scripts and programs that can help you immensely in your quest to optimize ETL.
Network latency can be a silent killer for your ETL workflow, causing data transfers to execute unbearably slowly. Although caching may help reduce the amount of data flowing through the network, it can only do so much, given the vast quantities of information taken up by ETL processes.
netcat is a Unix and Microsoft Windows utility that reads from and writes to network connections. The use cases of netcat include diagnosing network latency issues by testing the network’s speed.
First, on the server machine, run the following terminal command:
nc -v -l 2222 > /dev/null
Next, on the client machine, run the dd command for copying files:
dd if=/dev/zero bs=1024K count=512 | nc -v $IP_OF_FIRST_MACHINE 2222
The results of this command will give you the network speed in megabytes per second, e.g.:
536870912 bytes (537 MB) copied, 4.87526 s, 117 MB/s
For the most accurate picture of your network speeds, it’s best to run netcat multiple times and take different types of readings: for example, during the ETL process, or outside normal business hours.
As before, you will need to designate one machine as the server and the other as the client. On the server machine, run the following terminal command:
Then, on the client machine, use iperf to connect to the server (replacing 192.168.1.1 with the server’s IP address):
iperf -c 192.168.1.1
The results of this command will include the network speed under the “Bandwidth” heading, e.g.:
Client connecting to 192.168.1.1, TCP port 5001
TCP window size: 45.0 KByte (default)
[ 3] local 192.168.1.1 port 50616 connected with 192.168.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 1.27 GBytes 1.08 Gbits/sec
PuTTY is an SSH and Telnet client for Windows, Linux, and macOS that is used to execute remote sessions on a computer over a network. Running PuTTY enables you to use a computer via terminal commands, even if the computer is physically distant from your location.
Users most commonly interact with PuTTY using the graphical interface, which can be easily invoked from the command line:
From here, you can specify the destination to which you want to connect, as well as load previous saved sessions.
The PuTTY client is especially useful for ETL when used with command-line tools such as top, which displays the running processes in Linux. You simply have to execute the terminal command:
PuTTY and top are invaluable for ETL performance monitoring and optimization on remote machines. What’s more, we can use pipes to search through the output of top for the ETL processes that we most care about. For example, the following command returns any processes in top that match the sequence “java”:
top | grep java
With the right search query, you can easily use top to oversee the activity of various Oracle ETL components such as Java, OBIEE, and Informatica.
Last but not least, the df tool displays the amount of disk space available and in use on the current system. Lack of space is one of the biggest ETL challenges that many organizations face. As your ETL processes grow and scale, you’ll need more space for logging, data transformations, and staging areas.
The df command is easily invoked as follows:
This displays the available disk space in bytes. For a more human-readable format, add the -h flag:
The five command-line tools above are essential for anyone looking to get started with ETL performance tuning and optimization. For more complicated issues, however, you may need to seek the help of ETL experts like Datavail.
Want to see how Datavail helps solve our clients’ most pressing ETL challenges? Download our white paper “The Top 5 Challenges of ETL (And How to Solve Them).”
Check out these BI Publisher tips including functions & calculations so you can understand more about the production and support of BI Publisher reports.
Ultimately the goal of commentary in OBIEE is to have a system for persisting feedback, creating a call to action, and recognizing the prolific users.