Home > OS >  Shell script to pull row counts from all Hive tables in multiple Hive databases
Shell script to pull row counts from all Hive tables in multiple Hive databases

Time:11-23

I am trying to create a shell script that will pull row counts in all tables from multiple databases. All of the databases follow the same naming convention "the_same_databasename_<%>" except the final layer in the name, which varies. I am trying to run the following:

use <database_name>; show tables; select count(*) from <table_name>;

Since I have 40 different databases, I would need to run the first two queries for each database 40 different times, plus the select count query even more depending on how many table in the database (very time consuming). I have my PuTTy configuration settings set to save my PuTTy sessions into a .txt on my local directory, so I can have the row count results displayed right in my command line interface. So far this is what I have but not sure how to include the final commands to get the actual row counts from the tables in each database.

#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*;") 
do
   tbl_count=$(hive -S -e "use $db; show tables;" | wc -l)
   echo "Database $db contains $tbl_count tables."
  
done    

I'm not very experienced in shell scripting so any guidance/help is greatly appreciated. Thanks in advance.

CodePudding user response:

You can use nested for-loop:

#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*;") 
do
   tbl_count=$(hive -S -e "use $db; show tables;" | wc -l)
   echo "Database $db contains $tbl_count tables."

   for table in $(hive -S -e "use $db; show tables;")
   do
      count=$(hive -S -e "use $db; select count(*) from $table;")
      echo "Table $db.$table contains $count rows."
   done
   
done

Or you can use variable to increment count of tables

#!/bin/bash
for db in $(hive -e "show databases like 'the_same_databasename_*;") 
do

   tbl_count=0
   for table in $(hive -S -e "use $db; show tables;")
   do
      (( tbl_count   ))
      count=$(hive -S -e "use $db; select count(*) from $table;")
      echo "Table $db.$table contains $count rows."
   done
   echo "Database $db contains $tbl_count tables."
   
done
  • Related