I would like to know -
- without affecting SQL query performance
- without lowering the memory limit is there any way to improve the impala memory error issue?
I got a few suggestions like changing my join statements in my SQL queries
CodePudding user response:
Impala uses in-memory analytics engine so being minimilastic in every aspect does the trick.
- Filters - Use as many filters as you can. Use subquery and filter inside subquery if you can.
- Joins - Main reason of memory issue - you need to use joins intelligently. As per rule of the thumb, in case of inner join - use the driving table first, then tinyiest table and then next tiny table and so on. For left joins you can use same thumb rule. So, move the tables as per their size (columns and count). Also, use as many filters as you can.
- Operations like
distinct
,regexp
,IN
, concat/function in a join condition or filter can slow things down. Please make sure they are absolutely necessary and there is no way you can avoid them. - Number of columns in select statement, subquery - keep them minimal.
- Operations in select statement, subquery - keep them minimal.
- Partitions - keep them optimized so you have optimum performance. More partition will slow INSERT and less partition will slow down SELECT.
- Statistics - Create a daily plan to gather statistics of all tables and partitions to make things faster.
- Explain Plan - Get the explain plan while the query is running. Query execution give you a unique query link. You will see lots of insights in the operations of the SQL.