Monday 10 December 2018


SQL Execution via Spark Thrift Server throws "Error: java.lang.StackOverflowError"


Overview 

This article describes the "Error: java.lang.StackOverflowError (state=,code=0)" seen while executing a very long SQL statement over a Spark Thrift Server connection. The issue and resolution were tested on an Ambari-managed 3-node Hortonworks HDP 2.6.5.0 cluster.

The test query contains 133 or more predicates in its WHERE clause. For such a query, with the default configuration, java.lang.StackOverflowError
is seen.
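For illustration, a query of the following shape reproduces the issue; the table and column names here are hypothetical, and all that matters is that the WHERE clause contains 133 or more predicates:

    SELECT id, name
    FROM test_table
    WHERE id = 1
       OR id = 2
       OR id = 3
       -- ... and so on, one OR'ed predicate per value ...
       OR id = 133;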



From the Spark Thrift Server log file we see error messages showing a long chain of recursive nodeToExpr calls that ends in java.lang.StackOverflowError.



When we use the Simba driver for Spark, the console error seen might be different.

Root Cause


It can be noticed from the log messages that the default JVM stack size allows 132 nested calls to nodeToExpr, i.e. 132 predicates, to be pushed onto the stack. Hence, when a 133rd predicate is added to the WHERE clause of the query, java.lang.StackOverflowError is seen.

Resolution


We need to increase the JVM thread stack size using Java's -Xss option. To support queries containing many more predicates in the WHERE clause, we can set the stack size to a large value, say 1500m.


We may add the following line at the end of the spark-env.sh file (or increase the value if such a statement already exists).
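As a sketch, assuming the standard SPARK_DAEMON_JAVA_OPTS variable (a stock spark-env.sh setting that passes JVM options to Spark daemons such as the Thrift Server; the exact variable to use here is an assumption), the line could look like:

    # Increase the JVM thread stack size for Spark daemons; 1500m is the
    # value suggested in the Resolution above
    export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Xss1500m"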



After saving spark-env.sh, restart the Spark Thrift Server and Spark History Server, and refresh the Spark client configs. Now the same test query runs successfully.
 


Monday 26 November 2018

Greenplum "ERROR: target lists can have at most 1664 entries"

Overview 

This article describes "ERROR: target lists can have at most 1664 entries". The experiment described in this article was tested on Greenplum 5.5.

Greenplum has a limitation on the number of columns that can appear in the projection list of a SELECT query: there can be at most 1664 columns. Hence, executing a SELECT query projecting more than 1664 columns leads to this error.

Consider a table having more than 832 columns. We can also hit "ERROR: target lists can have at most 1664 entries" while inserting rows with any number of columns into such a table, as demonstrated below.
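As a minimal sketch (the table and column names wide_tab, c1 … c833 are invented for this demonstration), creating a table with 833 columns and inserting a single row is enough to trigger the error once automatic statistics collection runs:

    -- Create a table with 833 columns (column list abbreviated):
    CREATE TABLE wide_tab (c1 int, c2 int, c3 int, /* ... */ c833 int);

    -- Insert a row populating a single column; automatic statistics
    -- collection then runs an internal SELECT over all 833 columns:
    INSERT INTO wide_tab (c1) VALUES (1);
    -- ERROR:  target lists can have at most 1664 entries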

Table Statistics Collection 

Greenplum provides a feature for analyzing tables and collecting statistics. These statistics can be used for generating a cost-optimised query plan. The ANALYZE statement can be used by an end user to trigger table analysis manually. There also exists automatic statistics collection functionality in Greenplum, due to which a table is ANALYZEd and statistics are collected whenever rows are inserted into it. The value of the gp_autostat_mode configuration parameter decides whether automatic statistics collection should be triggered or not.
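For example, analysis can be triggered manually, and the automatic-collection mode inspected, as follows (tab1 is a placeholder table name):

    -- Trigger table analysis manually:
    ANALYZE tab1;

    -- Show the automatic statistics collection mode
    -- (possible values: none, on_no_stats, on_change):
    SHOW gp_autostat_mode;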

While collecting table statistics, the analyzer executes some SELECT statements to extract the stored data size. That means SELECTs projecting all table columns are executed. One of the statistics-collection SELECTs projects each column twice (after applying certain functions).

Following is the shape of the statistics-gathering SELECT under discussion, for a table named tab1 with a single column id1.
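The analyzer's exact internal query is version dependent and is not reproduced here; schematically it has the following shape, where f1 and f2 are placeholders for the functions the analyzer applies. Each table column therefore contributes two entries to the target list:

    -- Schematic form only; f1/f2 stand in for the analyzer's functions
    SELECT f1(id1), f2(id1)
    FROM tab1;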


Considering Greenplum's limit of 1664 entries in the projection list, and that this SELECT projects each column twice, a table with more than 832 columns pushes the target list past the limit (833 × 2 = 1666 > 1664). It is therefore expected to hit "ERROR: target lists can have at most 1664 entries" while inserting rows into a table having more than 832 columns.

References 

  1. https://github.com/greenplum-db/gpdb/issues/1345
  2. https://gpdb.docs.pivotal.io/550/admin_guide/topics/g-automatic-statistics-collection-parameters.html
  3. https://gpdb.docs.pivotal.io/550/ref_guide/sql_commands/ANALYZE.html