In Spark SQL, users can use join hints to suggest which join strategy Spark should use. Before Spark 3.0, only the BROADCAST join hint was supported. Spark 3.0 added three more join hints: MERGE, SHUFFLE_HASH, and SHUFFLE_REPLICATE_NL. Their priority is BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL. If both sides of a join carry the BROADCAST hint, or both carry the SHUFFLE_HASH hint, Spark chooses which side to build based on the join type and the sizes of the two sides.
-- Join Hints for broadcast join
SELECT /*+ BROADCAST(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ BROADCASTJOIN (t1) */ * FROM t1 left JOIN t2 ON t1.key = t2.key;
SELECT /*+ MAPJOIN(t2) */ * FROM t1 right JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle sort merge join
SELECT /*+ SHUFFLE_MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
SELECT /*+ MERGE(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle hash join
SELECT /*+ SHUFFLE_HASH(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- Join Hints for shuffle-and-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

-- When different join strategy hints are specified on both sides of a join, Spark
-- prioritizes the BROADCAST hint over the MERGE hint over the SHUFFLE_HASH hint
-- over the SHUFFLE_REPLICATE_NL hint.
-- Spark will issue Warning in the following example
-- org.apache.spark.sql.catalyst.analysis.HintErrorLogger: Hint (strategy=merge)
-- is overridden by another hint and will not take effect.
SELECT /*+ BROADCAST(t1), MERGE(t1, t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
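
The priority rule described above can be illustrated with a small standalone sketch. This is not Spark's planner code: the names `HINT_PRIORITY`, `HINT_ALIASES`, and `resolve_join_hint` are hypothetical and exist only for this example. It shows nothing more than how the stated order BROADCAST > MERGE > SHUFFLE_HASH > SHUFFLE_REPLICATE_NL decides a winner when several hints compete, using the hint aliases that appear in the SQL examples.

```python
# Illustrative sketch only: mimics the documented hint priority,
# it is NOT Spark's actual planner implementation.
from typing import Iterable, Optional

# Highest priority first, as stated in the text.
HINT_PRIORITY = ["BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL"]

# Aliases used in the SQL examples, mapped to a canonical hint name
# (the canonical names themselves are a choice made for this sketch).
HINT_ALIASES = {
    "BROADCAST": "BROADCAST",
    "BROADCASTJOIN": "BROADCAST",
    "MAPJOIN": "BROADCAST",
    "MERGE": "MERGE",
    "SHUFFLE_MERGE": "MERGE",
    "MERGEJOIN": "MERGE",
    "SHUFFLE_HASH": "SHUFFLE_HASH",
    "SHUFFLE_REPLICATE_NL": "SHUFFLE_REPLICATE_NL",
}


def resolve_join_hint(hints: Iterable[str]) -> Optional[str]:
    """Return the highest-priority canonical hint, or None if none is recognized."""
    canonical = {
        HINT_ALIASES[h.upper()] for h in hints if h.upper() in HINT_ALIASES
    }
    for name in HINT_PRIORITY:
        if name in canonical:
            return name
    return None


if __name__ == "__main__":
    # Mirrors the last SQL example: BROADCAST wins and MERGE is overridden.
    print(resolve_join_hint(["BROADCAST", "MERGE"]))  # BROADCAST
```

In the last SQL example, this is the situation in which Spark logs that the merge hint is overridden: BROADCAST outranks MERGE, so only the broadcast hint takes effect.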
Further reading on using Spark hints: https://blog.51cto.com/u_15435003/5296344