The differences between left join, left outer join, and left semi join in Hive

sql - Difference between INNER JOIN and LEFT SEMI JOIN - Stack Overflow
What is the difference between an INNER JOIN and LEFT SEMI JOIN?
In the scenario below, why am I getting two different results?
The INNER JOIN result set is a lot larger. Can someone explain? I am trying to get the names within table_1 that only appear in table_2.
SELECT name
FROM table_1 a
INNER JOIN table_2 b ON a.name=b.name;

SELECT name
FROM table_1 a
LEFT SEMI JOIN table_2 b ON (a.name=b.name);
An INNER JOIN returns the columns from both tables.
A LEFT SEMI JOIN only returns the records from the left-hand table.
It's equivalent to (in standard SQL):
SELECT name
FROM table_1 a
WHERE EXISTS(
SELECT * FROM table_2 b WHERE (a.name=b.name))
If there are multiple matching rows in the right-hand table, an INNER JOIN will return one row for each matching row, while a LEFT SEMI JOIN returns each row from the left table only once, no matter how many matches it has on the right.
That's why you're seeing a different number of rows in your result.
I am trying to get the names within table_1 that only appear in table_2.
Then a LEFT SEMI JOIN is the appropriate query to use.
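To make the row-count difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no LEFT SEMI JOIN keyword, so the sketch uses the WHERE EXISTS form given above as the standard-SQL equivalent. The sample names ('alice', 'bob', 'carol') are made up for illustration and are not from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (name TEXT);
    CREATE TABLE table_2 (name TEXT);
    INSERT INTO table_1 VALUES ('alice'), ('bob'), ('carol');
    -- 'alice' appears three times on the right-hand side
    INSERT INTO table_2 VALUES ('alice'), ('alice'), ('alice'), ('bob');
""")

# INNER JOIN: one output row per matching pair of rows.
inner_rows = conn.execute("""
    SELECT a.name
    FROM table_1 a
    INNER JOIN table_2 b ON a.name = b.name
    ORDER BY a.name
""").fetchall()

# Semi-join written as WHERE EXISTS: each left row at most once.
semi_rows = conn.execute("""
    SELECT a.name
    FROM table_1 a
    WHERE EXISTS (SELECT 1 FROM table_2 b WHERE a.name = b.name)
    ORDER BY a.name
""").fetchall()

print(len(inner_rows), inner_rows)  # 4 rows: 'alice' three times, 'bob' once
print(len(semi_rows), semi_rows)    # 2 rows: 'alice' and 'bob' once each
```

With this data the inner join yields four rows and the semi-join form yields two, which is the kind of gap the question describes.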
A major defect in Hive's left outer join (LEFT OUTER JOIN)
Test version: Hive
Reply from @谢纯青 on Sina Weibo: this is an old bug; it was fixed in version 0.7 (id 1534).
------------------------------------------------------------------------------
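The most important property of a left outer join (or of a right or full outer join) is that no rows of the primary table may be lost. For example, take two tables, ljn001 and ljn002, each with two columns, a and b:
hive> select * from ljn001 order by a,b;
hive> select * from ljn002 order by a,b;
Given these tables, the following left outer join query should return the result shown after it: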
hive> select ljn001.*,ljn002.*
    > from ljn001
    > left outer join ljn002
    > on (ljn001.a = ljn002.a and ljn001.b = ljn002.b);
a       b       a       b
1       1       1       1
1       2       1       2
1       3       1       3
2       3       NULL    NULL
2       4       NULL    NULL
Next, however, I will show an important defect in how the current version of Hive handles outer joins:
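select ljn001.*,ljn002.*
from ljn001 left outer join ljn002
on (ljn001.a = ljn002.a and ljn001.b = ljn002.b and ljn001.b = 2);
Compared with the previous query, this one adds the condition ljn001.b = 2 (marked in red in the original post). By the outer join principle, the rows of the primary table, ljn001, must not be lost: when the ON condition is true, the matching ljn002 row is joined; when it is false, all ljn002 columns are NULL. The result of this query should therefore be: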
a       b       a       b
1       1       NULL    NULL
1       2       1       2
1       3       NULL    NULL
2       3       NULL    NULL
2       4       NULL    NULL
This result can be verified in Oracle. In Hive, however, the result is as follows:
hive> select ljn001.*,ljn002.*
    > from ljn001
    > left outer join ljn002
    > on (ljn001.a = ljn002.a and ljn001.b = ljn002.b and ljn001.b = 2);
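a       b       a       b
1       2       1       2
The execution plan shows that Hive already filters on b = 2 in the map operation that scans ljn001. In other words, Hive treats ljn001.b = 2 as a WHERE filter rather than as an ON join condition. Keep this in mind when developing on Hive, otherwise it will produce unexpected data errors. Hopefully Hive will fix this defect soon.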
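The difference between filtering in ON and filtering in WHERE can be reproduced outside Hive. Below is a minimal sketch using Python's built-in sqlite3 module. The contents of ljn001 are taken from the expected results in the post; the contents of ljn002 are not shown there, so the three rows used here are an assumption chosen to match those results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ljn001 (a INTEGER, b INTEGER);
    CREATE TABLE ljn002 (a INTEGER, b INTEGER);
    INSERT INTO ljn001 VALUES (1, 1), (1, 2), (1, 3), (2, 3), (2, 4);
    -- Assumed contents: only the rows needed to match the post's results.
    INSERT INTO ljn002 VALUES (1, 1), (1, 2), (1, 3);
""")

# Predicate on the left table inside ON: every ljn001 row is kept,
# and ljn002 columns are NULL wherever the ON condition is false.
on_filter = conn.execute("""
    SELECT ljn001.a, ljn001.b, ljn002.a, ljn002.b
    FROM ljn001
    LEFT OUTER JOIN ljn002
      ON (ljn001.a = ljn002.a AND ljn001.b = ljn002.b AND ljn001.b = 2)
    ORDER BY ljn001.a, ljn001.b
""").fetchall()

# Same predicate in WHERE: rows are filtered after the join,
# which matches the behaviour the post observed in Hive.
where_filter = conn.execute("""
    SELECT ljn001.a, ljn001.b, ljn002.a, ljn002.b
    FROM ljn001
    LEFT OUTER JOIN ljn002
      ON (ljn001.a = ljn002.a AND ljn001.b = ljn002.b)
    WHERE ljn001.b = 2
    ORDER BY ljn001.a, ljn001.b
""").fetchall()

for row in on_filter:
    print(row)
print(where_filter)
```

The first query returns five rows, one per ljn001 row, and the second returns a single row. On an affected Hive version, moving the left-table predicate into an explicit WHERE clause at least makes the query say what the engine actually does.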
The difference between left join and left outer join
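The following are the CREATE TABLE statements for the two test tables. The column types and the name of the second column of table2 are missing from the source; bigint, string, and course_no below are assumptions.
DROP TABLE IF EXISTS table1;
create table table1(
student_no bigint comment 'student number',
student_name string comment 'student name'
)
COMMENT 'test student information'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
DROP TABLE IF EXISTS table2;
create table table2(
student_no bigint comment 'student number',
course_no bigint comment 'course number'
)
COMMENT 'test student course selection information'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;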
load data local inpath 'data_table1.txt' overwrite into table table1;
load data local inpath 'data_table2.txt' overwrite into table table2;
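The test data is:
[image: hive left join test data]

Test 1: left join
Statement:
select * from table1 left join table2 on(table1.student_no=table2.student_no);
Result:
FAILED: Parse Error: line 1:22 cannot recognize input near 'left' 'join' 'table2' in join type specifier
The Hive version I am using is 0.8, which does not support the bare left join syntax.

Test 2: left outer join
Statement:
select * from table1 left outer join table2 on(table1.student_no=table2.student_no);
Result:
1       name1   1       11
1       name1   1       12
1       name1   1       13
2       name2   2       11
2       name2   2       14
3       name3   3       15
3       name3   3       12
4       name4   4       13
4       name4   4       12
5       name5   5       14
5       name5   5       16
6       name6   NULL    NULL
As you can see, left outer join lists every row of the left table; where the right table has no matching row, its columns are filled with NULL. Note also that if a key on the left matches N rows on the right, the result contains N rows for it; here, for example, key 1 appears with all 3 of its rows from the right.

Test 3: left semi join
Statement:
select * from table1 left semi join table2 on(table1.student_no=table2.student_no);
Result:
1       name1
2       name2
3       name3
4       name4
5       name5
As you can see, only the columns of the left table are printed. The rule is: a left row is printed if its key exists in the right table and is filtered out otherwise.

Conclusions: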
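Hive (version 0.8, as tested here) does not support the bare 'left join' syntax;
Hive's left outer join: if several rows on the right match a row on the left, one output row is produced for each match; if no row on the right matches a left row, the left row is still output, with the right table's columns set to NULL;
Hive's left semi join: equivalent to an IN subquery in SQL. For example, the statement in Test 3 above is equivalent to "select * from table1 where table1.student_no in (select student_no from table2)". Note that the result contains no columns from the right table (table2).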
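Tests 2 and 3 can be re-run outside Hive as a sanity check. The sketch below uses Python's built-in sqlite3 module. The rows are reconstructed from the result listings above. The column name course_no and the column types are assumptions, since the source does not show them, and the semi join is written in its IN-subquery form because SQLite has no LEFT SEMI JOIN keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- course_no and the column types are assumptions, not from the source.
    CREATE TABLE table1 (student_no INTEGER, student_name TEXT);
    CREATE TABLE table2 (student_no INTEGER, course_no INTEGER);
    INSERT INTO table1 VALUES
        (1, 'name1'), (2, 'name2'), (3, 'name3'),
        (4, 'name4'), (5, 'name5'), (6, 'name6');
    INSERT INTO table2 VALUES
        (1, 11), (1, 12), (1, 13), (2, 11), (2, 14), (3, 15),
        (3, 12), (4, 13), (4, 12), (5, 14), (5, 16);
""")

# Test 2: left outer join keeps student 6 with NULLs on the right.
outer_rows = conn.execute("""
    SELECT table1.student_no, table1.student_name,
           table2.student_no, table2.course_no
    FROM table1
    LEFT OUTER JOIN table2 ON (table1.student_no = table2.student_no)
""").fetchall()

# Test 3: the semi join expressed as an IN subquery.
semi_rows = conn.execute("""
    SELECT * FROM table1
    WHERE table1.student_no IN (SELECT student_no FROM table2)
    ORDER BY table1.student_no
""").fetchall()

print(len(outer_rows))  # 12, as in Test 2
print(semi_rows)        # students 1 to 5, left-table columns only
```

The outer join produces 12 rows, including (6, 'name6', None, None), and the IN form returns students 1 to 5 once each, in line with the two result listings above.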