Data
id,username,math,computer,english
1,huangbo,34,58,58
2,xuzheng,45,87,45
3,wangbaoqiang,76,34,89
Write SQL to transform it into the table below
id,username,course,score
1,huangbo,math,34
1,huangbo,computer,58
1,huangbo,english,58
2,xuzheng,math,45
2,xuzheng,computer,87
2,xuzheng,english,45
3,wangbaoqiang,math,76
3,wangbaoqiang,computer,34
3,wangbaoqiang,english,89
Approach 1
Basic steps:
- For each subject, select the subject name and its corresponding score
- Merge the three result sets with union (union all would also work here and skips the deduplication step, since the three branches can never produce identical rows)
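The two steps above can be sketched in plain Python before writing the SQL. This is a minimal illustration only, using the sample rows from the data section; the function name unpivot_union is a hypothetical helper, not part of the actual solution:

```python
rows = [
    (1, "huangbo", 34, 58, 58),
    (2, "xuzheng", 45, 87, 45),
    (3, "wangbaoqiang", 76, 34, 89),
]

def unpivot_union(rows):
    # One "select" per subject column, each tagging rows with the subject name
    math = [(i, u, "math", m) for i, u, m, c, e in rows]
    computer = [(i, u, "computer", c) for i, u, m, c, e in rows]
    english = [(i, u, "english", e) for i, u, m, c, e in rows]
    # "union" the three per-subject lists, then order by id like the final select
    return sorted(math + computer + english, key=lambda r: r[0])

long_rows = unpivot_union(rows)
```

Each wide row fans out into three long rows, so 3 input rows become 9 output rows.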
SQL code:
with t1 as (
    select id, username, 'math' as subject, math as score from zuoye3
    union
    select id, username, 'computer' as subject, computer as score from zuoye3
    union
    select id, username, 'english' as subject, english as score from zuoye3
)
select * from t1 order by id
Spark SQL code:
import os
import re

from pyspark.sql import SparkSession

"""
-------------------------------------------
Description : convert the wide score table into a long (id, username, subject, score) table
SourceFile  : _04-zuoye1
Author      : zxx
Date        : 2024/11/4
-------------------------------------------
"""

if __name__ == '__main__':
    os.environ['JAVA_HOME'] = 'D:/bigdata/03-java/java-8/jdk'
    # Path to the unpacked Hadoop distribution
    os.environ['HADOOP_HOME'] = 'D:/bigdata/04-Hadoop/hadoop/hadoop-3.3.1/hadoop-3.3.1'
    # Python interpreter used by the executors
    os.environ['PYSPARK_PYTHON'] = 'D:/bigdata/22-spark/Miniconda3/python.exe'
    # Python interpreter used by the driver
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'D:/bigdata/22-spark/Miniconda3/python.exe'

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("zuoye3")
             .config("spark.sql.shuffle.partitions", 2)
             .getOrCreate())

    # Read the CSV file, drop the header row, and register it as a temp view
    df = (spark.sparkContext
          .textFile("../../datas/zuoye2/zuoye3.txt")
          .filter(lambda x: re.split(",", x)[0] != "id")
          .map(lambda line: tuple(re.split(",", line)))
          .toDF(["id", "username", "math", "computer", "english"]))
    df.createOrReplaceTempView("zuoye3")

    spark.sql("""
        with t1 as (
            select id, username, 'math' as subject, math as score from zuoye3
            union
            select id, username, 'computer' as subject, computer as score from zuoye3
            union
            select id, username, 'english' as subject, english as score from zuoye3
        )
        select * from t1 order by id
    """).show()

    # Shut down the session
    spark.stop()
Output:
Approach 2
Basic steps:
- Concatenate each subject name with its corresponding score
- Join all the pairs into one big string, then blow it apart with the explode function
- Use split to break each subject:score pair back into two columns
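The three steps above can be sketched in plain Python. This is a minimal illustration only, using the sample rows from the data section; the function name unpivot_explode is a hypothetical helper, not part of the actual solution:

```python
rows = [
    (1, "huangbo", 34, 58, 58),
    (2, "xuzheng", 45, 87, 45),
    (3, "wangbaoqiang", 76, 34, 89),
]

def unpivot_explode(rows):
    out = []
    for i, u, m, c, e in rows:
        # Step 1 + 2: build "subject:score" pairs and join them into one string,
        # mirroring concat_ws(',', concat('math:', math), ...)
        s = ",".join([f"math:{m}", f"computer:{c}", f"english:{e}"])
        # Step 2: "explode" the string back into one element per subject
        for pair in s.split(","):
            # Step 3: split each pair into subject and score columns
            subject, score = pair.split(":")
            out.append((i, u, subject, int(score)))
    return out

long_rows = unpivot_explode(rows)
```

The round trip through one delimited string is what lets a single explode call replace the three-way union of Approach 1.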
SQL code:
with t1 as (
    select id, username,
           concat_ws(',', concat('math:', math),
                          concat('computer:', computer),
                          concat('english:', english)) as s
    from zuoye3
),
t2 as (
    select id, username, score
    from t1 lateral view explode(split(s, ',')) t1 as score
)
select id, username,
       split(score, ':')[0] as subject,
       split(score, ':')[1] as score
from t2
Spark SQL code:
import os
import re

from pyspark.sql import SparkSession

"""
-------------------------------------------
Description : convert the wide score table into a long (id, username, subject, score) table
SourceFile  : _04-zuoye1
Author      : zxx
Date        : 2024/11/4
-------------------------------------------
"""

if __name__ == '__main__':
    os.environ['JAVA_HOME'] = 'D:/bigdata/03-java/java-8/jdk'
    # Path to the unpacked Hadoop distribution
    os.environ['HADOOP_HOME'] = 'D:/bigdata/04-Hadoop/hadoop/hadoop-3.3.1/hadoop-3.3.1'
    # Python interpreter used by the executors
    os.environ['PYSPARK_PYTHON'] = 'D:/bigdata/22-spark/Miniconda3/python.exe'
    # Python interpreter used by the driver
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'D:/bigdata/22-spark/Miniconda3/python.exe'

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("zuoye3")
             .config("spark.sql.shuffle.partitions", 2)
             .getOrCreate())

    # Read the CSV file, drop the header row, and register it as a temp view
    df = (spark.sparkContext
          .textFile("../../datas/zuoye2/zuoye3.txt")
          .filter(lambda x: re.split(",", x)[0] != "id")
          .map(lambda line: tuple(re.split(",", line)))
          .toDF(["id", "username", "math", "computer", "english"]))
    df.createOrReplaceTempView("zuoye3")

    spark.sql("""
        with t1 as (
            select id, username,
                   concat_ws(',', concat('math:', math),
                                  concat('computer:', computer),
                                  concat('english:', english)) as s
            from zuoye3
        ),
        t2 as (
            select id, username, score
            from t1 lateral view explode(split(s, ',')) t1 as score
        )
        select id, username,
               split(score, ':')[0] as subject,
               split(score, ':')[1] as score
        from t2
    """).show()

    # Shut down the session
    spark.stop()