Preface
PyCharm is already configured with an SSH interpreter pointing at the server's Anaconda Python. If you have not set that up yet, see 大数据单机学习环境搭建(8)Linux单节点Anaconda安装和Pycharm连接.
The script run from PyCharm
Run the following Python script, pyspark_model.py, which builds a SparkSession and executes Spark SQL.
# Script: PySpark test from PyCharm (remote Spark SQL execution)
from pyspark.sql import SparkSession
import os

os.environ['SPARK_HOME'] = '/opt/spark'
os.environ['JAVA_HOME'] = '/opt/jdk1.8'

spark = SparkSession.builder \
    .appName('pyspark_conda') \
    .master('yarn') \
    .config('spark.sql.warehouse.dir', 'hdfs://bigdata01:8020/user/hive/warehouse') \
    .config('hive.metastore.uris', 'thrift://bigdata01:9083') \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql('select * from hostnames limit 10;').show()

spark.stop()
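Before digging into the errors below, it can help to rule out the cluster entirely: a local-mode session exercises only the pyspark-to-JVM wiring on the server, so failures are easier to isolate. A minimal sketch under the same path assumptions (the script name smoke_test is mine, not from the original setup):

# smoke_test.py - hypothetical minimal check: if this fails, the problem is
# the local pyspark/Java installation, not YARN or the Hive metastore
import os
os.environ['SPARK_HOME'] = '/opt/spark'
os.environ['JAVA_HOME'] = '/opt/jdk1.8'

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('smoke_test').master('local[1]').getOrCreate()
spark.range(5).show()   # prints ids 0..4 if the Java gateway came up
spark.stop()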
Error 1: pyspark version mismatch
For example, my cluster runs Spark 3.0.0, but the Python pyspark package is 3.5.0: no version was pinned at install time, so pip pulled the latest one.
The error is [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number. Full output:
ssh://slash@bigdata01:22/opt/python3/bin/python3 -u /home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py
JAVA_HOME is not set
Traceback (most recent call last):
  File "/home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py", line 7, in <module>
    spark = SparkSession.builder \
  File "/opt/python3/lib/python3.8/site-packages/pyspark/sql/session.py", line 497, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 515, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 201, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 436, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/java_gateway.py", line 107, in launch_gateway
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.

If you insist on keeping the mismatched pyspark, then even after setting JAVA_HOME as in Error 2 below, other errors still appear, for example the Py4JError shown next. The most thorough fix is therefore to replace pyspark with the version that matches your Spark cluster.
Traceback (most recent call last):
  File "/home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py", line 7, in <module>
    spark = SparkSession.builder \
  File "/opt/python3/lib/python3.8/site-packages/pyspark/sql/session.py", line 497, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 515, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 203, in __init__
    self._do_init(
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 316, in _do_init
    self._jvm.PythonUtils.getPythonAuthSocketTimeout(self._jsc)
  File "/opt/python3/lib/python3.8/site-packages/py4j/java_gateway.py", line 1549, in __getattr__
    raise Py4JError(
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getPythonAuthSocketTimeout does not exist in the JVM
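A quick way to catch this mismatch up front is to compare the client library with the server's Spark build before creating any session. A minimal sketch; it assumes a standard Spark binary distribution, which ships a RELEASE file under SPARK_HOME:

# Hypothetical pre-flight check: compare the pip-installed pyspark with the
# server-side Spark build before attempting a SparkSession.
import os
import pyspark

print('client pyspark :', pyspark.__version__)          # e.g. 3.5.0

# Spark binary distributions record their build in $SPARK_HOME/RELEASE
release = os.path.join(os.environ.get('SPARK_HOME', '/opt/spark'), 'RELEASE')
with open(release) as f:
    print('server Spark  :', f.readline().strip())      # e.g. Spark 3.0.0 ...

If the two disagree, reinstall the client at the matching version, as described next.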
Error 2: JAVA_HOME not picked up
pyspark has now been reinstalled at 3.0.0, pinning the version at download time: pip install pyspark==3.0.0. The error becomes Java gateway process exited before sending its port number, together with JAVA_HOME is not set. Full output:
ssh://slash@bigdata01:22/opt/python3/bin/python3 -u /home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py
JAVA_HOME is not set
Traceback (most recent call last):
  File "/home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py", line 7, in <module>
    spark = SparkSession.builder \
  File "/opt/python3/lib/python3.8/site-packages/pyspark/sql/session.py", line 186, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 371, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 128, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/context.py", line 320, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/python3/lib/python3.8/site-packages/pyspark/java_gateway.py", line 105, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

The fix is to set the variables in the script itself:
# With pyspark==3.5.0, setting SPARK_HOME and JAVA_HOME still fails
# With pyspark==3.0.0, setting them lets the script run successfully
os.environ['SPARK_HOME'] = '/opt/spark'
os.environ['JAVA_HOME'] = '/opt/jdk1.8'
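These assignments only help because they execute before the SparkSession is created: the gateway launcher spawns the JVM from the Python process's current environment, so late assignments are ignored. A defensive variant of the same idea (the existence check is my addition, not part of the original script):

import os

# Must run before any SparkSession/SparkContext is created: the JVM is
# launched from this process's environment.
os.environ.setdefault('SPARK_HOME', '/opt/spark')
os.environ.setdefault('JAVA_HOME', '/opt/jdk1.8')

java_bin = os.path.join(os.environ['JAVA_HOME'], 'bin', 'java')
if not os.path.isfile(java_bin):
    raise RuntimeError(f'JAVA_HOME looks wrong: {java_bin} not found')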
Error 3: Python version problem
The Anaconda I installed first was the latest release, which ships Python 3.11; installing pyspark 3.0.0 there also fails, with TypeError: code() argument 13 must be str, not int. Full output:
ssh://slash@bigdata01:22/opt/anaconda3/bin/python3.11 -u /home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py
Traceback (most recent call last):
  File "/home/slash/etl/dwtool/pyspark/pyspark_script/pyspark_model.py", line 1, in <module>
    from pyspark.sql import SparkSession
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/context.py", line 30, in <module>
    from pyspark import accumulators
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/serializers.py", line 71, in <module>
    from pyspark import cloudpickle
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/cloudpickle.py", line 209, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pyspark/cloudpickle.py", line 172, in _make_cell_set_template_code
    return types.CodeType(
           ^^^^^^^^^^^^^^^
TypeError: code() argument 13 must be str, not int

After deleting the /opt/anaconda3 directory, I reinstalled Anaconda from Anaconda3-2021.05-Linux-x86_64.sh, which ships Python 3.8. With that interpreter, the pyspark 3.0.0 package drives the Spark 3.0.0 engine, builds the SparkSession, and executes the Spark SQL successfully.
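The root cause is that pyspark 3.0.0's bundled cloudpickle predates the types.CodeType signature changes in newer CPython, so a 3.11 interpreter trips over it at import time. A guard one could put at the top of the script; the 3.9 bound is my assumption based on Spark 3.0's documented support (3.8 is what worked in this setup), so check the docs for your release:

import sys

# pyspark 3.0.x was built for interpreters up to Python 3.8; newer versions
# changed types.CodeType and break the bundled cloudpickle at import time.
if sys.version_info >= (3, 9):
    raise RuntimeError(
        f'Python {sys.version_info.major}.{sys.version_info.minor} detected; '
        'use a 3.8 interpreter (or a pyspark build that supports this Python).'
    )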