Q: Setting Spark memory allocations for extracting 125 Gb of data…ExecutorLostFailure

I'm trying to pull a 126 Gb table out of HAWQ (PostgreSQL, in this case 8.2) into Spark, and it is not working. I can pull smaller tables without a problem. For this one I keep getting the error below (a sketch of the kind of read involved follows the stack trace):

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): ExecutorLostFailure (executor driver lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
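
For context, here is a minimal sketch of the kind of read involved, assuming it goes through Spark's standard JDBC data source against HAWQ's PostgreSQL interface; the URL, table name, credentials, class name, and output path are placeholders, not the actual code:

    package com.example

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object PullHawqTable {
      def main(args: Array[String]): Unit = {
        // Master URL and memory settings come from spark-submit, not from here.
        val sc = new SparkContext(new SparkConf().setAppName("pull-hawq-table"))
        val sqlContext = new SQLContext(sc)

        // HAWQ speaks the PostgreSQL wire protocol, so the stock JDBC source applies.
        // Without partitionColumn/lowerBound/upperBound/numPartitions, the whole
        // table is fetched through a single task.
        val df = sqlContext.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://hawq-master:5432/mydb") // placeholder
          .option("dbtable", "big_table")                           // placeholder
          .option("user", "gpadmin")                                // placeholder
          .option("password", "********")
          .load()

        df.write.parquet("/output/big_table") // placeholder sink

        sc.stop()
      }
    }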

My cluster specifications are as follows: 64 cores, 512 Gb of RAM, 2 nodes.
This is a Spark standalone cluster on the 2 nodes (trust me, I'd like more nodes, but that's all I get), so I have one node as a pure slave, and the other node houses both the master and the other slave.

I've tried many memory-allocation configurations with the spark-submit job; I'll list a few here, none of which worked (a full invocation sketch follows the list):

    // CONFIG_5: FAIL (96 Gb driver  144 Gb executor)
    --driver-memory 96g --executor-memory 6g --num-executors 24 --executor-cores 24 

    // CONFIG_4: FAIL (48 Gb driver  192 Gb executor)
    --driver-memory 48g --executor-memory 8g --num-executors 24 --executor-cores 24  

    // CONFIG_3: FAIL (120 Gb driver  128 Gb executor)
    --driver-memory 120g --executor-memory 4g --num-executors 32 --executor-cores 32   

    // CONFIG_2: FAIL (156 Gb driver  96 Gb executor)
    --driver-memory 156g --executor-memory 4g --num-executors 24 --executor-cores 24   

    // CONFIG_1: FAIL (224 Gb driver  48 Gb executor)
    --driver-memory 224g --executor-memory 1g --num-executors 1 --executor-cores 48
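
For completeness, a full spark-submit invocation for one of these configurations would look roughly like this; the master URL, class name, and jar path are placeholders, and only the resource flags are copied from CONFIG_4 above:

    spark-submit \
      --master spark://master-node:7077 \
      --class com.example.PullHawqTable \
      --driver-memory 48g \
      --executor-memory 8g \
      --num-executors 24 \
      --executor-cores 24 \
      pull-hawq-table.jar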

The error is the same each time -- ExecutorLostFailure (executor driver lost)

apache-spark  apache-spark-sql  hawq