
Q: Spark worker running continuously giving errors

I have set up a Cassandra cluster with 2 data centres of 3 nodes each; each data centre has 1 seed node and a replication factor of 2. Spark is also set up, with 2 worker machines (2 cores and 8 GB RAM each) and a master machine with 4 GB RAM. I run a Spark job every hour, and the data it needs to process in that hour is around 20,00,000 (2 million) records. My Spark job runs continuously, showing this error on my worker node:

15/08/14 03:31:54 INFO LocalNodeFirstLoadBalancingPolicy: Suspected host 10.0.1.205 (DC2)
15/08/14 03:32:27 INFO RequestHandler: Query SELECT "addedtime" FROM "sams"."events" WHERE token("appname") > ? AND token("appname") <= ? AND "addedtime" >= ? AND "addedtime" < ?   ALLOW FILTERING is not prepared on /10.0.1.205:9042, preparing before retrying executing. Seeing this message a few times is fine, but seeing it a lot may be source of performance problems


 ERROR Session: Error creating pool to /10.0.1.205:9042
com.datastax.driver.core.ConnectionException: [/10.0.1.205:9042] Unexpected error during transport initialization (com.datastax.driver.core.OperationTimedOutException: [/10.0.1.205:9042] Operation timed out)
        at com.datastax.driver.core.Connection.initializeTransport(Connection.java:186)
        at com.datastax.driver.core.Connection.<init>(Connection.java:116)
        at com.datastax.driver.core.PooledConnection.<init>(PooledConnection.java:32)
        at com.datastax.driver.core.Connection$Factory.open(Connection.java:586)
        at com.datastax.driver.core.SingleConnectionPool.<init>(SingleConnectionPool.java:76)
        at com.datastax.driver.core.HostConnectionPool.newInstance(HostConnectionPool.java:35)
        at com.datastax.driver.core.SessionManager$2.call(SessionManager.java:231)
        at com.datastax.driver.core.SessionManager$2.call(SessionManager.java:224)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.OperationTimedOutException: [/10.0.1.205:9042] Operation timed out
        at com.datastax.driver.core.Connection$Future.onTimeout(Connection.java:917)
        at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:981)
        at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
        at org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
        at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        ... 1 more

Can you please let me know what the issue might be? Here are the options with which I am running the Spark jobs:

--conf spark.cassandra.input.split.size_in_mb=67108864
--executor-memory 6G
--driver-memory 6G

Adding my table schema:

CREATE TABLE events (
 addedtime timestamp,
 appname text,
 hostname text,
 assetname text,
 brandname text,
 eventname text,
 eventorigin text,
 eventtime timestamp,
 PRIMARY KEY ((appname), addedtime, hostname)
);

appname is the partition key, and addedtime and hostname are the clustering columns.

Answer 1:

spark.cassandra.query.retry.delay

spark.cassandra.read.timeout_ms

spark.cassandra.input.split.size_in_mb

spark.cassandra.input.fetch.size_in_rows

You have to set these in SparkConf.
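As a sketch, these settings could be passed either on the SparkConf object or as --conf flags to spark-submit. The values below are illustrative placeholders, not tuned recommendations; note in particular that spark.cassandra.input.split.size_in_mb is expressed in megabytes, so the 67108864 in the question (64 TB if read as MB) is almost certainly unintended and the likely source of the oversized reads:

```shell
# Sketch: passing the connector settings via spark-submit.
# Every numeric value here is an illustrative placeholder to be
# tuned for your cluster, not a recommendation.
spark-submit \
  --executor-memory 6G \
  --driver-memory 6G \
  --conf spark.cassandra.input.split.size_in_mb=64 \
  --conf spark.cassandra.input.fetch.size_in_rows=1000 \
  --conf spark.cassandra.read.timeout_ms=120000 \
  --conf spark.cassandra.query.retry.delay=10 \
  your-job.jar
```

Smaller splits and fetch sizes reduce the per-request load on each Cassandra node, and a longer read timeout gives slow nodes a chance to respond before the driver marks the host as suspected.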

Tags: cassandra, apache-spark