Task 15865084

Name	hadcm3n_o7cq_1980_40_008396268_0
Workunit	8547127
Created	26 Jun 2013, 1:46:20 UTC
Sent	29 Jun 2013, 23:01:17 UTC
Report deadline	29 Sep 2013, 6:28:28 UTC
Received	21 Sep 2013, 0:39:32 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1261147
Run time	7 days 22 hours 49 min
CPU time	7 days 8 hours 21 min 19 sec
Validate state	Invalid
Credit	5,598.72
Device peak FLOPS	2.79 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5884, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1220, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4896, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2880, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3068, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3032, iMonCtr=1
Model crash detected, will try to restart...
21:04:07 (3864): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5796, iMonCtr=1
Model crash detected, will try to restart...
10:25:20 (3188): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:41:48 (4820): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:11:01 (2908): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:11:02 (2908): No heartbeat from core client for 30 sec - exiting
20:49:32 (4176): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:00:15 (4444): No heartbeat from core client for 30 sec - exiting
17:00:16 (4444): No heartbeat from core client for 30 sec - exiting
17:00:17 (4444): No heartbeat from core client for 30 sec - exiting
17:00:18 (4444): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4624, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4000, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5560, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1056, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4584, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3372, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2540, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2968, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
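
The stderr output above repeats a handful of message types (suspend requests, model crashes, missed heartbeats, signal 22 exits). A minimal sketch for tallying them follows, assuming the log text is available as a plain Python string. The labels in `PATTERNS` and the function name `tally_events` are hypothetical names chosen for this illustration; only the search substrings are copied from the log, and `sample` is a short excerpt, not the full output.

```python
from collections import Counter

# Hypothetical labels (not BOINC or CPDN terminology) mapped to
# substrings copied verbatim from the stderr output above.
PATTERNS = {
    "suspend_request": "Suspend request from BOINC",
    "model_crash": "Model crash detected, will try to restart",
    "no_heartbeat": "No heartbeat from core client",
    "signal_22": "Signal 22 received, exiting",
    "boinc_finish": "Called boinc_finish",
    "too_many_crashes": "Sorry, too many model crashes!",
}


def tally_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message in the log."""
    counts = Counter()
    for label, needle in PATTERNS.items():
        counts[label] = stderr_text.count(needle)
    return counts


# Short excerpt assembled from messages in the log; paste the full
# <stderr_txt> contents here to tally the whole run.
sample = (
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5884, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "21:04:07 (3864): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Signal 22 received, exiting... Called boinc_finish"
)

for label, count in tally_events(sample).items():
    print(f"{label}: {count}")
```

Plain substring counting is used instead of line-based parsing so the sketch works whether or not the log's line breaks survived copying.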

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Sep 2013 01:21:37	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	466,560	628,629	1.3474
16 Sep 2013 00:18:33	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	440,640	593,812	1.3476
14 Sep 2013 01:20:05	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	414,720	558,497	1.3467
09 Sep 2013 00:16:11	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	388,800	522,827	1.3447
05 Sep 2013 03:36:56	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	362,880	487,493	1.3434
02 Sep 2013 14:26:37	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	336,960	452,084	1.3417
31 Aug 2013 21:30:34	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	311,040	416,968	1.3406
30 Aug 2013 01:03:01	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	285,120	383,284	1.3443
16 Aug 2013 01:32:52	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	259,200	348,828	1.3458
15 Aug 2013 22:16:51	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	233,280	313,914	1.3457
15 Aug 2013 22:16:51	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	207,360	279,248	1.3467
15 Aug 2013 22:16:51	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	181,440	244,063	1.3451
15 Aug 2013 22:16:51	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	155,520	209,312	1.3459
26 Jul 2013 01:49:38	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	129,600	174,656	1.3477
23 Jul 2013 21:58:39	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	103,680	139,974	1.3501
23 Jul 2013 20:09:16	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	77,760	105,531	1.3571
23 Jul 2013 20:09:16	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	51,840	70,340	1.3569
09 Jul 2013 00:08:10	1261147	15865084	hadcm3n_o7cq_1980_40_008396268_0	25,920	35,110	1.3546
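
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimals. That reading is an inference from the figures, not something the page states. The sketch below checks it against a few rows copied from the table; `seconds_per_timestep` is a hypothetical helper name.

```python
def seconds_per_timestep(cpu_seconds: int, timesteps: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timesteps, 4)


# (timestep, CPU time in seconds, reported average) copied from the table.
rows = [
    (466_560, 628_629, 1.3474),
    (259_200, 348_828, 1.3458),
    (103_680, 139_974, 1.3501),
    (25_920, 35_110, 1.3546),
]

for timesteps, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(cpu_seconds, timesteps)
    status = "matches" if computed == reported else "differs"
    print(f"{timesteps:>7} TS: computed {computed:.4f}, reported {reported:.4f} ({status})")
```

Each trickle in the table advances the timestep count by 25,920, so the 18 rows account for 18 × 25,920 = 466,560 timesteps.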