Task 12989497

Name	hadcm3n_s3ld_1940_40_007299509_0
Workunit	7496933
Created	20 Jun 2011, 19:01:46 UTC
Sent	20 Jun 2011, 19:02:12 UTC
Report deadline	20 Sep 2011, 2:29:23 UTC
Received	23 Jul 2011, 15:39:23 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1154759
Run time	9 days 17 hours 3 min 20 sec
CPU time	8 days 5 hours 44 min 43 sec
Validate state	Invalid
Credit	2,799.36
Device peak FLOPS	1.26 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.33</core_client_version> <![CDATA[ <message> Het apparaat herkent de opdracht niet. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5884, iMonCtr=1 Model crash detected, will try to restart... 09:52:41 (940): No heartbeat from core client for 30 sec - exiting 09:52:42 (940): No heartbeat from core client for 30 sec - exiting 09:52:43 (940): No heartbeat from core client for 30 sec - exiting 09:52:44 (940): No heartbeat from core client for 30 sec - exiting 09:52:45 (940): No heartbeat from core client for 30 sec - exiting 09:52:46 (940): No heartbeat from core client for 30 sec - exiting 09:52:48 (940): No heartbeat from core client for 30 sec - exiting 09:52:49 (940): No heartbeat from core client for 30 sec - exiting 09:52:50 (940): No heartbeat from core client for 30 sec - exiting 09:52:51 (940): No heartbeat from core client for 30 sec - exiting 09:52:52 (940): No heartbeat from core client for 30 sec - exiting 09:52:53 (940): No heartbeat from core client for 30 sec - exiting 09:52:54 (940): No heartbeat from core client for 30 sec - exiting 09:52:55 (940): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4888, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4200, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 16:19:37 (3172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7000, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5784, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5584, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3372, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3372, iMonCtr=1 Model crash detected, will try to restart... 00:27:27 (3372): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5100, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5732, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3996, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5896, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5632, iMonCtr=1 Model crash detected, will try to restart... 22:46:21 (4472): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4116, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5444, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3844, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4112, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5936, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4724, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=840, selfPID=840, iMonCtr=1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4040, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4320, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
25 Jul 2011 19:39:46	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	233,280	714,311	3.0620
25 Jul 2011 18:09:05	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	207,360	637,393	3.0738
25 Jul 2011 13:21:44	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	181,440	555,606	3.0622
25 Jul 2011 13:21:44	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	155,520	481,693	3.0973
05 Jul 2011 23:52:42	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	129,600	393,284	3.0346
03 Jul 2011 17:06:08	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	103,680	312,107	3.0103
30 Jun 2011 13:38:12	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	77,760	235,310	3.0261
26 Jun 2011 19:33:59	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	51,840	156,632	3.0215
22 Jun 2011 20:37:27	1154759	12989497	hadcm3n_s3ld_1940_40_007299509_0	25,920	80,921	3.1220