Task 16040100

Name	hadcm3n_ob1d_1900_40_008469428_0
Workunit	8620267
Created	27 Sep 2013, 9:47:57 UTC
Sent	3 Oct 2013, 17:56:36 UTC
Report deadline	3 Jan 2014, 1:23:47 UTC
Received	6 Nov 2013, 21:04:57 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1212841
Run time	8 days 6 hours 37 min 52 sec
CPU time	8 days 4 hours 28 min 48 sec
Validate state	Invalid
Credit	8,398.08
Device peak FLOPS	2.93 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4008, iMonCtr=1
Model crash detected, will try to restart...
07:48:56 (5712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
19:03:27 (5588): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1276, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5556, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2992, iMonCtr=1
Model crash detected, will try to restart...
18:53:53 (5644): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3096, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4508, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
19:22:28 (4260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4612, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4564, iMonCtr=1
Model crash detected, will try to restart...
05:32:33 (4792): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4832, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6104, iMonCtr=1
Model crash detected, will try to restart...
20:00:35 (5856): No heartbeat from core client for 30 sec - exiting
20:00:36 (5856): No heartbeat from core client for 30 sec - exiting
20:00:37 (5856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
19:23:23 (3748): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:24:41 (2032): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
21:06:07 (5552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
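
The stderr text above mixes several kinds of message (restarts after a crash, missed heartbeats, the repeated SETPOS failure, and the final give-up). As a rough aid for reading it, here is a minimal sketch that tallies those message types by plain substring counting. The `MARKERS` names and the `tally_events` helper are hypothetical, not part of BOINC or CPDN, and `EXCERPT` is only a short sample copied from the log; paste the full stderr text in its place to count everything.

```python
from collections import Counter

# Short sample copied from the stderr text above (not the full log).
EXCERPT = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5668, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "07:48:56 (5712): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1 "
    "Model crashed: SETPOS: Unit 67 to Word Address -198 Failed with Error Code -1 "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

# Hypothetical labels mapped to literal substrings that appear in the log.
MARKERS = {
    "restart_after_crash": "Model crash detected, will try to restart",
    "missed_heartbeat": "No heartbeat from core client",
    "setpos_crash": "Model crashed: SETPOS",
    "gave_up": "too many model crashes",
}


def tally_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each marker substring."""
    return Counter(
        {name: stderr_text.count(pattern) for name, pattern in MARKERS.items()}
    )


counts = tally_events(EXCERPT)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Substring counting works whether the log is stored one message per line or as a single run of text, which is why no line splitting is attempted here.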

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
03 Nov 2013 19:44:31	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	699,840	691,786	0.9885
03 Nov 2013 12:18:51	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	673,920	665,056	0.9868
02 Nov 2013 18:07:03	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	648,000	638,612	0.9855
02 Nov 2013 10:59:01	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	622,080	613,206	0.9857
31 Oct 2013 20:38:12	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	596,160	586,360	0.9836
28 Oct 2013 21:06:58	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	570,240	559,723	0.9816
28 Oct 2013 13:44:44	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	544,320	549,574	1.0097
28 Oct 2013 06:29:28	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	518,400	524,283	1.0113
27 Oct 2013 14:49:56	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	492,480	497,889	1.0110
27 Oct 2013 07:42:57	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	466,560	472,460	1.0126
26 Oct 2013 14:57:32	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	440,640	446,324	1.0129
25 Oct 2013 20:05:51	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	414,720	420,252	1.0133
23 Oct 2013 20:42:12	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	388,800	394,042	1.0135
20 Oct 2013 18:02:53	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	362,880	367,685	1.0132
20 Oct 2013 10:40:16	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	336,960	341,277	1.0128
18 Oct 2013 20:50:47	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	311,040	314,161	1.0100
16 Oct 2013 19:31:51	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	285,120	287,293	1.0076
13 Oct 2013 18:49:41	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	259,200	260,667	1.0057
13 Oct 2013 11:35:58	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	233,280	234,772	1.0064
12 Oct 2013 18:43:15	1212841	16040100	hadcm3n_ob1d_1900_40_008469428_0	207,360	208,663	1.0063
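
The Average (sec/TS) column looks like CPU Time (sec) divided by Timestep. That relationship is an inference from the figures, not something the page states. A minimal sketch that checks it against the three most recent trickles in the table (the `ROWS` list and helper names are hypothetical; the numbers are copied from the rows above):

```python
# (timestep, cpu_time_sec, reported_average) copied from the top three table rows.
ROWS = [
    (699_840, 691_786, 0.9885),
    (673_920, 665_056, 0.9868),
    (648_000, 638_612, 0.9855),
]


def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds spent per model timestep."""
    return cpu_seconds / timestep


def matches_reported(cpu_seconds: int, timestep: int, reported: float) -> bool:
    """True if the computed ratio agrees with the reported value to 4 decimals."""
    return abs(seconds_per_timestep(cpu_seconds, timestep) - reported) <= 0.00005


for timestep, cpu_seconds, reported in ROWS:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    ok = matches_reported(cpu_seconds, timestep, reported)
    print(f"{timestep:>9,d}  computed={computed:.4f}  reported={reported:.4f}  match={ok}")
```

Because both inputs are cumulative totals, the ratio is a running average over the whole task so far, not the cost of the most recent block of timesteps.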