Task 15470996

Name	hadcm3n_zak7_1880_40_008246378_2
Workunit	8401502
Created	4 Dec 2012, 17:21:56 UTC
Sent	4 Dec 2012, 17:21:58 UTC
Report deadline	6 Mar 2013, 0:49:09 UTC
Received	24 Jan 2013, 15:52:43 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1181612
Run time	17 days 14 hours 52 min 37 sec
CPU time	15 days 10 hours 48 min 43 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.25 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3548, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1132, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3200, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2852, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3940, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2756, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2756, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1292, iMonCtr=1
Model crash detected, will try to restart...
12:19:38 (2448): No heartbeat from core client for 30 sec - exiting
12:19:40 (2448): No heartbeat from core client for 30 sec - exiting
12:19:41 (2448): No heartbeat from core client for 30 sec - exiting
12:19:42 (2448): No heartbeat from core client for 30 sec - exiting
12:19:43 (2448): No heartbeat from core client for 30 sec - exiting
12:19:44 (2448): No heartbeat from core client for 30 sec - exiting
12:19:45 (2448): No heartbeat from core client for 30 sec - exiting
12:19:46 (2448): No heartbeat from core client for 30 sec - exiting
12:19:47 (2448): No heartbeat from core client for 30 sec - exiting
12:19:48 (2448): No heartbeat from core client for 30 sec - exiting
12:19:49 (2448): No heartbeat from core client for 30 sec - exiting
12:19:51 (2448): No heartbeat from core client for 30 sec - exiting
12:19:52 (2448): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2800, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4028, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3504, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2788, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2396, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1136, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77576737 read attempt to address 0xFFFFFFF8
Engaging BOINC Windows Runtime Debugger...
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_zak7_1880_40_008246378/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
24 Jan 2013 15:55:42	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	777,600	1,334,915	1.7167
23 Jan 2013 00:36:41	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	751,680	1,291,018	1.7175
21 Jan 2013 09:06:17	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	725,760	1,247,244	1.7185
20 Jan 2013 20:57:44	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	699,840	1,204,749	1.7215
20 Jan 2013 00:23:02	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	673,920	1,161,607	1.7237
19 Jan 2013 02:28:49	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	648,000	1,117,034	1.7238
18 Jan 2013 03:13:31	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	622,080	1,072,286	1.7237
17 Jan 2013 05:03:48	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	596,160	1,025,473	1.7201
25 Dec 2012 06:37:04	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	570,240	978,716	1.7163
24 Dec 2012 07:47:40	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	544,320	934,394	1.7166
23 Dec 2012 08:07:52	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	518,400	890,386	1.7176
22 Dec 2012 08:19:19	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	492,480	845,584	1.7170
21 Dec 2012 08:01:32	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	466,560	799,719	1.7141
20 Dec 2012 07:51:36	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	440,640	755,297	1.7141
19 Dec 2012 18:40:07	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	414,720	711,395	1.7154
18 Dec 2012 23:30:10	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	388,800	666,997	1.7155
17 Dec 2012 23:09:58	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	362,880	622,727	1.7161
17 Dec 2012 00:38:09	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	336,960	575,011	1.7065
15 Dec 2012 06:38:08	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	311,040	528,812	1.7001
14 Dec 2012 08:19:33	1181612	15470996	hadcm3n_zak7_1880_40_008246378_2	285,120	484,978	1.7010
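
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count, shown to four decimal places. The page itself does not state this, so it is an inference from the figures. The sketch below checks it against a few rows copied from the table; the names `TRICKLES`, `avg_sec_per_timestep`, and `matches_reported` are hypothetical and not part of BOINC or CPDN.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (777_600, 1_334_915, 1.7167),
    (751_680, 1_291_018, 1.7175),
    (725_760, 1_247_244, 1.7185),
    (285_120, 484_978, 1.7010),
]


def avg_sec_per_timestep(timestep, cpu_seconds):
    """Assumed definition of the Average column: CPU seconds per timestep."""
    return cpu_seconds / timestep


def matches_reported(rows, tolerance=1e-4):
    """Return True if every recomputed average agrees with the reported
    value to within the table's four-decimal precision."""
    return all(
        abs(avg_sec_per_timestep(ts, cpu) - reported) <= tolerance
        for ts, cpu, reported in rows
    )


if __name__ == "__main__":
    for ts, cpu, reported in TRICKLES:
        print(ts, round(avg_sec_per_timestep(ts, cpu), 4), reported)
    print("consistent:", matches_reported(TRICKLES))
```

If the assumed definition holds, the average can be recomputed for any row from its Timestep and CPU Time columns alone.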