Task 15485711

Name	hadcm3n_3dgh_1940_40_008258205_0
Workunit	8413329
Created	20 Dec 2012, 10:02:49 UTC
Sent	20 Dec 2012, 10:02:56 UTC
Report deadline	21 Mar 2013, 17:30:07 UTC
Received	9 Jan 2013, 6:27:59 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1062934
Run time	4 days 14 hours 38 min 43 sec
CPU time	4 days 10 hours 8 min 55 sec
Validate state	Invalid
Credit	3,110.40
Device peak FLOPS	2.44 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3028, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3424, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
09:25:42 (3464): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
14:38:04 (5400): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:07:48 (6172): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:22:56 (6644): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:37:23 (2696): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:47:53 (3436): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:47:54 (3436): No heartbeat from core client for 30 sec - exiting
16:47:55 (3436): No heartbeat from core client for 30 sec - exiting
16:47:56 (3436): No heartbeat from core client for 30 sec - exiting
16:48:01 (3436): No heartbeat from core client for 30 sec - exiting
16:48:02 (3436): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
17:23:48 (3500): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:24:55 (5740): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
18:47:30 (3220): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:47:34 (3220): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
20:35:29 (1588): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:11:19 (4132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:12:57 (6564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:12:58 (6564): No heartbeat from core client for 30 sec - exiting
23:12:59 (6564): No heartbeat from core client for 30 sec - exiting
23:13:36 (6288): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:13:37 (6288): No heartbeat from core client for 30 sec - exiting
23:13:38 (6288): No heartbeat from core client for 30 sec - exiting
23:13:39 (6288): No heartbeat from core client for 30 sec - exiting
23:13:40 (6288): No heartbeat from core client for 30 sec - exiting
23:13:41 (6288): No heartbeat from core client for 30 sec - exiting
23:13:42 (6288): No heartbeat from core client for 30 sec - exiting
23:13:43 (6288): No heartbeat from core client for 30 sec - exiting
23:13:44 (6288): No heartbeat from core client for 30 sec - exiting
23:13:45 (6288): No heartbeat from core client for 30 sec - exiting
23:13:46 (6288): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3444, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
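
The stderr output above repeats a small number of message types many times. As a rough aid for reading it, the sketch below counts how often each kind of message occurs. The search strings are copied from the log text itself; the category names, the `PATTERNS` table, and the `tally_events` function are illustrative assumptions, not part of BOINC or the CPDN application.

```python
import re
from collections import Counter

# Search strings copied from the stderr text of this task.
# The category names on the left are hypothetical labels chosen for this sketch.
PATTERNS = {
    "model_crash": r"Model crash detected, will try to restart",
    "no_heartbeat": r"No heartbeat from core client for 30 sec",
    "quit_request": r"Quit request from BOINC",
    "suspend_request": r"Suspend request from BOINC",
    "signal": r"Signal \d+ received",
}


def tally_events(stderr_text):
    """Count occurrences of each known message type in a stderr string."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(re.findall(pattern, stderr_text))
    return counts


if __name__ == "__main__":
    # A short excerpt of the log; paste the full stderr text to tally the whole run.
    excerpt = (
        "Model crash detected, will try to restart... "
        "09:25:42 (3464): No heartbeat from core client for 30 sec - exiting "
        "CPDN Monitor - No 'heartbeat' from BOINC... "
        "CPDN Monitor - Quit request from BOINC... "
        "Signal 11 received, exiting..."
    )
    for name, count in tally_events(excerpt).items():
        print(f"{name}: {count}")
```

Because the patterns are matched anywhere in the string, the tally works whether the log is stored one message per line or as a single run of text.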

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Jan 2013 19:05:43	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	259,200	382,130	1.4743
07 Jan 2013 19:26:37	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	233,280	343,177	1.4711
07 Jan 2013 08:15:51	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	207,360	305,390	1.4728
26 Dec 2012 19:38:54	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	181,440	267,478	1.4742
25 Dec 2012 16:06:05	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	155,520	229,271	1.4742
24 Dec 2012 17:56:49	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	129,600	190,882	1.4729
24 Dec 2012 06:47:23	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	103,680	152,972	1.4754
22 Dec 2012 21:55:23	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	77,760	114,324	1.4702
22 Dec 2012 11:04:54	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	51,840	76,233	1.4705
21 Dec 2012 11:22:17	1062934	15485711	hadcm3n_3dgh_1940_40_008258205_0	25,920	38,183	1.4731
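
The figures in the Average (sec/TS) column are consistent with the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. That relationship is inferred from the numbers in the table, not from any project documentation, so treat the sketch below as a check under that assumption. The function name `average_sec_per_timestep` is made up for the example; the rows are copied from the table.

```python
def average_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the trickle table.
TRICKLES = [
    (259200, 382130, 1.4743),
    (233280, 343177, 1.4711),
    (207360, 305390, 1.4728),
    (181440, 267478, 1.4742),
    (155520, 229271, 1.4742),
    (129600, 190882, 1.4729),
    (103680, 152972, 1.4754),
    (77760, 114324, 1.4702),
    (51840, 76233, 1.4705),
    (25920, 38183, 1.4731),
]

if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = average_sec_per_timestep(cpu_time_sec, timestep)
        print(f"{timestep:>7} timesteps: computed {computed:.4f}, reported {reported:.4f}")
```

Each trickle in the table is 25,920 timesteps after the previous one, so the last row (259,200 timesteps) corresponds to ten trickles received before the task failed.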