Task 12972236

Name	hadcm3n_o4oy_1940_40_007266050_2
Workunit	7464290
Created	11 Jun 2011, 13:49:35 UTC
Sent	11 Jun 2011, 13:49:42 UTC
Report deadline	10 Sep 2011, 21:16:53 UTC
Received	18 Sep 2011, 10:28:55 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	1012620
Run time	19 days 1 hours 31 min 16 sec
CPU time	13 days 9 hours 1 min 19 sec
Validate state	Invalid
Credit	6,220.80
Device peak FLOPS	2.26 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
13:16:31 (5916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:23:53 (4512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:58:15 (4352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CBUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
Ocean Restart file copy failed on o4oyko.dae73f0
Suspended CPDN Monitor - Suspend request from BOINC...
20:16:18 (4568): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1
Model crash detected, will try to restart...
12:03:56 (1900): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:03:57 (1900): No heartbeat from core client for 30 sec - exiting
12:03:58 (1900): No heartbeat from core client for 30 sec - exiting
12:03:59 (1900): No heartbeat from core client for 30 sec - exiting
12:04:00 (1900): No heartbeat from core client for 30 sec - exiting
12:04:01 (1900): No heartbeat from core client for 30 sec - exiting
12:04:02 (1900): No heartbeat from core client for 30 sec - exiting
12:04:03 (1900): No heartbeat from core client for 30 sec - exiting
12:04:04 (1900): No heartbeat from core client for 30 sec - exiting
12:04:05 (1900): No heartbeat from core client for 30 sec - exiting
12:04:07 (1900): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6040, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5364, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5880, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1
Model crash detected, will try to restart...
11:24:52 (4652): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:05:02 (5168): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:05:03 (5168): No heartbeat from core client for 30 sec - exiting
17:05:04 (5168): No heartbeat from core client for 30 sec - exiting
17:05:05 (5168): No heartbeat from core client for 30 sec - exiting
17:05:06 (5168): No heartbeat from core client for 30 sec - exiting
17:05:07 (5168): No heartbeat from core client for 30 sec - exiting
17:05:08 (5168): No heartbeat from core client for 30 sec - exiting
17:05:09 (5168): No heartbeat from core client for 30 sec - exiting
17:05:10 (5168): No heartbeat from core client for 30 sec - exiting
17:05:11 (5168): No heartbeat from core client for 30 sec - exiting
17:05:12 (5168): No heartbeat from core client for 30 sec - exiting
17:05:13 (5168): No heartbeat from core client for 30 sec - exiting
Ocean Restart file copy failed on o4oyko.daf19g0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4924, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITHEAD: I/O error tmp/pipe_dummy 2048
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1208, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4448, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5412, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5544, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5612, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5624, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4352, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5164, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2824, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2824, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
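
The stderr text above repeats a handful of message types many times, so a tally can be easier to read than the raw log. The following is a minimal Python sketch of one way to count them, assuming the log has been saved as a plain string. The labels in `PATTERNS` and the function name `tally_events` are hypothetical names chosen for illustration; only the matched substrings come from the log itself.

```python
from collections import Counter

# Hypothetical labels (not part of BOINC or CPDN) mapped to substrings
# that appear verbatim in the stderr text.
PATTERNS = {
    "model_restart": "Model crash detected, will try to restart",
    "missed_heartbeat": "No heartbeat from core client",
    "suspend_request": "Suspend request from BOINC",
    "io_error": "C I/O Error",
    "restart_copy_failed": "Ocean Restart file copy failed",
}


def tally_events(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each known message substring."""
    counts = Counter()
    for label, needle in PATTERNS.items():
        counts[label] = stderr_text.count(needle)
    return counts


# A short excerpt copied from the log, used only to exercise the function.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=2576, iMonCtr=1\n"
    "Model crash detected, will try to restart...\n"
    "Suspended CPDN Monitor - Suspend request from BOINC...\n"
    "13:16:31 (5916): No heartbeat from core client for 30 sec - exiting\n"
    "CPDN Monitor - No 'heartbeat' from BOINC...\n"
    "CBUFFOUT: C I/O Error - Return code = 32\n"
)

if __name__ == "__main__":
    for label, n in tally_events(sample).items():
        print(f"{label}: {n}")
```

Substring counting works whether the log keeps its line breaks or has been flattened onto one line, which is why the sketch does not split on newlines.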

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Sep 2011 10:28:35	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	518,400	1,155,656	2.2293
16 Sep 2011 07:14:29	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	492,480	1,093,383	2.2202
10 Sep 2011 08:00:13	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	466,560	1,028,324	2.2041
05 Sep 2011 12:01:44	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	440,640	965,067	2.1901
02 Sep 2011 15:15:03	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	414,720	898,846	2.1674
29 Aug 2011 13:51:18	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	388,800	834,760	2.1470
25 Aug 2011 08:57:10	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	362,880	770,704	2.1239
24 Aug 2011 16:33:12	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	336,960	742,768	2.2043
20 Aug 2011 14:06:15	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	311,040	678,726	2.1821
13 Aug 2011 11:06:21	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	285,120	614,939	2.1568
06 Aug 2011 10:31:09	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	259,200	552,502	2.1316
04 Aug 2011 05:58:58	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	233,280	489,179	2.0970
02 Aug 2011 14:53:56	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	207,360	428,052	2.0643
02 Aug 2011 14:53:56	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	181,440	364,373	2.0082
25 Jul 2011 15:27:19	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	155,520	300,880	1.9347
09 Jul 2011 07:45:02	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	129,600	243,181	1.8764
05 Jul 2011 07:33:24	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	103,680	191,627	1.8483
05 Jul 2011 07:33:24	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	77,760	180,202	2.3174
17 Jun 2011 15:13:03	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	51,840	117,836	2.2731
15 Jun 2011 10:12:10	1012620	12972236	hadcm3n_o4oy_1940_40_007266050_2	25,920	60,161	2.3210
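
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep reached, rounded to four decimals. This is an inference from the figures in the table, not something the page states. A minimal Python sketch that checks it against the first and last trickles (the function name `seconds_per_timestep` is a hypothetical helper):

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return round(cpu_seconds / timestep, 4)


# (timestep, cumulative CPU seconds, reported average) copied from the table.
rows = [
    (518_400, 1_155_656, 2.2293),  # trickle of 18 Sep 2011
    (25_920, 60_161, 2.3210),      # trickle of 15 Jun 2011
]

if __name__ == "__main__":
    for timestep, cpu, reported in rows:
        computed = seconds_per_timestep(cpu, timestep)
        print(timestep, computed, "reported:", reported)
```

Consecutive rows in the table differ by 25,920 timesteps, so the same division applied to the difference between two rows would give the per-trickle rate instead of the cumulative one.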