Task 16146478

Name	hadcm3n_o2sc_1940_40_008379647_4
Workunit	8530506
Created	18 Dec 2013, 0:21:14 UTC
Sent	18 Dec 2013, 0:21:32 UTC
Report deadline	19 Mar 2014, 7:48:43 UTC
Received	22 Feb 2014, 1:54:54 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1122348
Run time	14 days 8 hours 56 min 22 sec
CPU time	14 days 7 hours 26 min 38 sec
Validate state	Invalid
Credit	7,776.00
Device peak FLOPS	2.30 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8128, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5412, iMonCtr=1
Model crash detected, will try to restart...
10:54:44 (5544): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:57:05 (4840): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
01:32:31 (5484): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:08:01 (6944): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6744, iMonCtr=1
Model crash detected, will try to restart...
19:42:48 (6988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:49:13 (3356): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:54:02 (6776): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:57:16 (3448): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:09:11 (8004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:17:41 (5268): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4008, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7532, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5812, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5812, iMonCtr=1
Model crash detected, will try to restart...
03:33:03 (6952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1684, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6352, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5608, iMonCtr=1
Model crash detected, will try to restart...
17:41:53 (5412): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4572, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6248, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6356, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1
Model crash detected, will try to restart...
20:00:53 (5688): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:02:40 (7396): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:04:23 (1288): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:06:05 (7648): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:07:51 (4944): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:09:38 (7404): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4824, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4824, iMonCtr=1
Model crash detected, will try to restart...
20:12:54 (4824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7620, iMonCtr=1
Model crash detected, will try to restart...
20:14:45 (7620): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:16:30 (7376): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:18:09 (7856): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:19:50 (5912): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:21:31 (7212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:23:08 (6192): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:26:30 (6756): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6820, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
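
The stderr output repeats a small set of message types (quit requests, model crashes, missed heartbeats, signal 22). As a rough aid for reading it, the sketch below tallies those messages by plain substring counting. The names `PATTERNS` and `tally_messages` are hypothetical and not part of BOINC or the CPDN application; the sample text is a short excerpt copied from the log above, not the full output.

```python
# Hypothetical helper for summarising the stderr text; not a BOINC or CPDN API.
# Each needle is a substring copied verbatim from the log messages above.
PATTERNS = {
    "quit_request": "Quit request from BOINC",
    "model_crash": "Model crash detected",
    "no_heartbeat": "No heartbeat from core client",
    "signal_22": "Signal 22 received",
}


def tally_messages(stderr_text):
    """Count how often each known message type occurs in the stderr text."""
    return {name: stderr_text.count(needle) for name, needle in PATTERNS.items()}


# Short excerpt taken from the log above (assumption: whitespace between
# messages does not matter, since only substrings are counted).
SAMPLE = (
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=6268, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "10:54:44 (5544): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC..."
)

if __name__ == "__main__":
    for name, count in tally_messages(SAMPLE).items():
        print(f"{name}: {count}")
```

Pasting the full contents of `<stderr_txt>` into `tally_messages` would give the totals for this task. Substring counting is used so that the helper works whether the log is one message per line or flattened into a single line.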

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
21 Feb 2014 15:38:25	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	648,000	1,213,496	1.8727
21 Feb 2014 03:26:42	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	622,080	1,171,946	1.8839
20 Feb 2014 15:58:54	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	596,160	1,130,786	1.8968
20 Feb 2014 04:16:05	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	570,240	1,088,769	1.9093
14 Feb 2014 13:16:39	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	544,320	1,045,524	1.9208
13 Feb 2014 01:26:21	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	518,400	1,001,026	1.9310
07 Feb 2014 22:53:39	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	492,480	954,250	1.9376
06 Feb 2014 07:45:18	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	466,560	903,075	1.9356
05 Feb 2014 06:30:33	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	440,640	854,695	1.9397
04 Feb 2014 06:03:37	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	414,720	804,190	1.9391
31 Jan 2014 07:28:28	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	388,800	751,651	1.9333
31 Jan 2014 07:28:28	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	362,880	697,156	1.9212
28 Jan 2014 03:40:19	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	336,960	641,928	1.9051
24 Jan 2014 11:26:33	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	311,040	585,964	1.8839
23 Jan 2014 07:09:58	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	285,120	531,000	1.8624
16 Jan 2014 02:27:02	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	259,200	475,301	1.8337
14 Jan 2014 12:19:03	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	233,280	418,916	1.7958
10 Jan 2014 08:32:19	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	207,360	374,252	1.8048
08 Jan 2014 21:28:51	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	181,440	335,102	1.8469
08 Jan 2014 09:36:03	1122348	16146478	hadcm3n_o2sc_1940_40_008379647_4	155,520	291,986	1.8775
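
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `average_sec_per_timestep` is hypothetical, and the formula is inferred from the figures rather than taken from project documentation.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS)
ROWS = [
    (648_000, 1_213_496, 1.8727),
    (622_080, 1_171_946, 1.8839),
    (155_520, 291_986, 1.8775),
]


def average_sec_per_timestep(timestep, cpu_seconds):
    """Assumed formula: cumulative CPU seconds per model timestep, 4 decimals."""
    return round(cpu_seconds / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_seconds, reported in ROWS:
        computed = average_sec_per_timestep(timestep, cpu_seconds)
        print(f"timestep={timestep} computed={computed:.4f} reported={reported:.4f}")
```

For the most recent trickle this gives 1,213,496 / 648,000 ≈ 1.8727, matching the reported value.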