Task 15476485

Name	hadcm3n_zd38_1920_40_008255705_1
Workunit	8410829
Created	14 Dec 2012, 3:19:24 UTC
Sent	14 Dec 2012, 3:20:52 UTC
Report deadline	15 Mar 2013, 10:48:03 UTC
Received	10 Feb 2013, 5:50:46 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1251637
Run time	17 days 8 hours 13 min 29 sec
CPU time	15 days 13 hours 1 min 57 sec
Validate state	Invalid
Credit	7,153.92
Device peak FLOPS	2.51 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5476, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CSuspended CPDN Monitor - Suspend request from BOINC...
14:21:54 (2260): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:21:56 (2260): No heartbeat from core client for 30 sec - exiting
14:21:57 (2260): No heartbeat from core client for 30 sec - exiting
14:21:58 (2260): No heartbeat from core client for 30 sec - exiting
14:21:59 (2260): No heartbeat from core client for 30 sec - exiting
14:22:00 (2260): No heartbeat from core client for 30 sec - exiting
14:22:01 (2260): No heartbeat from core client for 30 sec - exiting
14:22:02 (2260): No heartbeat from core client for 30 sec - exiting
14:22:03 (2260): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1180, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5692, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5436, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4840, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5624, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5808, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4496, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5620, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
cpdnmonitor: error reading file dataout/ocean_restart.day
cpdnmonitor: error reading file dataout/atmos_restart.hold
Model crashed: TEMPHIST: Write ERROR on history file for namelistNLIHISTO tmp/pipe_dummy 2048
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5124, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
08 Feb 2013 04:35:29	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	596,160	1,342,689	2.2522
07 Feb 2013 10:48:37	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	570,240	1,285,305	2.2540
06 Feb 2013 01:26:31	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	544,320	1,228,568	2.2571
03 Feb 2013 22:35:08	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	518,400	1,170,979	2.2588
03 Feb 2013 22:35:08	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	492,480	1,113,453	2.2609
29 Jan 2013 09:34:38	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	466,560	1,055,738	2.2628
23 Jan 2013 01:06:48	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	440,640	998,144	2.2652
20 Jan 2013 09:09:42	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	414,720	940,415	2.2676
19 Jan 2013 09:05:15	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	388,800	883,198	2.2716
18 Jan 2013 15:36:37	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	362,880	825,411	2.2746
15 Jan 2013 16:00:59	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	336,960	767,965	2.2791
09 Jan 2013 21:10:56	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	311,040	710,273	2.2835
04 Jan 2013 18:14:54	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	285,120	630,611	2.2117
03 Jan 2013 23:33:00	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	259,200	572,476	2.2086
03 Jan 2013 06:10:21	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	233,280	515,375	2.2093
30 Dec 2012 19:19:14	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	207,360	456,231	2.2002
30 Dec 2012 00:52:23	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	181,440	396,997	2.1880
29 Dec 2012 06:24:47	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	155,520	338,452	2.1763
23 Dec 2012 07:12:42	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	129,600	282,222	2.1776
23 Dec 2012 01:56:37	1251637	15476485	hadcm3n_zd38_1920_40_008255705_1	103,680	226,300	2.1827
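
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU time divided by the timestep reached. That is an inference from the listed numbers, not something the page states. The sketch below checks it against three rows copied from the table; the function name average_sec_per_timestep is hypothetical and is not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS)
trickles = [
    (596_160, 1_342_689, 2.2522),
    (570_240, 1_285_305, 2.2540),
    (103_680, 226_300, 2.1827),
]


def average_sec_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Assumed formula: cumulative CPU seconds / timesteps completed, to 4 places."""
    return round(cpu_seconds / timestep, 4)


for timestep, cpu_seconds, reported in trickles:
    computed = average_sec_per_timestep(cpu_seconds, timestep)
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}")
```

For these three rows the computed value agrees with the reported one to four decimal places.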