Task 13955167

Name	hadcm3n_yisg_1900_40_007515312_4
Workunit	7712787
Created	23 Jan 2012, 16:22:26 UTC
Sent	23 Jan 2012, 16:22:29 UTC
Report deadline	23 Apr 2012, 23:49:40 UTC
Received	19 Feb 2012, 13:04:29 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1192504
Run time	23 days 8 hours 15 min 44 sec
CPU time	19 days 16 hours 44 min 4 sec
Validate state	Invalid
Credit	11,197.44
Device peak FLOPS	2.19 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2996, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2944, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1216, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1784, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=692, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3652, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2480, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3088, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1496, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3000, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3556, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3556, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3556, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3556, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1664, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1664, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1664, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
18 Feb 2012 14:21:53	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	933,120	1,682,688	1.8033
17 Feb 2012 23:00:35	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	907,200	1,635,710	1.8030
17 Feb 2012 08:35:00	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	881,280	1,588,415	1.8024
16 Feb 2012 17:39:43	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	855,360	1,542,047	1.8028
15 Feb 2012 22:01:26	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	829,440	1,491,353	1.7980
15 Feb 2012 05:47:40	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	803,520	1,441,724	1.7943
14 Feb 2012 14:33:04	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	777,600	1,393,863	1.7925
14 Feb 2012 00:17:32	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	751,680	1,347,007	1.7920
13 Feb 2012 07:14:33	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	725,760	1,299,958	1.7912
12 Feb 2012 16:01:19	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	699,840	1,250,276	1.7865
11 Feb 2012 20:28:44	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	673,920	1,203,893	1.7864
11 Feb 2012 03:53:40	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	648,000	1,157,987	1.7870
10 Feb 2012 14:09:00	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	622,080	1,111,411	1.7866
09 Feb 2012 22:57:54	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	596,160	1,064,473	1.7855
09 Feb 2012 06:18:08	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	570,240	1,018,162	1.7855
08 Feb 2012 14:03:59	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	544,320	968,960	1.7801
07 Feb 2012 21:39:30	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	518,400	919,249	1.7732
07 Feb 2012 07:50:11	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	492,480	873,116	1.7729
06 Feb 2012 15:39:25	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	466,560	827,378	1.7734
05 Feb 2012 20:45:43	1192504	13955167	hadcm3n_yisg_1900_40_007515312_4	440,640	782,429	1.7757
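
The Average (sec/TS) column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is inferred from the reported numbers and is not stated on the page. The sketch below reproduces the column for a few rows and also derives the rate over the most recent trickle interval. The function names are hypothetical, chosen only for this illustration.

```python
def parse_count(text: str) -> int:
    """Convert a comma-grouped figure such as '933,120' to an int."""
    return int(text.replace(",", ""))


def average_sec_per_timestep(cpu_time_sec: str, timestep: str) -> float:
    """Cumulative CPU seconds per timestep (assumed definition of the column)."""
    return round(parse_count(cpu_time_sec) / parse_count(timestep), 4)


def interval_sec_per_timestep(newer: tuple, older: tuple) -> float:
    """CPU seconds per timestep between two trickles, each (timestep, cpu_time)."""
    steps = parse_count(newer[0]) - parse_count(older[0])
    seconds = parse_count(newer[1]) - parse_count(older[1])
    return seconds / steps


# (Timestep, CPU Time (sec)) pairs copied from the trickle table.
latest = ("933,120", "1,682,688")    # 18 Feb 2012 14:21:53
previous = ("907,200", "1,635,710")  # 17 Feb 2012 23:00:35
earliest = ("440,640", "782,429")    # 05 Feb 2012 20:45:43

for timestep, cpu_time in (latest, previous, earliest):
    print(timestep, average_sec_per_timestep(cpu_time, timestep))

print("last interval:", round(interval_sec_per_timestep(latest, previous), 4))
```

Comparing the per-interval rate with the cumulative average is one way to see whether the host slowed down shortly before the task failed. Here the cumulative average rises slowly from about 1.776 to about 1.803 sec/TS across the listed trickles.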