Task 13569524

Name	hadcm3n_yn10_1900_40_007527691_3
Workunit	7725166
Created	30 Oct 2011, 19:41:53 UTC
Sent	30 Oct 2011, 19:52:09 UTC
Report deadline	30 Jan 2012, 3:19:20 UTC
Received	15 Nov 2011, 19:13:36 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1038991
Run time	4 days 8 hours 0 min 22 sec
CPU time	3 days 8 hours 45 min
Validate state	Invalid
Credit	1,555.20
Device peak FLOPS	2.09 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
20:11:34 (3508): No heartbeat from core client for 30 sec - exiting
20:11:36 (3508): No heartbeat from core client for 30 sec - exiting
20:11:37 (3508): No heartbeat from core client for 30 sec - exiting
20:11:38 (3508): No heartbeat from core client for 30 sec - exiting
20:11:39 (3508): No heartbeat from core client for 30 sec - exiting
20:11:40 (3508): No heartbeat from core client for 30 sec - exiting
20:11:41 (3508): No heartbeat from core client for 30 sec - exiting
20:11:42 (3508): No heartbeat from core client for 30 sec - exiting
20:11:43 (3508): No heartbeat from core client for 30 sec - exiting
20:11:45 (3508): No heartbeat from core client for 30 sec - exiting
20:11:46 (3508): No heartbeat from core client for 30 sec - exiting
20:11:47 (3508): No heartbeat from core client for 30 sec - exiting
20:11:48 (3508): No heartbeat from core client for 30 sec - exiting
20:11:49 (3508): No heartbeat from core client for 30 sec - exiting
20:11:50 (3508): No heartbeat from core client for 30 sec - exiting
20:11:51 (3508): No heartbeat from core client for 30 sec - exiting
20:11:52 (3508): No heartbeat from core client for 30 sec - exiting
20:11:53 (3508): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5144, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5144, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2920, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
20:43:56 (4004): No heartbeat from core client for 30 sec - exiting
20:43:57 (4004): No heartbeat from core client for 30 sec - exiting
20:43:58 (4004): No heartbeat from core client for 30 sec - exiting
20:43:59 (4004): No heartbeat from core client for 30 sec - exiting
20:44:01 (4004): No heartbeat from core client for 30 sec - exiting
20:44:02 (4004): No heartbeat from core client for 30 sec - exiting
20:44:03 (4004): No heartbeat from core client for 30 sec - exiting
20:44:04 (4004): No heartbeat from core client for 30 sec - exiting
20:44:05 (4004): No heartbeat from core client for 30 sec - exiting
20:44:07 (4004): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4168, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6880, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6880, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6880, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1376, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1376, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yn10_1900_40_007527691/dataout/ocean_restart.day after 11 attempts
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5504, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
15 Nov 2011 19:21:40	1038991	13569524	hadcm3n_yn10_1900_40_007527691_3	129,600	286,453	2.2103
09 Nov 2011 22:19:16	1038991	13569524	hadcm3n_yn10_1900_40_007527691_3	103,680	229,881	2.2172
07 Nov 2011 07:05:55	1038991	13569524	hadcm3n_yn10_1900_40_007527691_3	77,760	173,453	2.2306
05 Nov 2011 02:19:35	1038991	13569524	hadcm3n_yn10_1900_40_007527691_3	51,840	115,537	2.2287
02 Nov 2011 19:15:51	1038991	13569524	hadcm3n_yn10_1900_40_007527691_3	25,920	58,124	2.2424
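The trickle table can be cross-checked by hand. The "Average (sec/TS)" column appears to be the CPU time divided by the timestep reached, and the trickles appear to arrive every 25,920 timesteps. The following is a minimal sketch that rechecks both readings from the table values. The helper names `seconds_per_timestep` and `trickle_interval` are hypothetical, and the 4-decimal rounding is an assumption inferred from the table.

```python
# Trickle rows copied from the table above:
# (timestep, cpu_time_sec, reported_avg_sec_per_ts)
# Assumption: "Average (sec/TS)" = CPU time / timestep, rounded to 4 decimals.
TRICKLES = [
    (129_600, 286_453, 2.2103),
    (103_680, 229_881, 2.2172),
    (77_760, 173_453, 2.2306),
    (51_840, 115_537, 2.2287),
    (25_920, 58_124, 2.2424),
]


def seconds_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Hypothetical helper: average CPU seconds per model timestep, rounded like the table."""
    return round(cpu_seconds / timestep, 4)


def trickle_interval(rows):
    """Hypothetical helper: common timestep spacing between trickles (counting from 0), or None."""
    steps = sorted(ts for ts, _, _ in rows)
    # Gaps from 0 to the first trickle, then between consecutive trickles.
    gaps = {b - a for a, b in zip([0] + steps, steps)}
    return gaps.pop() if len(gaps) == 1 else None


for ts, cpu, avg in TRICKLES:
    assert seconds_per_timestep(ts, cpu) == avg, (ts, cpu, avg)
print(trickle_interval(TRICKLES))  # 25920
```

Under these assumptions, every reported average matches CPU time divided by timestep. Each trickle also lands on a multiple of 25,920 timesteps.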