Task 15696005

Name	hadcm3n_n256_1920_40_008334587_3
Workunit	8485448
Created	30 Mar 2013, 19:26:21 UTC
Sent	30 Mar 2013, 19:26:27 UTC
Report deadline	30 Jun 2013, 2:53:38 UTC
Received	4 Jun 2013, 19:11:29 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1096206
Run time	8 days 18 hours 57 min 2 sec
CPU time	8 days 15 hours 29 min 7 sec
Validate state	Invalid
Credit	6,531.84
Device peak FLOPS	2.66 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=868, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6472, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6472, iMonCtr=1
Model crash detected, will try to restart...
23:33:25 (2592): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:33:26 (2592): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9504, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9504, iMonCtr=1
Model crash detected, will try to restart...
Atmos Hold Restart file rename failed on atmos_restart.hold
Atmos Hold Restart file rename failed on atmos_restart.hold
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=12188, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8652, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3144, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6516, iMonCtr=1
Model crash detected, will try to restart...
22:30:41 (9896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:20:21 (7624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:20:24 (7624): No heartbeat from core client for 30 sec - exiting
00:20:25 (7624): No heartbeat from core client for 30 sec - exiting
00:20:26 (7624): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=13916, iMonCtr=1
Model crash detected, will try to restart...
00:31:36 (13916): No heartbeat from core client for 30 sec - exiting
00:31:37 (13916): No heartbeat from core client for 30 sec - exiting
00:31:38 (13916): No heartbeat from core client for 30 sec - exiting
00:31:39 (13916): No heartbeat from core client for 30 sec - exiting
00:31:40 (13916): No heartbeat from core client for 30 sec - exiting
00:31:41 (13916): No heartbeat from core client for 30 sec - exiting
00:31:42 (13916): No heartbeat from core client for 30 sec - exiting
00:31:43 (13916): No heartbeat from core client for 30 sec - exiting
00:31:44 (13916): No heartbeat from core client for 30 sec - exiting
00:31:45 (13916): No heartbeat from core client for 30 sec - exiting
00:31:46 (13916): No heartbeat from core client for 30 sec - exiting
00:31:47 (13916): No heartbeat from core client for 30 sec - exiting
00:31:48 (13916): No heartbeat from core client for 30 sec - exiting
00:31:49 (13916): No heartbeat from core client for 30 sec - exiting
00:31:50 (13916): No heartbeat from core client for 30 sec - exiting
00:31:51 (13916): No heartbeat from core client for 30 sec - exiting
00:31:52 (13916): No heartbeat from core client for 30 sec - exiting
00:31:53 (13916): No heartbeat from core client for 30 sec - exiting
00:31:54 (13916): No heartbeat from core client for 30 sec - exiting
00:31:55 (13916): No heartbeat from core client for 30 sec - exiting
00:31:56 (13916): No heartbeat from core client for 30 sec - exiting
00:31:57 (13916): No heartbeat from core client for 30 sec - exiting
00:31:58 (13916): No heartbeat from core client for 30 sec - exiting
00:31:59 (13916): No heartbeat from core client for 30 sec - exiting
00:32:00 (13916): No heartbeat from core client for 30 sec - exiting
00:32:01 (13916): No heartbeat from core client for 30 sec - exiting
00:32:02 (13916): No heartbeat from core client for 30 sec - exiting
00:32:03 (13916): No heartbeat from core client for 30 sec - exiting
00:32:04 (13916): No heartbeat from core client for 30 sec - exiting
00:32:05 (13916): No heartbeat from core client for 30 sec - exiting
00:32:06 (13916): No heartbeat from core client for 30 sec - exiting
00:32:07 (13916): No heartbeat from core client for 30 sec - exiting
00:32:08 (13916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:32:09 (13916): No heartbeat from core client for 30 sec - exiting
00:32:10 (13916): No heartbeat from core client for 30 sec - exiting
15:15:57 (13700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:24:07 (1916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
18:24:10 (1916): No heartbeat from core client for 30 sec - exiting
18:24:11 (1916): No heartbeat from core client for 30 sec - exiting
18:26:07 (5132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:58:42 (5948): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:01:09 (6216): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:01:11 (6216): No heartbeat from core client for 30 sec - exiting
21:01:12 (6216): No heartbeat from core client for 30 sec - exiting
21:11:20 (11848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:11:23 (11848): No heartbeat from core client for 30 sec - exiting
21:11:24 (11848): No heartbeat from core client for 30 sec - exiting
21:11:25 (11848): No heartbeat from core client for 30 sec - exiting
21:13:02 (12612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:13:03 (12612): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
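
The stderr output interleaves several kinds of events: model crashes with automatic restarts, "too many model crashes" aborts, heartbeat timeouts, suspend requests and missing restart dumps. As a rough reading aid, here is a minimal Python sketch that counts a few of these and lists the monitor PIDs (`selfPID`) in order of first appearance. The message fragments are copied from this log; the categories and the names `PATTERNS`, `tally_stderr` and `SAMPLE` are illustrative assumptions, not part of BOINC or the CPDN tools.

```python
import re
from collections import Counter

# Message fragments copied from this task's stderr output.
# Grouping them into these categories is an illustrative choice.
PATTERNS = {
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
    "too_many_crashes": re.compile(r"Sorry, too many model crashes!"),
    "heartbeat_timeout": re.compile(r"No heartbeat from core client for 30 sec"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "restart_file_missing": re.compile(r"cannot open input file \S+_restart\.day"),
}

# "Controller::" lines report the PID of the monitor process that wrote them.
SELF_PID = re.compile(r"selfPID=(\d+)")


def tally_stderr(text):
    """Return (event counts, monitor PIDs in order of first appearance)."""
    counts = Counter({name: len(rx.findall(text)) for name, rx in PATTERNS.items()})
    # dict.fromkeys drops repeated PIDs but keeps first-seen order.
    pids = list(dict.fromkeys(SELF_PID.findall(text)))
    return counts, pids


# A short excerpt of the log above, used as sample input (raw string keeps the backslashes).
SAMPLE = r"""
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8464, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/atmos_restart.day after 11 attempts
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_n256_1920_40_008334587/dataout/ocean_restart.day after 11 attempts
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2776, iMonCtr=1
Model crash detected, will try to restart...
"""

counts, pids = tally_stderr(SAMPLE)
print(dict(counts))
print(pids)
```

Passing the full `<stderr_txt>` contents instead of `SAMPLE` gives the same kind of summary for the whole run.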

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (CPU sec/timestep)
01 Jun 2013 22:39:50	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	544,320	745,785	1.3701
01 Jun 2013 11:45:29	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	518,400	707,299	1.3644
27 May 2013 15:56:43	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	492,480	669,543	1.3595
23 May 2013 22:13:55	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	466,560	630,709	1.3518
19 May 2013 13:53:53	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	440,640	592,562	1.3448
18 May 2013 16:32:00	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	414,720	557,217	1.3436
16 May 2013 15:14:43	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	388,800	522,401	1.3436
08 May 2013 19:46:06	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	362,880	488,555	1.3463
02 May 2013 07:55:33	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	336,960	455,408	1.3515
01 May 2013 23:02:36	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	311,040	423,515	1.3616
28 Apr 2013 15:55:58	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	285,120	389,396	1.3657
28 Apr 2013 04:49:50	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	259,200	355,580	1.3718
27 Apr 2013 19:34:54	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	233,280	322,572	1.3828
25 Apr 2013 20:27:44	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	207,360	287,554	1.3867
20 Apr 2013 20:47:47	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	181,440	253,556	1.3975
16 Apr 2013 19:40:46	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	155,520	219,596	1.4120
15 Apr 2013 22:48:14	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	129,600	187,690	1.4482
13 Apr 2013 18:05:09	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	103,680	154,359	1.4888
10 Apr 2013 21:30:08	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	77,760	119,278	1.5339
06 Apr 2013 14:55:33	1096206	15696005	hadcm3n_n256_1920_40_008334587_3	51,840	82,063	1.5830
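
The Average column is the cumulative CPU time divided by the timestep reached; for the latest trickle, 745,785 / 544,320 ≈ 1.3701 CPU seconds per timestep. The Python sketch below re-derives the column from the rows above, checks that trickles arrive every 25,920 timesteps (a spacing read off this table, not taken from CPDN documentation), and compares the last trickle's CPU time with the task's reported CPU time of 8 days 15 hours 29 min 7 sec. The helper names are hypothetical.

```python
# Rows from the "Latest Trickles Received" table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average in CPU sec/timestep)
TRICKLES = [
    (51_840, 82_063, 1.5830),
    (77_760, 119_278, 1.5339),
    (103_680, 154_359, 1.4888),
    (129_600, 187_690, 1.4482),
    (155_520, 219_596, 1.4120),
    (181_440, 253_556, 1.3975),
    (207_360, 287_554, 1.3867),
    (233_280, 322_572, 1.3828),
    (259_200, 355_580, 1.3718),
    (285_120, 389_396, 1.3657),
    (311_040, 423_515, 1.3616),
    (336_960, 455_408, 1.3515),
    (362_880, 488_555, 1.3463),
    (388_800, 522_401, 1.3436),
    (414_720, 557_217, 1.3436),
    (440_640, 592_562, 1.3448),
    (466_560, 630_709, 1.3518),
    (492_480, 669_543, 1.3595),
    (518_400, 707_299, 1.3644),
    (544_320, 745_785, 1.3701),
]

TOLERANCE = 5e-5  # the table rounds averages to 4 decimal places


def to_seconds(days=0, hours=0, minutes=0, secs=0):
    """Convert a 'D days H hours M min S sec' duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + secs


def check_averages(rows):
    """Return rows whose CPU time / timestep disagrees with the reported average."""
    return [
        (ts, cpu, reported)
        for ts, cpu, reported in rows
        if abs(cpu / ts - reported) > TOLERANCE
    ]


def timestep_spacing(rows):
    """Set of gaps between consecutive trickle timesteps."""
    return {later[0] - earlier[0] for earlier, later in zip(rows, rows[1:])}


task_cpu = to_seconds(days=8, hours=15, minutes=29, secs=7)  # "CPU time" field above
last_ts, last_cpu, _ = TRICKLES[-1]

print("mismatched averages:", check_averages(TRICKLES))
print("timestep spacing:", timestep_spacing(TRICKLES))
print(f"CPU time after last trickle: {task_cpu - last_cpu} s")
```

On these rows every reported average matches to 4 decimal places, the spacing is a single value of 25,920 timesteps, and the task's total CPU time (746,947 s) exceeds the last trickle's by 1,162 s.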