Task 12881145

Name	hadcm3n_p1nq_1900_40_007219734_2
Workunit	7417974
Created	11 May 2011, 20:25:12 UTC
Sent	11 May 2011, 20:25:24 UTC
Report deadline	11 Aug 2011, 3:52:35 UTC
Received	16 Aug 2011, 6:55:36 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	925017
Run time	12 days 17 hours 13 min 26 sec
CPU time	9 days 8 hours 36 min 10 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	2.75 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3748, iMonCtr=1
Model crash detected, will try to restart...
08:17:35 (3120): No heartbeat from core client for 30 sec - exiting
08:17:36 (3120): No heartbeat from core client for 30 sec - exiting
08:17:37 (3120): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1124, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3812, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/p1nqko.pja4c10
Error converting file to netcdf: dataout/p1nqko.pia4c10
Error converting file to netcdf: dataout/p1nqko.pfa4c10
Error converting file to netcdf: dataout/p1nqka.pha4c10
Error converting file to netcdf: dataout/p1nqka.pga4c10
Error converting file to netcdf: dataout/p1nqka.pea4c10
Error converting file to netcdf: dataout/p1nqka.pda4c10
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3352, iMonCtr=1
Model crash detected, will try to restart...
09:51:52 (2928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
10:03:49 (3484): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:09:30 (3796): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:47:40 (3612): No heartbeat from core client for 30 sec - exiting
07:47:41 (3612): No heartbeat from core client for 30 sec - exiting
07:47:42 (3612): No heartbeat from core client for 30 sec - exiting
07:47:43 (3612): No heartbeat from core client for 30 sec - exiting
07:47:45 (3612): No heartbeat from core client for 30 sec - exiting
07:47:46 (3612): No heartbeat from core client for 30 sec - exiting
07:47:47 (3612): No heartbeat from core client for 30 sec - exiting
07:47:48 (3612): No heartbeat from core client for 30 sec - exiting
07:47:49 (3612): No heartbeat from core client for 30 sec - exiting
07:47:50 (3612): No heartbeat from core client for 30 sec - exiting
07:47:51 (3612): No heartbeat from core client for 30 sec - exiting
07:47:52 (3612): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1604, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3932, iMonCtr=1
Model crash detected, will try to restart...
09:09:44 (3708): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3284, iMonCtr=1
Model crash detected, will try to restart...
13:56:52 (4164): No heartbeat from core client for 30 sec - exiting
13:56:53 (4164): No heartbeat from core client for 30 sec - exiting
13:56:54 (4164): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:26:16 (1236): No heartbeat from core client for 30 sec - exiting
20:26:17 (1236): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:36:40 (3564): No heartbeat from core client for 30 sec - exiting
07:36:41 (3564): No heartbeat from core client for 30 sec - exiting
07:36:42 (3564): No heartbeat from core client for 30 sec - exiting
07:36:43 (3564): No heartbeat from core client for 30 sec - exiting
07:36:44 (3564): No heartbeat from core client for 30 sec - exiting
07:36:45 (3564): No heartbeat from core client for 30 sec - exiting
07:36:46 (3564): No heartbeat from core client for 30 sec - exiting
07:36:47 (3564): No heartbeat from core client for 30 sec - exiting
07:36:48 (3564): No heartbeat from core client for 30 sec - exiting
07:36:49 (3564): No heartbeat from core client for 30 sec - exiting
07:36:51 (3564): No heartbeat from core client for 30 sec - exiting
07:36:52 (3564): No heartbeat from core client for 30 sec - exiting
07:36:53 (3564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:49:30 (3132): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:34:49 (3784): No heartbeat from core client for 30 sec - exiting
17:34:50 (3784): No heartbeat from core client for 30 sec - exiting
17:34:51 (3784): No heartbeat from core client for 30 sec - exiting
17:34:52 (3784): No heartbeat from core client for 30 sec - exiting
17:34:53 (3784): No heartbeat from core client for 30 sec - exiting
17:34:55 (3784): No heartbeat from core client for 30 sec - exiting
17:34:56 (3784): No heartbeat from core client for 30 sec - exiting
17:34:57 (3784): No heartbeat from core client for 30 sec - exiting
17:34:58 (3784): No heartbeat from core client for 30 sec - exiting
17:34:59 (3784): No heartbeat from core client for 30 sec - exiting
17:35:00 (3784): No heartbeat from core client for 30 sec - exiting
17:35:01 (3784): No heartbeat from core client for 30 sec - exiting
17:35:02 (3784): No heartbeat from core client for 30 sec - exiting
17:35:03 (3784): No heartbeat from core client for 30 sec - exiting
17:35:04 (3784): No heartbeat from core client for 30 sec - exiting
17:35:06 (3784): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:35:07 (3784): No heartbeat from core client for 30 sec - exiting
12:32:16 (3140): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
07:39:23 (3024): No heartbeat from core client for 30 sec - exiting
07:39:25 (3024): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:50:25 (2388): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
07:25:04 (3988): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
16:34:52 (2648): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
C17:38:58 (3544): No heartbeat from core client for 30 sec - exiting
17:38:59 (3544): No heartbeat from core client for 30 sec - exiting
17:39:00 (3544): No heartbeat from core client for 30 sec - exiting
17:39:01 (3544): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:39:02 (3544): No heartbeat from core client for 30 sec - exiting
17:39:03 (3544): No heartbeat from core client for 30 sec - exiting
17:39:04 (3544): No heartbeat from core client for 30 sec - exiting
09:05:54 (4064): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:17:01 (2276): No heartbeat from core client for 30 sec - exiting
15:17:02 (2276): No heartbeat from core client for 30 sec - exiting
15:17:03 (2276): No heartbeat from core client for 30 sec - exiting
15:17:04 (2276): No heartbeat from core client for 30 sec - exiting
15:17:05 (2276): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
07:41:50 (3916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:27:36 (3944): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
14:53:21 (4088): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3332, iMonCtr=1
Model crash detected, will try to restart...
08:42:52 (3712): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
08:16:15 (2492): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:08:32 (416): No heartbeat from core client for 30 sec - exiting
10:08:33 (416): No heartbeat from core client for 30 sec - exiting
10:08:34 (416): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
10:10:28 (2428): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
C07:28:50 (3160): No heartbeat from core client for 30 sec - exiting
07:28:51 (3160): No heartbeat from core client for 30 sec - exiting
07:28:53 (3160): No heartbeat from core client for 30 sec - exiting
07:28:54 (3160): No heartbeat from core client for 30 sec - exiting
07:28:55 (3160): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
07:30:41 (876): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
07:48:23 (3584): No heartbeat from core client for 30 sec - exiting
07:48:24 (3584): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3660, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1540, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
04 Aug 2011 06:40:37	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	440,640	791,288	1.7958
25 Jul 2011 17:38:40	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	414,720	744,055	1.7941
25 Jul 2011 17:38:40	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	388,800	695,988	1.7901
25 Jul 2011 17:38:40	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	362,880	648,049	1.7858
07 Jul 2011 17:53:57	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	336,960	600,209	1.7812
04 Jul 2011 17:30:35	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	311,040	554,446	1.7826
26 Jun 2011 17:05:34	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	285,120	509,141	1.7857
21 Jun 2011 07:29:04	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	259,200	461,421	1.7802
14 Jun 2011 19:47:31	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	233,280	414,385	1.7763
11 Jun 2011 17:42:05	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	207,360	368,645	1.7778
09 Jun 2011 20:36:47	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	181,440	321,653	1.7728
04 Jun 2011 14:52:26	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	155,520	275,369	1.7706
30 May 2011 10:54:38	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	129,600	229,406	1.7701
25 May 2011 19:19:45	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	103,680	183,082	1.7658
20 May 2011 13:07:36	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	77,760	138,072	1.7756
17 May 2011 15:04:03	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	51,840	91,754	1.7699
14 May 2011 19:53:54	925017	12881145	hadcm3n_p1nq_1900_40_007219734_2	25,920	45,650	1.7612