Task 11876052

Name	famous_v4wj_1799_200_006692121_5
Workunit	6895374
Created	8 Sep 2010, 9:20:49 UTC
Sent	11 Sep 2010, 13:21:28 UTC
Report deadline	11 Dec 2010, 20:48:39 UTC
Received	28 Jan 2011, 3:51:10 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1050356
Run time	19 days 4 hours 20 min 32 sec
CPU time	18 days 5 hours 51 min 33 sec
Validate state	Invalid
Credit	5,651.43
Device peak FLOPS	1.14 GFLOPS
Application version	UK Met Office FAMOUS v6.11 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5148, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4184, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4508, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4456, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4304, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2500, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2228, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4656, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1712, iMonCtr=1 Model crash detected, will try to restart... 18:06:35 (3960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2688, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1820, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2056, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5192, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4316, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2468, iMonCtr=1 Model crash detected, will try to restart... CCPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=176, iMonCtr=1 Model crash detected, will try to restart... 08:22:49 (3492): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
08:22:57 (3492): No heartbeat from core client for 30 sec - exiting 10:41:54 (3160): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5332, iMonCtr=1 Model crash detected, will try to restart... 16:04:09 (4412): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5344, iMonCtr=1 Model crash detected, will try to restart... 17:24:05 (1832): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3892, iMonCtr=1 Model crash detected, will try to restart... 22:01:43 (3700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:03:16 (2624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 12:30:43 (2668): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:30:44 (2668): No heartbeat from core client for 30 sec - exiting 19:27:37 (3464): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:27:45 (3464): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2832, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 20:34:08 (2604): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:34:09 (2604): No heartbeat from core client for 30 sec - exiting BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 06:20:00 (2908): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: Result too large BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 CPDN Monitor - Quit request from BOINC... 20:43:58 (2476): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1336, iMonCtr=1 Model crash detected, will try to restart... 11:59:00 (2056): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
11:59:11 (2056): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2492, iMonCtr=1 Model crash detected, will try to restart... 19:23:49 (3816): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3412, iMonCtr=1 Model crash detected, will try to restart... 08:52:08 (1020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 10:08:28 (5624): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1416, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1012, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2824, iMonCtr=1 Model crash detected, will try to restart... 08:27:10 (3160): No heartbeat from core client for 30 sec - exiting 08:27:22 (3160): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:27:34 (3160): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=952, iMonCtr=1 Model crash detected, will try to restart... 
07:55:34 (4264): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:41:49 (2412): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4852, iMonCtr=1 Model crash detected, will try to restart... 19:48:21 (5896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... C10:32:17 (2968): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3428, iMonCtr=1 Model crash detected, will try to restart... 09:34:44 (4084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=296, iMonCtr=1 Model crash detected, will try to restart... 09:21:16 (3172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 08:51:46 (1408): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3796, iMonCtr=1 Model crash detected, will try to restart... 14:42:32 (1284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:42:47 (1284): No heartbeat from core client for 30 sec - exiting 09:15:21 (2704): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:15:32 (2704): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4880, iMonCtr=1 Model crash detected, will try to restart... 09:27:58 (4512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4960, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4092, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1120, iMonCtr=1 Model crash detected, will try to restart... 08:00:52 (2808): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:00:59 (2808): No heartbeat from core client for 30 sec - exiting 08:18:17 (2172): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:22:36 (2172): No heartbeat from core client for 30 sec - exiting Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Model crashed: READHIST: End of file in READ from history file for namelist NLCHISTO tmp/pipe_dummy Sorry, too many model crashes! :-( 08:34:52 (480): called boinc_finish </stderr_txt> ]]>

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,712,906	1,574,414	0.9191
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,703,546	1,565,469	0.9189
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,694,186	1,556,644	0.9188
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,684,826	1,547,874	0.9187
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,675,466	1,539,098	0.9186
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,666,106	1,530,340	0.9185
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,656,746	1,521,903	0.9186
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,647,386	1,513,308	0.9186
28 Jan 2011 03:54:38	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,638,026	1,504,457	0.9185
22 Jan 2011 16:10:09	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,628,666	1,495,639	0.9183
22 Jan 2011 13:34:41	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,619,306	1,487,025	0.9183
22 Jan 2011 12:01:51	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,609,946	1,478,392	0.9183
22 Jan 2011 12:01:51	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,600,586	1,469,528	0.9181
22 Jan 2011 05:51:16	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,591,226	1,460,679	0.9180
21 Jan 2011 15:47:25	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,581,866	1,451,752	0.9177
21 Jan 2011 14:38:36	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,572,506	1,442,833	0.9175
21 Jan 2011 14:38:36	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,563,146	1,434,049	0.9174
21 Jan 2011 14:38:36	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,553,786	1,425,401	0.9174
21 Jan 2011 14:38:36	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,544,426	1,416,729	0.9173
21 Jan 2011 14:38:36	1050356	11876052	famous_v4wj_1799_200_006692121_5	1,535,066	1,408,023	0.9172
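
The "Average (sec/TS)" column above appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The following is a minimal sketch that checks this reading against three rows copied from the table. The function name `average_sec_per_timestep` is hypothetical and is not part of BOINC or the CPDN software, and the formula is an inference from the listed values rather than a documented definition.

```python
# Illustrative check (assumption): Average (sec/TS) = CPU time / timestep,
# rounded to 4 decimal places. Rows are copied from the trickle table above
# as (timestep, cpu_time_sec, reported_average).
rows = [
    (1_712_906, 1_574_414, 0.9191),
    (1_703_546, 1_565_469, 0.9189),
    (1_535_066, 1_408_023, 0.9172),
]


def average_sec_per_timestep(timestep: int, cpu_time_sec: int) -> float:
    """Hypothetical helper: cumulative CPU seconds per model timestep."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_sec, reported in rows:
        computed = average_sec_per_timestep(timestep, cpu_sec)
        # Compare with a small tolerance instead of exact float equality.
        status = "matches" if abs(computed - reported) < 1e-9 else "differs"
        print(f"timestep={timestep} computed={computed} reported={reported} {status}")
```

The rows are kept as plain tuples so that other entries from the table can be added and checked the same way.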