Task 15988184

Name	hadcm3n_n7i8_1920_40_008377982_3
Workunit	8528841
Created	30 Aug 2013, 17:49:04 UTC
Sent	30 Aug 2013, 18:29:34 UTC
Report deadline	30 Nov 2013, 1:56:45 UTC
Received	30 Sep 2013, 16:17:34 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1169946
Run time	5 days 14 hours 24 min 7 sec
CPU time	5 days 2 hours 49 min 11 sec
Validate state	Invalid
Credit	4,043.52
Device peak FLOPS	3.28 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6240, iMonCtr=1 Model crash detected, will try to restart...
22:15:46 (3988): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6096, iMonCtr=1 Model crash detected, will try to restart...
22:15:29 (4696): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
22:54:45 (6988): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1868, iMonCtr=1 Model crash detected, will try to restart...
10:02:52 (4288): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
13:07:28 (5956): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
18:43:58 (6272): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
21:38:46 (5148): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
11:21:43 (5456): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
13:45:57 (5676): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
18:01:27 (6048): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
22:02:50 (3448): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2428, iMonCtr=1 Model crash detected, will try to restart...
21:55:05 (5268): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
09:10:10 (6024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
16:27:46 (5204): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
10:42:49 (5568): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
10:13:14 (5380): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
20:07:23 (576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
20:38:00 (2092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
08:43:00 (4132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
22:07:20 (5152): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
23:42:45 (5084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
06:54:02 (4752): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
08:30:50 (6936): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
07:57:49 (4188): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
08:39:47 (5548): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
15:55:29 (5220): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
17:59:58 (3228): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
22:57:27 (5792): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4568, iMonCtr=1 Model crash detected, will try to restart...
18:30:27 (6088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5944, iMonCtr=1 Model crash detected, will try to restart...
22:03:24 (4364): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6840, iMonCtr=1 Model crash detected, will try to restart...
13:52:12 (4012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
19:46:05 (3316): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6152, iMonCtr=1 Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
08:06:16 (5280): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC...
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
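The stderr output above mixes four recurring message types: lost heartbeats, restart attempts, BUFFIN I/O errors, and fatal model crashes. The following is a minimal sketch of how such a log could be tallied. The function name `summarize`, the pattern name `HEARTBEAT`, and the short `sample` string are illustrative assumptions, not part of BOINC or the CPDN application; the message substrings are copied from the log itself.

```python
import re

# Matches lines such as "22:15:46 (3988): No heartbeat from core client ..."
# and captures the process ID shown in parentheses.
HEARTBEAT = re.compile(r"\d{2}:\d{2}:\d{2} \((\d+)\): No heartbeat from core client")


def summarize(stderr_text):
    """Count each recurring message type in a CPDN stderr dump (hypothetical helper)."""
    return {
        "heartbeat_losses": len(HEARTBEAT.findall(stderr_text)),
        "restart_attempts": stderr_text.count("Model crash detected, will try to restart"),
        "buffin_errors": stderr_text.count("BUFFIN: C I/O Error"),
        "model_crashes": stderr_text.count("Model crashed:"),
    }


# A short excerpt in the same shape as the log above, used only for illustration.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, "
    "selfPID=6240, iMonCtr=1 Model crash detected, will try to restart... "
    "22:15:46 (3988): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 "
    "BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 "
    "Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048"
)

counts = summarize(sample)
print(counts)
```

Because the counts rely only on substring and regex matches, the same function works whether the log is stored one message per line or as a single flattened string.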

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
29 Sep 2013 03:01:50	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	336,960	418,941	1.2433
28 Sep 2013 17:07:34	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	311,040	385,640	1.2398
25 Sep 2013 09:21:53	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	285,120	353,799	1.2409
21 Sep 2013 22:57:21	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	259,200	321,968	1.2422
19 Sep 2013 10:52:50	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	233,280	289,723	1.2420
17 Sep 2013 17:27:55	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	207,360	257,200	1.2404
15 Sep 2013 02:42:22	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	181,440	225,093	1.2406
14 Sep 2013 02:50:37	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	155,520	192,264	1.2363
12 Sep 2013 03:42:25	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	129,600	160,567	1.2389
11 Sep 2013 17:03:19	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	103,680	128,275	1.2372
09 Sep 2013 02:41:42	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	77,760	97,056	1.2481
08 Sep 2013 16:24:14	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	51,840	65,200	1.2577
04 Sep 2013 04:29:48	1169946	15988184	hadcm3n_n7i8_1920_40_008377982_3	25,920	33,860	1.3063
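The "Average (sec/TS)" column in the table above appears to be the cumulative CPU time divided by the cumulative timestep count, shown to four decimal places. A minimal sketch that checks this against a few rows follows; the function name `sec_per_timestep` is a hypothetical helper, and the input pairs are copied from the table.

```python
def sec_per_timestep(timestep, cpu_seconds):
    """Average CPU seconds per model timestep, formatted to four decimals."""
    return f"{cpu_seconds / timestep:.4f}"


# (timestep, cpu_seconds, reported_average) taken from the trickle table.
rows = [
    (336_960, 418_941, "1.2433"),
    (311_040, 385_640, "1.2398"),
    (51_840, 65_200, "1.2577"),
    (25_920, 33_860, "1.3063"),
]

for timestep, cpu_seconds, reported in rows:
    computed = sec_per_timestep(timestep, cpu_seconds)
    print(timestep, computed, "matches" if computed == reported else "differs")
```

Each trickle in the table also advances the timestep count by 25,920, so the same helper can be applied to any other row.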