Name | hadcm3n_o38o_1940_40_008379782_4 |
Workunit | 8530641 |
Created | 17 Sep 2013, 11:05:26 UTC |
Sent | 17 Sep 2013, 11:21:07 UTC |
Report deadline | 17 Dec 2013, 18:48:18 UTC |
Received | 7 Nov 2013, 17:42:57 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1237173 |
Run time | 22 hours 18 min 2 sec |
CPU time | 16 hours 35 min 10 sec |
Validate state | Invalid |
Credit | 622.08 |
Device peak FLOPS | 3.34 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 10:40:54 (59224): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:40:55 (59224): No heartbeat from core client for 30 sec - exiting 10:40:56 (59224): No heartbeat from core client for 30 sec - exiting 10:40:57 (59224): No heartbeat from core client for 30 sec - exiting 10:40:58 (59224): No heartbeat from core client for 30 sec - exiting 10:40:59 (59224): No heartbeat from core client for 30 sec - exiting 10:41:00 (59224): No heartbeat from core client for 30 sec - exiting 10:41:01 (59224): No heartbeat from core client for 30 sec - exiting 10:41:02 (59224): No heartbeat from core client for 30 sec - exiting 10:41:03 (59224): No heartbeat from core client for 30 sec - exiting 10:41:04 (59224): No heartbeat from core client for 30 sec - exiting 10:54:19 (68548): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:28:02 (74500): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:28:04 (74500): No heartbeat from core client for 30 sec - exiting 12:28:05 (74500): No heartbeat from core client for 30 sec - exiting 12:35:17 (72072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 17:20:03 (9220): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
17:20:06 (9220): No heartbeat from core client for 30 sec - exiting 17:20:07 (9220): No heartbeat from core client for 30 sec - exiting 17:20:08 (9220): No heartbeat from core client for 30 sec - exiting 17:20:09 (9220): No heartbeat from core client for 30 sec - exiting 17:20:10 (9220): No heartbeat from core client for 30 sec - exiting 17:20:11 (9220): No heartbeat from core client for 30 sec - exiting 17:39:39 (9580): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 18:07:29 (4796): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:27:06 (8480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:33:34 (10424): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:33:38 (10424): No heartbeat from core client for 30 sec - exiting 20:15:26 (10440): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:15:29 (10440): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold 20:20:09 (12192): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 21:19:57 (27716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 00:45:50 (24048): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:51:20 (26732): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
01:51:27 (26732): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 13:02:58 (36796): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:54:03 (34888): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:54:08 (34888): No heartbeat from core client for 30 sec - exiting 13:54:09 (34888): No heartbeat from core client for 30 sec - exiting 13:54:10 (34888): No heartbeat from core client for 30 sec - exiting 13:54:11 (34888): No heartbeat from core client for 30 sec - exiting 13:54:12 (34888): No heartbeat from core client for 30 sec - exiting 13:54:13 (34888): No heartbeat from core client for 30 sec - exiting 13:54:14 (34888): No heartbeat from core client for 30 sec - exiting 13:54:15 (34888): No heartbeat from core client for 30 sec - exiting 13:54:16 (34888): No heartbeat from core client for 30 sec - exiting 13:54:17 (34888): No heartbeat from core client for 30 sec - exiting 14:17:04 (37756): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:17:19 (37756): No heartbeat from core client for 30 sec - exiting 14:17:20 (37756): No heartbeat from core client for 30 sec - exiting 14:17:21 (37756): No heartbeat from core client for 30 sec - exiting 14:17:22 (37756): No heartbeat from core client for 30 sec - exiting 14:17:23 (37756): No heartbeat from core client for 30 sec - exiting 14:17:24 (37756): No heartbeat from core client for 30 sec - exiting 14:17:25 (37756): No heartbeat from core client for 30 sec - exiting 14:17:26 (37756): No heartbeat from core client for 30 sec - exiting 14:17:27 (37756): No heartbeat from core client for 30 sec - exiting 14:17:28 (37756): No heartbeat from core client for 30 sec - exiting 14:29:48 (39468): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
14:35:06 (26592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:29:37 (10844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:29:43 (10844): No heartbeat from core client for 30 sec - exiting 16:29:44 (10844): No heartbeat from core client for 30 sec - exiting 16:29:45 (10844): No heartbeat from core client for 30 sec - exiting 16:35:57 (33148): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:28:37 (37324): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:09:23 (23684): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:50:35 (10416): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:29:04 (35112): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:36:05 (8064): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:44:34 (25748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:54:49 (39040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:15:10 (40900): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:58:24 (40732): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:09:28 (40296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:13:48 (41512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=41100, iMonCtr=1 Model crash detected, will try to restart... 
Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=41100, iMonCtr=1 Model crash detected, will try to restart... 02:22:16 (41100): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=43564, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 11:15:43 (8116): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:29:09 (9972): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10088, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10088, iMonCtr=1 Model crash detected, will try to restart... 11:34:21 (10088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=13308, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
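The stderr text above is long and repetitive, so a small script can help summarize it. The following is a minimal sketch, not part of BOINC or CPDN: the function name `summarize` and the regular expression are illustrative assumptions, and `sample` holds a few lines excerpted (not contiguously) from the log above. It counts heartbeat-loss messages per process ID and the number of "Model crash detected" notices.

```python
import re
from collections import Counter

# A few messages excerpted from the stderr log above (not contiguous).
sample = (
    "10:40:54 (59224): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "10:40:55 (59224): No heartbeat from core client for 30 sec - exiting "
    "10:54:19 (68548): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Signal 22 received, exiting... Called boinc_finish "
    "Model crash detected, will try to restart... "
)

# Hypothetical pattern: "HH:MM:SS (pid): No heartbeat from core client".
HEARTBEAT = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \((\d+)\): No heartbeat from core client"
)

def summarize(log_text):
    """Return (heartbeat-loss count per PID, number of model-crash notices)."""
    per_pid = Counter(pid for _time, pid in HEARTBEAT.findall(log_text))
    crashes = log_text.count("Model crash detected")
    return per_pid, crashes

per_pid, crashes = summarize(sample)
print(dict(per_pid), crashes)
```

Running the same function over the full stderr text would show which process IDs logged the most heartbeat timeouts before the final "too many model crashes" exit.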
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
10 Oct 2013 17:52:37 | 1237173 | 16020915 | hadcm3n_o38o_1940_40_008379782_4 | 51,840 | 58,557 | 1.1296 |
09 Oct 2013 01:09:11 | 1237173 | 16020915 | hadcm3n_o38o_1940_40_008379782_4 | 25,920 | 29,643 | 1.1436 |
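
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that assumption against the two rows in the table; the helper name `sec_per_timestep` is illustrative and is not taken from the CPDN server code.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
trickles = [
    (51_840, 58_557, 1.1296),
    (25_920, 29_643, 1.1436),
]

def sec_per_timestep(cpu_time_sec, timestep):
    """CPU seconds per model timestep, rounded to four decimals like the table."""
    return round(cpu_time_sec / timestep, 4)

for timestep, cpu_time_sec, reported in trickles:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    # Compare with a tolerance instead of exact float equality.
    matches = abs(computed - reported) < 1e-9
    print(timestep, computed, matches)
```

Both rows reproduce the reported averages under this assumption, which suggests the host was spending a little over one CPU second per timestep before the task failed.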