Name | hadcm3n_393b_2020_40_008406813_0 |
Workunit | 8557669 |
Created | 20 Aug 2013, 8:16:26 UTC |
Sent | 20 Aug 2013, 8:20:52 UTC |
Report deadline | 19 Nov 2013, 15:48:03 UTC |
Received | 29 Aug 2013, 4:34:58 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1276247 |
Run time | 3 days 22 hours 35 min 39 sec |
CPU time | 2 days 15 hours 58 min 37 sec |
Validate state | Invalid |
Credit | 1,555.20 |
Device peak FLOPS | 3.02 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... 03:13:57 (7332): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:13:58 (7332): No heartbeat from core client for 30 sec - exiting 03:13:59 (7332): No heartbeat from core client for 30 sec - exiting 03:14:00 (7332): No heartbeat from core client for 30 sec - exiting 03:14:01 (7332): No heartbeat from core client for 30 sec - exiting 03:14:02 (7332): No heartbeat from core client for 30 sec - exiting 03:14:03 (7332): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 05:52:21 (13284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:52:24 (13284): No heartbeat from core client for 30 sec - exiting 05:52:25 (13284): No heartbeat from core client for 30 sec - exiting 05:52:26 (13284): No heartbeat from core client for 30 sec - exiting 05:52:27 (13284): No heartbeat from core client for 30 sec - exiting 05:52:28 (13284): No heartbeat from core client for 30 sec - exiting 05:52:29 (13284): No heartbeat from core client for 30 sec - exiting 05:52:30 (13284): No heartbeat from core client for 30 sec - exiting 05:52:31 (13284): No heartbeat from core client for 30 sec - exiting 05:52:32 (13284): No heartbeat from core client for 30 sec - exiting 05:52:33 (13284): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 22:12:42 (26532): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
22:12:44 (26532): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 19:11:55 (3140): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:55:21 (2488): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 01:37:30 (9788): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:30:50 (6040): No heartbeat from core client for 30 sec - exiting 23:30:52 (6040): No heartbeat from core client for 30 sec - exiting 23:30:53 (6040): No heartbeat from core client for 30 sec - exiting 23:30:54 (6040): No heartbeat from core client for 30 sec - exiting 23:30:55 (6040): No heartbeat from core client for 30 sec - exiting 23:30:56 (6040): No heartbeat from core client for 30 sec - exiting 23:30:57 (6040): No heartbeat from core client for 30 sec - exiting 23:30:58 (6040): No heartbeat from core client for 30 sec - exiting 23:30:59 (6040): No heartbeat from core client for 30 sec - exiting 23:31:00 (6040): No heartbeat from core client for 30 sec - exiting 23:31:01 (6040): No heartbeat from core client for 30 sec - exiting 23:31:02 (6040): No heartbeat from core client for 30 sec - exiting 23:31:03 (6040): No heartbeat from core client for 30 sec - exiting 23:31:05 (6040): No heartbeat from core client for 30 sec - exiting 23:31:06 (6040): No heartbeat from core client for 30 sec - exiting 23:31:07 (6040): No heartbeat from core client for 30 sec - exiting 23:31:08 
(6040): No heartbeat from core client for 30 sec - exiting 23:31:09 (6040): No heartbeat from core client for 30 sec - exiting 23:31:10 (6040): No heartbeat from core client for 30 sec - exiting 23:31:11 (6040): No heartbeat from core client for 30 sec - exiting 23:31:12 (6040): No heartbeat from core client for 30 sec - exiting 23:31:13 (6040): No heartbeat from core client for 30 sec - exiting 23:31:14 (6040): No heartbeat from core client for 30 sec - exiting 23:31:15 (6040): No heartbeat from core client for 30 sec - exiting 23:31:16 (6040): No heartbeat from core client for 30 sec - exiting 23:31:18 (6040): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:31:19 (6040): No heartbeat from core client for 30 sec - exiting Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2808, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! 
:-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
28 Aug 2013 17:55:46 | 1276247 | 15928367 | hadcm3n_393b_2020_40_008406813_0 | 129,600 | 202,274 | 1.5608 |
27 Aug 2013 01:24:11 | 1276247 | 15928367 | hadcm3n_393b_2020_40_008406813_0 | 103,680 | 159,518 | 1.5386 |
25 Aug 2013 03:17:02 | 1276247 | 15928367 | hadcm3n_393b_2020_40_008406813_0 | 77,760 | 116,585 | 1.4993 |
23 Aug 2013 12:26:49 | 1276247 | 15928367 | hadcm3n_393b_2020_40_008406813_0 | 51,840 | 74,614 | 1.4393 |
22 Aug 2013 15:05:44 | 1276247 | 15928367 | hadcm3n_393b_2020_40_008406813_0 | 25,920 | 40,665 | 1.5689 |
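
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count reported in the same trickle. The sketch below is a minimal check of that reading, using only the five rows listed above. The function name `average_sec_per_timestep` is hypothetical and is not part of BOINC or the CPDN software.

```python
# Rows copied from the trickle table: (timestep, cumulative CPU time in seconds).
trickles = [
    (129_600, 202_274),
    (103_680, 159_518),
    (77_760, 116_585),
    (51_840, 74_614),
    (25_920, 40_665),
]


def average_sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Assumed definition: cumulative CPU seconds per model timestep, to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# One average per trickle, in the same order as the table rows.
averages = [average_sec_per_timestep(ts, cpu) for ts, cpu in trickles]

for (ts, cpu), avg in zip(trickles, averages):
    print(f"{ts:>8,} {cpu:>8,} {avg:.4f}")
```

Under this assumption the computed values match the table's last column for all five rows, so the figure is a running average over the whole task so far rather than a rate for the most recent interval.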