Name | hadcm3n_o2wi_2100_40_008203986_4 |
Workunit | 8359110 |
Created | 15 Nov 2012, 0:10:11 UTC |
Sent | 15 Nov 2012, 0:10:19 UTC |
Report deadline | 14 Feb 2013, 7:37:30 UTC |
Received | 1 Dec 2012, 6:23:47 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1226353 |
Run time | 15 days 5 hours 59 min 52 sec |
CPU time | 6 days 1 hours 58 min 3 sec |
Validate state | Invalid |
Credit | 2,799.36 |
Device peak FLOPS | 1.82 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 05:24:29 (1188): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:24:30 (1188): No heartbeat from core client for 30 sec - exiting 05:24:31 (1188): No heartbeat from core client for 30 sec - exiting 05:24:32 (1188): No heartbeat from core client for 30 sec - exiting 05:24:33 (1188): No heartbeat from core client for 30 sec - exiting 05:24:34 (1188): No heartbeat from core client for 30 sec - exiting 05:24:35 (1188): No heartbeat from core client for 30 sec - exiting 05:24:36 (1188): No heartbeat from core client for 30 sec - exiting 05:24:37 (1188): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 16:16:39 (35184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:16:40 (35184): No heartbeat from core client for 30 sec - exiting 16:16:41 (35184): No heartbeat from core client for 30 sec - exiting 16:16:42 (35184): No heartbeat from core client for 30 sec - exiting 16:16:43 (35184): No heartbeat from core client for 30 sec - exiting 16:16:44 (35184): No heartbeat from core client for 30 sec - exiting 16:16:45 (35184): No heartbeat from core client for 30 sec - exiting 19:35:07 (47760): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:35:08 (47760): No heartbeat from core client for 30 sec - exiting 19:35:09 (47760): No heartbeat from core client for 30 sec - exiting 19:59:15 (41344): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:09:21 (44372): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:12:33 (29584): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:13:25 (29776): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:13:26 (29776): No heartbeat from core client for 30 sec - exiting 23:13:27 (29776): No heartbeat from core client for 30 sec - exiting 23:13:28 (29776): No heartbeat from core client for 30 sec - exiting 23:13:29 (29776): No heartbeat from core client for 30 sec - exiting 23:13:30 (29776): No heartbeat from core client for 30 sec - exiting 23:13:31 (29776): No heartbeat from core client for 30 sec - exiting 23:46:07 (6772): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:10:47 (35212): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:10:21 (38008): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:24:16 (46592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:24:17 (46592): No heartbeat from core client for 30 sec - exiting 01:24:18 (46592): No heartbeat from core client for 30 sec - exiting 01:55:07 (49120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:16:36 (11976): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2924, iMonCtr=1 Model crash detected, will try to restart... 23:18:45 (11080): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:18:47 (11080): No heartbeat from core client for 30 sec - exiting BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 11:40:24 (7808): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=26476, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=16376, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 00:54:55 (16020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 04:59:48 (18104): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 04:59:49 (18104): No heartbeat from core client for 30 sec - exiting 04:59:50 (18104): No heartbeat from core client for 30 sec - exiting 04:59:51 (18104): No heartbeat from core client for 30 sec - exiting 07:04:08 (16412): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:15:06 (14628): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:22:22 (19184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:04:53 (15592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
12:10:07 (18452): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:12:47 (12184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:47:07 (17592): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:40:47 (13944): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:07:53 (7264): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 22:59:28 (47556): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITHEAD: I/O error Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7360, iMonCtr=1 Model crash detected, will try to restart... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITHEAD: I/O error tmp/pipe_dummy 2048 </stderr_txt> ]]> |
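
The stderr dump above is dominated by repeated heartbeat-loss messages. As a minimal sketch for summarizing them, the snippet below counts those messages per process. It assumes the number in parentheses is the process ID of the reporting process. The function name `heartbeat_counts` and the regular expression are illustrative, not part of BOINC or CPDN. The excerpt is copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   19:35:07 (47760): No heartbeat from core client for 30 sec - exiting
# Assumption: the parenthesized number is the reporting process's PID.
HEARTBEAT = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \((\d+)\): "
    r"No heartbeat from core client for 30 sec - exiting"
)


def heartbeat_counts(stderr_text):
    """Return a Counter mapping PID -> number of heartbeat-loss messages."""
    return Counter(int(pid) for _time, pid in HEARTBEAT.findall(stderr_text))


# Short excerpt taken verbatim from the stderr output above.
excerpt = (
    "19:35:07 (47760): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "19:35:08 (47760): No heartbeat from core client for 30 sec - exiting "
    "19:35:09 (47760): No heartbeat from core client for 30 sec - exiting "
    "19:59:15 (41344): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
)

counts = heartbeat_counts(excerpt)
for pid, n in counts.items():
    print(f"PID {pid}: {n} heartbeat-loss message(s)")
```

Running the same function over the full stderr text would give one count per process that reported a lost heartbeat.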
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
30 Nov 2012 12:38:37 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 233,280 | 538,072 | 2.3066 |
29 Nov 2012 04:46:42 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 207,360 | 477,193 | 2.3013 |
27 Nov 2012 20:02:08 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 181,440 | 417,466 | 2.3008 |
25 Nov 2012 08:15:20 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 155,520 | 357,169 | 2.2966 |
24 Nov 2012 07:43:13 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 129,600 | 301,277 | 2.3247 |
22 Nov 2012 21:27:40 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 103,680 | 239,262 | 2.3077 |
21 Nov 2012 02:42:50 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 77,760 | 177,744 | 2.2858 |
19 Nov 2012 00:22:35 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 51,840 | 122,430 | 2.3617 |
17 Nov 2012 02:39:37 | 1226353 | 15435734 | hadcm3n_o2wi_2100_40_008203986_4 | 25,920 | 61,054 | 2.3555 |
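
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against the rows above. The function names are illustrative, and the relationship itself is an inference from the figures rather than something the page states.

```python
# Rows copied from the trickle table above:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
TRICKLES = [
    (233_280, 538_072, 2.3066),
    (207_360, 477_193, 2.3013),
    (181_440, 417_466, 2.3008),
    (155_520, 357_169, 2.2966),
    (129_600, 301_277, 2.3247),
    (103_680, 239_262, 2.3077),
    (77_760, 177_744, 2.2858),
    (51_840, 122_430, 2.3617),
    (25_920, 61_054, 2.3555),
]


def sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


def max_discrepancy(rows):
    """Largest gap between the recomputed and the reported average."""
    return max(abs(sec_per_timestep(cpu, ts) - avg) for ts, cpu, avg in rows)


def trickle_intervals(rows):
    """Set of timestep gaps between consecutive trickles."""
    steps = sorted(ts for ts, _cpu, _avg in rows)
    return {later - earlier for earlier, later in zip(steps, steps[1:])}


print(f"max discrepancy: {max_discrepancy(TRICKLES):.6f} sec/TS")
print(f"timesteps between trickles: {trickle_intervals(TRICKLES)}")
```

Under this assumption, every recomputed value matches the reported average to within rounding at four decimal places, and the trickles are spaced 25,920 timesteps apart.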