Name | hadcm3n_oabd_1900_40_008468492_3 |
Workunit | 8619331 |
Created | 22 Feb 2014, 16:16:55 UTC |
Sent | 22 Feb 2014, 16:16:58 UTC |
Report deadline | 24 May 2014, 23:44:09 UTC |
Received | 2 Mar 2014, 20:45:18 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1314942 |
Run time | 7 days 21 hours 42 min 15 sec |
CPU time | 3 days 17 hours 38 min 52 sec |
Validate state | Invalid |
Credit | 4,354.56 |
Device peak FLOPS | 2.81 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.2.39</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 20:22:31 (14952): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:22:32 (14952): No heartbeat from core client for 30 sec - exiting 20:22:33 (14952): No heartbeat from core client for 30 sec - exiting 20:22:34 (14952): No heartbeat from core client for 30 sec - exiting 20:22:35 (14952): No heartbeat from core client for 30 sec - exiting 21:56:13 (12612): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:56:15 (12612): No heartbeat from core client for 30 sec - exiting 16:27:29 (15908): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:27:32 (15908): No heartbeat from core client for 30 sec - exiting 16:27:33 (15908): No heartbeat from core client for 30 sec - exiting 16:27:34 (15908): No heartbeat from core client for 30 sec - exiting 16:27:35 (15908): No heartbeat from core client for 30 sec - exiting 16:27:36 (15908): No heartbeat from core client for 30 sec - exiting 06:56:56 (33848): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
06:56:59 (33848): No heartbeat from core client for 30 sec - exiting 06:57:00 (33848): No heartbeat from core client for 30 sec - exiting 06:57:01 (33848): No heartbeat from core client for 30 sec - exiting 06:57:02 (33848): No heartbeat from core client for 30 sec - exiting 06:57:03 (33848): No heartbeat from core client for 30 sec - exiting 06:57:04 (33848): No heartbeat from core client for 30 sec - exiting 06:57:05 (33848): No heartbeat from core client for 30 sec - exiting 06:57:06 (33848): No heartbeat from core client for 30 sec - exiting 06:57:07 (33848): No heartbeat from core client for 30 sec - exiting Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 19:59:07 (2012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:59:09 (2012): No heartbeat from core client for 30 sec - exiting 19:59:10 (2012): No heartbeat from core client for 30 sec - exiting 19:59:11 (2012): No heartbeat from core client for 30 sec - exiting Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 11:13:05 (2784): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
11:13:07 (2784): No heartbeat from core client for 30 sec - exiting 11:13:08 (2784): No heartbeat from core client for 30 sec - exiting 11:14:31 (1904): No heartbeat from core client for 30 sec - exiting 11:14:32 (1904): No heartbeat from core client for 30 sec - exiting 11:14:33 (1904): No heartbeat from core client for 30 sec - exiting 11:14:35 (1904): No heartbeat from core client for 30 sec - exiting 11:14:36 (1904): No heartbeat from core client for 30 sec - exiting 11:14:37 (1904): No heartbeat from core client for 30 sec - exiting 11:14:38 (1904): No heartbeat from core client for 30 sec - exiting 11:14:39 (1904): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 Model crashed: TEMPHIST: Failed in OPEN of history file tmp/pipe_dummy 2048 02:02:15 (3580): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:03:11 (40436): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold 04:32:00 (12980): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:32:06 (12360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:32:08 (12360): No heartbeat from core client for 30 sec - exiting Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5848, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
02 Mar 2014 13:48:33 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 362,880 | 321,256 | 0.8853 |
02 Mar 2014 00:31:57 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 336,960 | 413,561 | 1.2273 |
01 Mar 2014 11:24:35 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 311,040 | 369,077 | 1.1866 |
28 Feb 2014 21:12:34 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 285,120 | 322,177 | 1.1300 |
28 Feb 2014 07:26:04 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 259,200 | 319,128 | 1.2312 |
27 Feb 2014 18:12:34 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 233,280 | 288,472 | 1.2366 |
27 Feb 2014 03:34:16 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 207,360 | 288,250 | 1.3901 |
26 Feb 2014 13:15:30 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 181,440 | 295,264 | 1.6273 |
26 Feb 2014 00:38:03 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 155,520 | 270,423 | 1.7388 |
25 Feb 2014 10:34:33 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 129,600 | 224,420 | 1.7316 |
24 Feb 2014 21:07:49 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 103,680 | 178,396 | 1.7206 |
24 Feb 2014 08:08:53 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 77,760 | 134,157 | 1.7253 |
23 Feb 2014 18:31:39 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 51,840 | 88,001 | 1.6976 |
23 Feb 2014 04:40:23 | 1314942 | 16292723 | hadcm3n_oabd_1900_40_008468492_3 | 25,920 | 41,626 | 1.6059 |
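
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The page does not state this; it is an inference from the figures. A minimal sketch that checks the inference against a few rows copied from the table (the function name `seconds_per_timestep` is hypothetical, chosen for illustration):

```python
# Rows copied from the trickle table: (timestep, cpu_seconds, reported_average).
rows = [
    (362880, 321256, 0.8853),
    (336960, 413561, 1.2273),
    (311040, 369077, 1.1866),
    (25920, 41626, 1.6059),
]


def seconds_per_timestep(cpu_seconds, timestep):
    """Assumed formula: cumulative CPU seconds divided by timesteps completed."""
    return cpu_seconds / timestep


for timestep, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    # Print the recomputed value next to the value reported on the page.
    print(f"{timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
```

Under this assumption the recomputed values agree with the reported column to four decimal places for the rows checked. Note that the most recent trickle reports less cumulative CPU time (321,256 s) than the one before it (413,561 s); the page gives no explanation for the drop.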