Name | hadcm3n_ycqf_1900_40_007349361_2 |
Workunit | 7546791 |
Created | 17 Jul 2011, 19:05:34 UTC |
Sent | 17 Jul 2011, 19:28:47 UTC |
Report deadline | 17 Oct 2011, 2:55:58 UTC |
Received | 6 Nov 2011, 10:06:38 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 936524 |
Run time | 15 days 11 hours 3 min 11 sec |
CPU time | 15 days 3 hours 17 min 34 sec |
Validate state | Invalid |
Credit | 10,575.36 |
Device peak FLOPS | 3.03 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.6.36</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 14:36:38 (6132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 14:24:12 (6032): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 09:04:15 (6216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:04:16 (6216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5016, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 21:26:36 (6052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 21:20:04 (9984): No heartbeat from core client for 30 sec - exiting 21:20:05 (9984): No heartbeat from core client for 30 sec - exiting 21:20:06 (9984): No heartbeat from core client for 30 sec - exiting 21:20:07 (9984): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:33:22 (6528): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:35:28 (6520): No heartbeat from core client for 30 sec - exiting 14:35:29 (6520): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
16:32:54 (9296): No heartbeat from core client for 30 sec - exiting 16:32:55 (9296): No heartbeat from core client for 30 sec - exiting 16:32:56 (9296): No heartbeat from core client for 30 sec - exiting 16:32:57 (9296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:32:58 (9296): No heartbeat from core client for 30 sec - exiting 14:08:33 (3140): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 21:42:40 (9072): No heartbeat from core client for 30 sec - exiting 21:42:41 (9072): No heartbeat from core client for 30 sec - exiting 21:42:42 (9072): No heartbeat from core client for 30 sec - exiting 21:42:43 (9072): No heartbeat from core client for 30 sec - exiting 21:42:44 (9072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048 21:49:05 (8012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:49:06 (8012): No heartbeat from core client for 30 sec - exiting 21:49:07 (8012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 12:07:18 (9772): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 10:54:41 (5016): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6200, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
05 Nov 2011 22:27:47 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 881,280 | 1,303,564 | 1.4792 |
04 Nov 2011 08:09:50 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 855,360 | 1,265,143 | 1.4791 |
01 Nov 2011 20:26:10 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 829,440 | 1,226,754 | 1.4790 |
31 Oct 2011 17:41:46 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 803,520 | 1,182,970 | 1.4722 |
31 Oct 2011 17:12:12 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 777,600 | 1,142,682 | 1.4695 |
31 Oct 2011 15:11:39 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 751,680 | 1,105,415 | 1.4706 |
31 Oct 2011 12:51:24 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 725,760 | 1,068,588 | 1.4724 |
31 Oct 2011 12:51:23 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 699,840 | 1,032,291 | 1.4750 |
31 Oct 2011 12:51:20 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 673,920 | 995,826 | 1.4777 |
16 Oct 2011 15:22:22 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 648,000 | 958,566 | 1.4793 |
15 Oct 2011 17:52:22 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 622,080 | 921,640 | 1.4815 |
14 Oct 2011 19:08:02 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 596,160 | 884,949 | 1.4844 |
12 Oct 2011 11:43:03 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 570,240 | 846,741 | 1.4849 |
11 Oct 2011 15:27:16 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 544,320 | 809,562 | 1.4873 |
10 Oct 2011 14:31:24 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 518,400 | 772,210 | 1.4896 |
05 Oct 2011 15:32:12 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 492,480 | 734,079 | 1.4906 |
03 Oct 2011 13:54:16 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 466,560 | 696,470 | 1.4928 |
22 Sep 2011 18:04:07 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 440,640 | 656,144 | 1.4891 |
07 Sep 2011 16:02:19 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 414,720 | 617,283 | 1.4884 |
06 Sep 2011 19:13:12 | 936524 | 13143905 | hadcm3n_ycqf_1900_40_007349361_2 | 388,800 | 578,622 | 1.4882 |
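
The Average (sec/TS) column in the trickle table appears to be the cumulative CPU Time (sec) divided by the Timestep count. The page does not state this, so it is an assumption; the sketch below checks it against three rows copied from the table. The helper names `parse_int` and `seconds_per_timestep` are hypothetical and are not part of BOINC or CPDN.

```python
def parse_int(text: str) -> int:
    """Parse an integer printed with thousands separators, e.g. '881,280'."""
    return int(text.replace(",", ""))


def seconds_per_timestep(cpu_time: str, timestep: str) -> float:
    """Assumed definition of the Average column: CPU seconds / timesteps."""
    return parse_int(cpu_time) / parse_int(timestep)


# (Timestep, CPU Time (sec), reported Average) copied from the table above.
rows = [
    ("881,280", "1,303,564", 1.4792),
    ("855,360", "1,265,143", 1.4791),
    ("388,800", "578,622", 1.4882),
]

for timestep, cpu_time, reported in rows:
    computed = round(seconds_per_timestep(cpu_time, timestep), 4)
    print(f"{timestep:>9} TS  computed={computed:.4f}  reported={reported:.4f}")
```

For these three rows the computed value, rounded to four decimals, matches the reported average.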