Field | Value |
---|---|
Name | hadcm3n_t37a_1940_40_007446061_3 |
Workunit | 7643564 |
Created | 17 Sep 2011, 14:51:13 UTC |
Sent | 17 Sep 2011, 15:32:48 UTC |
Report deadline | 17 Dec 2011, 22:59:59 UTC |
Received | 9 Oct 2011, 5:19:45 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 977091 |
Run time | 20 days 18 hours 45 min 16 sec |
CPU time | 20 days 12 hours 41 min 32 sec |
Validate state | Invalid |
Credit | 10,264.32 |
Device peak FLOPS | 2.52 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | (full log below) |

```
<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
15:12:17 (2736): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:46:07 (6244): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:47:52 (3740): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6176, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
13:52:08 (5348): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
17:14:33 (7056): No heartbeat from core client for 30 sec - exiting
17:14:34 (7056): No heartbeat from core client for 30 sec - exiting
17:14:35 (7056): No heartbeat from core client for 30 sec - exiting
17:14:36 (7056): No heartbeat from core client for 30 sec - exiting
17:14:37 (7056): No heartbeat from core client for 30 sec - exiting
17:14:38 (7056): No heartbeat from core client for 30 sec - exiting
17:14:39 (7056): No heartbeat from core client for 30 sec - exiting
17:14:40 (7056): No heartbeat from core client for 30 sec - exiting
17:14:41 (7056): No heartbeat from core client for 30 sec - exiting
17:14:43 (7056): No heartbeat from core client for 30 sec - exiting
17:14:44 (7056): No heartbeat from core client for 30 sec - exiting
17:14:45 (7056): No heartbeat from core client for 30 sec - exiting
17:14:46 (7056): No heartbeat from core client for 30 sec - exiting
17:14:47 (7056): No heartbeat from core client for 30 sec - exiting
17:14:48 (7056): No heartbeat from core client for 30 sec - exiting
17:14:49 (7056): No heartbeat from core client for 30 sec - exiting
17:14:50 (7056): No heartbeat from core client for 30 sec - exiting
17:14:51 (7056): No heartbeat from core client for 30 sec - exiting
17:14:52 (7056): No heartbeat from core client for 30 sec - exiting
17:14:53 (7056): No heartbeat from core client for 30 sec - exiting
17:14:55 (7056): No heartbeat from core client for 30 sec - exiting
17:14:56 (7056): No heartbeat from core client for 30 sec - exiting
17:14:57 (7056): No heartbeat from core client for 30 sec - exiting
17:14:58 (7056): No heartbeat from core client for 30 sec - exiting
17:14:59 (7056): No heartbeat from core client for 30 sec - exiting
17:15:00 (7056): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
14:10:50 (5264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
16:16:39 (4772): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5724, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
```
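The stderr log above records two "Model crash detected" restarts under monitor `selfPID=6176` and six under `selfPID=5724`, the last batch each preceded by "Signal 22 received", before the controller gave up with "Sorry, too many model crashes!". A minimal sketch for tallying those restarts per monitor process is shown below. The helper names (`crashes_by_pid`, `gave_up`) and the abbreviated `EXCERPT` string are hypothetical illustrations, not part of BOINC or the CPDN client; the regex uses `\s+` so it should match whether the log is flattened onto one line or split one message per line.

```python
import re
from collections import Counter

# Abbreviated excerpt of the stderr above (hypothetical test input, one crash
# per monitor PID); paste the full <stderr_txt> contents to count them all.
EXCERPT = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, "
    "selfPID=6176, iMonCtr=1 Model crash detected, will try to restart... "
    "Signal 22 received, exiting... Called boinc_finish "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, "
    "selfPID=5724, iMonCtr=1 Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

# A Controller line ending in "selfPID=<n>, iMonCtr=<k>" followed (after any
# whitespace, including a newline) by a crash-restart message.
CRASH_RE = re.compile(r"selfPID=(\d+), iMonCtr=\d+\s+Model crash detected")


def crashes_by_pid(log: str) -> Counter:
    """Count crash-restart messages, keyed by the monitor's selfPID."""
    return Counter(CRASH_RE.findall(log))


def gave_up(log: str) -> bool:
    """True if the controller stopped retrying after repeated crashes."""
    return "too many model crashes" in log


print(crashes_by_pid(EXCERPT), gave_up(EXCERPT))
```

Keying on `selfPID` groups restarts by monitor session, which makes it easy to see that the final session (5724) is the one that exhausted the retry limit.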
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/timestep) |
---|---|---|---|---|---|---|
08 Oct 2011 10:41:14 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 855,360 | 1,733,077 | 2.0261 |
07 Oct 2011 21:11:37 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 829,440 | 1,683,090 | 2.0292 |
07 Oct 2011 05:57:35 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 803,520 | 1,631,375 | 2.0303 |
06 Oct 2011 15:46:53 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 777,600 | 1,578,763 | 2.0303 |
06 Oct 2011 00:25:11 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 751,680 | 1,525,999 | 2.0301 |
05 Oct 2011 09:25:40 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 725,760 | 1,473,149 | 2.0298 |
04 Oct 2011 17:58:32 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 699,840 | 1,420,291 | 2.0295 |
04 Oct 2011 02:20:32 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 673,920 | 1,367,351 | 2.0290 |
03 Oct 2011 11:30:09 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 648,000 | 1,314,347 | 2.0283 |
02 Oct 2011 20:35:53 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 622,080 | 1,261,432 | 2.0278 |
02 Oct 2011 05:48:22 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 596,160 | 1,208,378 | 2.0269 |
01 Oct 2011 14:54:02 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 570,240 | 1,155,203 | 2.0258 |
01 Oct 2011 00:05:01 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 544,320 | 1,102,294 | 2.0251 |
30 Sep 2011 09:11:38 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 518,400 | 1,049,362 | 2.0242 |
29 Sep 2011 18:27:16 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 492,480 | 996,681 | 2.0238 |
29 Sep 2011 03:43:56 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 466,560 | 943,863 | 2.0230 |
28 Sep 2011 12:53:09 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 440,640 | 891,063 | 2.0222 |
27 Sep 2011 22:11:24 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 414,720 | 838,659 | 2.0222 |
27 Sep 2011 07:27:14 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 388,800 | 786,105 | 2.0219 |
26 Sep 2011 16:35:03 | 977091 | 13394141 | hadcm3n_t37a_1940_40_007446061_3 | 362,880 | 733,276 | 2.0207 |
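The "Average" column appears to be cumulative CPU seconds divided by cumulative timesteps (for the newest row, 1,733,077 / 855,360 ≈ 2.0261), and the rows checked below are consistent with that reading. A minimal sketch, assuming that definition, recomputes it for a few rows copied from the table and also derives the per-interval rate between the two newest trickles, a quantity the server does not report. The helper names are hypothetical.

```python
# A few rows copied from the trickle table above (newest first):
# (timestep, cumulative CPU seconds, average as reported by the server).
TRICKLES = [
    (855_360, 1_733_077, 2.0261),
    (829_440, 1_683_090, 2.0292),
    (803_520, 1_631_375, 2.0303),
    (362_880, 733_276, 2.0207),
]


def avg_sec_per_timestep(timestep: int, cpu_sec: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps (assumed definition)."""
    return cpu_sec / timestep


def interval_rate(newer: tuple, older: tuple) -> float:
    """CPU seconds per timestep spent between two consecutive trickles."""
    return (newer[1] - older[1]) / (newer[0] - older[0])


for ts, cpu, reported in TRICKLES:
    print(f"{ts:>9,} {cpu:>11,} {avg_sec_per_timestep(ts, cpu):.4f} (reported {reported})")

print(f"{interval_rate(TRICKLES[0], TRICKLES[1]):.4f} sec/timestep over the last interval")
```

Consecutive trickles in the table are 25,920 timesteps apart, so the interval rate isolates recent throughput, whereas the cumulative average smooths over the whole run.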
©2024 cpdn.org