Name | hadcm3n_t3g9_1940_40_007315160_0 |
Workunit | 7512590 |
Created | 28 Jun 2011, 19:57:22 UTC |
Sent | 28 Jun 2011, 20:01:06 UTC |
Report deadline | 28 Sep 2011, 3:28:17 UTC |
Received | 29 Aug 2011, 20:16:57 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1115875 |
Run time | 12 days 3 hours 46 min 52 sec |
CPU time | 10 days 19 hours 10 min 59 sec |
Validate state | Invalid |
Credit | 6,220.80 |
Device peak FLOPS | 2.44 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5196, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4124, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3288, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6728, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4296, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4320, iMonCtr=1 Model crash detected, will try to restart... 
05:48:24 (5912): No heartbeat from core client for 30 sec - exiting 05:48:25 (5912): No heartbeat from core client for 30 sec - exiting 05:48:26 (5912): No heartbeat from core client for 30 sec - exiting 05:48:27 (5912): No heartbeat from core client for 30 sec - exiting 05:48:28 (5912): No heartbeat from core client for 30 sec - exiting 05:48:29 (5912): No heartbeat from core client for 30 sec - exiting 05:48:30 (5912): No heartbeat from core client for 30 sec - exiting 05:48:31 (5912): No heartbeat from core client for 30 sec - exiting 05:48:32 (5912): No heartbeat from core client for 30 sec - exiting 05:48:33 (5912): No heartbeat from core client for 30 sec - exiting 05:48:34 (5912): No heartbeat from core client for 30 sec - exiting 05:48:35 (5912): No heartbeat from core client for 30 sec - exiting 05:48:36 (5912): No heartbeat from core client for 30 sec - exiting 05:48:37 (5912): No heartbeat from core client for 30 sec - exiting 05:48:38 (5912): No heartbeat from core client for 30 sec - exiting 05:48:39 (5912): No heartbeat from core client for 30 sec - exiting 05:48:40 (5912): No heartbeat from core client for 30 sec - exiting 05:48:41 (5912): No heartbeat from core client for 30 sec - exiting 05:48:42 (5912): No heartbeat from core client for 30 sec - exiting 05:48:43 (5912): No heartbeat from core client for 30 sec - exiting 05:48:44 (5912): No heartbeat from core client for 30 sec - exiting 05:48:45 (5912): No heartbeat from core client for 30 sec - exiting 05:48:46 (5912): No heartbeat from core client for 30 sec - exiting 05:48:47 (5912): No heartbeat from core client for 30 sec - exiting 05:48:48 (5912): No heartbeat from core client for 30 sec - exiting 05:48:49 (5912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2504, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7116, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6032, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5492, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5172, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5164, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Quit request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4148, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4548, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3728, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3364, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5292, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4588, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1 Model crash detected, will try to restart... 05:56:31 (5724): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4500, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2008, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5108, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1884, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5852, iMonCtr=1 Model crash detected, will try to restart... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5144, iMonCtr=1 Model crash detected, will try to restart... CSignal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> |
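The stderr text above is long and repetitive, so a tally of its recurring messages makes the sequence easier to summarise. The following is a minimal sketch, assuming the log has been saved as a plain string; the labels in `PATTERNS` and the function name `tally_events` are hypothetical and are not part of BOINC or the CPDN client. The matched substrings are copied from the log itself.

```python
import re
from collections import Counter

# Substrings copied from the stderr text; the dictionary keys are
# illustrative labels chosen for this sketch.
PATTERNS = {
    "model_restart": r"Model crash detected, will try to restart",
    "suspend_request": r"Suspend request from BOINC",
    "quit_request": r"Quit request from BOINC",
    "no_heartbeat": r"No heartbeat from core client",
    "signal": r"Signal \d+ received",
}


def tally_events(stderr_text: str) -> Counter:
    """Count how often each known message appears in the stderr text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(re.findall(pattern, stderr_text))
    return counts


# A short excerpt assembled from messages that appear in the log above.
sample = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5196, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "CPDN Monitor - Quit request from BOINC... "
    "05:56:31 (5724): No heartbeat from core client for 30 sec - exiting "
    "CSignal 11 received, exiting... Called boinc_finish"
)

for label, count in tally_events(sample).items():
    print(f"{label}: {count}")
```

Passing the full stderr text instead of `sample` would give the per-message totals for this task.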
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
29 Aug 2011 20:15:33 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 518,400 | 933,053 | 1.7999 |
28 Aug 2011 09:47:51 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 492,480 | 887,331 | 1.8018 |
26 Aug 2011 17:33:31 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 466,560 | 841,887 | 1.8045 |
24 Aug 2011 04:52:44 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 440,640 | 796,477 | 1.8075 |
22 Aug 2011 10:17:50 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 414,720 | 751,085 | 1.8111 |
21 Aug 2011 03:48:59 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 388,800 | 705,890 | 1.8156 |
03 Aug 2011 19:11:28 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 362,880 | 660,022 | 1.8188 |
31 Jul 2011 20:37:28 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 336,960 | 614,946 | 1.8250 |
31 Jul 2011 05:33:01 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 311,040 | 569,775 | 1.8318 |
29 Jul 2011 17:20:58 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 285,120 | 523,987 | 1.8378 |
26 Jul 2011 18:49:49 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 259,200 | 479,058 | 1.8482 |
25 Jul 2011 21:46:33 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 233,280 | 433,931 | 1.8601 |
25 Jul 2011 20:30:42 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 207,360 | 385,507 | 1.8591 |
25 Jul 2011 18:56:20 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 181,440 | 337,049 | 1.8576 |
25 Jul 2011 16:00:36 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 155,520 | 288,873 | 1.8575 |
25 Jul 2011 14:57:04 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 129,600 | 240,791 | 1.8580 |
10 Jul 2011 19:23:03 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 103,680 | 193,330 | 1.8647 |
08 Jul 2011 03:31:49 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 77,760 | 145,654 | 1.8731 |
05 Jul 2011 18:55:20 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 51,840 | 96,676 | 1.8649 |
03 Jul 2011 04:14:30 | 1115875 | 13023207 | hadcm3n_t3g9_1940_40_007315160_0 | 25,920 | 48,468 | 1.8699 |
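The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. This is an inference from the figures in the table, not a documented formula. A minimal sketch that checks it against three rows copied from the table (the function name `seconds_per_timestep` is hypothetical):

```python
def seconds_per_timestep(cpu_seconds: int, timesteps: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timesteps, 4)


# (timestep, CPU time in seconds, reported average) copied from the table.
rows = [
    (518_400, 933_053, 1.7999),
    (492_480, 887_331, 1.8018),
    (25_920, 48_468, 1.8699),
]

for timesteps, cpu_seconds, reported in rows:
    computed = seconds_per_timestep(cpu_seconds, timesteps)
    print(f"{timesteps:>7}  computed={computed:.4f}  reported={reported:.4f}")
```

Consecutive rows are 25,920 timesteps apart, so the 20 rows listed cover timesteps 25,920 through 518,400.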