Name | hadcm3n_t72m_1940_40_007445152_0 |
Workunit | 7642655 |
Created | 9 Sep 2011, 12:17:22 UTC |
Sent | 18 Sep 2011, 10:00:38 UTC |
Report deadline | 18 Dec 2011, 17:27:49 UTC |
Received | 1 Oct 2011, 7:45:49 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1122757 |
Run time | 12 days 17 hours 52 min 21 sec |
CPU time | 12 days 14 hours 49 min 39 sec |
Validate state | Invalid |
Credit | 4,043.52 |
Device peak FLOPS | 1.70 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr |
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
16:59:21 (7020): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:00:22 (5776): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:00:26 (5776): No heartbeat from core client for 30 sec - exiting
17:00:27 (5776): No heartbeat from core client for 30 sec - exiting
17:00:28 (5776): No heartbeat from core client for 30 sec - exiting
17:00:29 (5776): No heartbeat from core client for 30 sec - exiting
17:00:30 (5776): No heartbeat from core client for 30 sec - exiting
17:00:31 (5776): No heartbeat from core client for 30 sec - exiting
17:00:32 (5776): No heartbeat from core client for 30 sec - exiting
17:00:34 (5776): No heartbeat from core client for 30 sec - exiting
17:00:35 (5776): No heartbeat from core client for 30 sec - exiting
17:00:36 (5776): No heartbeat from core client for 30 sec - exiting
Atmos Hold Restart file rename failed on atmos_restart.hold
17:09:18 (3096): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:00:34 (876): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:00:43 (876): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
21:34:45 (952): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
21:34:46 (952): No heartbeat from core client for 30 sec - exiting
21:34:47 (952): No heartbeat from core client for 30 sec - exiting
21:34:49 (952): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
02:24:06 (7212): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:24:29 (7212): No heartbeat from core client for 30 sec - exiting
02:24:30 (7212): No heartbeat from core client for 30 sec - exiting
02:24:31 (7212): No heartbeat from core client for 30 sec - exiting
02:24:32 (7212): No heartbeat from core client for 30 sec - exiting
02:24:33 (7212): No heartbeat from core client for 30 sec - exiting
02:24:34 (7212): No heartbeat from core client for 30 sec - exiting
02:24:35 (7212): No heartbeat from core client for 30 sec - exiting
02:24:36 (7212): No heartbeat from core client for 30 sec - exiting
02:24:37 (7212): No heartbeat from core client for 30 sec - exiting
02:24:38 (7212): No heartbeat from core client for 30 sec - exiting
02:24:40 (7212): No heartbeat from core client for 30 sec - exiting
02:24:41 (7212): No heartbeat from core client for 30 sec - exiting
02:24:42 (7212): No heartbeat from core client for 30 sec - exiting
02:24:43 (7212): No heartbeat from core client for 30 sec - exiting
02:24:44 (7212): No heartbeat from core client for 30 sec - exiting
02:24:45 (7212): No heartbeat from core client for 30 sec - exiting
02:24:46 (7212): No heartbeat from core client for 30 sec - exiting
02:24:47 (7212): No heartbeat from core client for 30 sec - exiting
02:24:48 (7212): No heartbeat from core client for 30 sec - exiting
02:24:49 (7212): No heartbeat from core client for 30 sec - exiting
02:24:50 (7212): No heartbeat from core client for 30 sec - exiting
02:24:52 (7212): No heartbeat from core client for 30 sec - exiting
02:24:53 (7212): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
00:38:39 (4948): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
00:38:49 (4948): No heartbeat from core client for 30 sec - exiting
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Signal 22 received, exiting...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5256, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
30 Sep 2011 10:42:59 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 336,960 | 1,015,851 | 3.0148 |
29 Sep 2011 12:08:43 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 311,040 | 935,187 | 3.0066 |
28 Sep 2011 12:22:38 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 285,120 | 854,081 | 2.9955 |
27 Sep 2011 13:51:42 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 259,200 | 774,281 | 2.9872 |
26 Sep 2011 15:34:13 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 233,280 | 694,794 | 2.9784 |
25 Sep 2011 17:11:16 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 207,360 | 614,793 | 2.9649 |
24 Sep 2011 18:43:43 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 181,440 | 535,563 | 2.9517 |
23 Sep 2011 20:42:42 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 155,520 | 456,021 | 2.9322 |
22 Sep 2011 22:42:28 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 129,600 | 376,326 | 2.9038 |
22 Sep 2011 00:37:30 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 103,680 | 300,090 | 2.8944 |
21 Sep 2011 06:40:08 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 77,760 | 236,067 | 3.0358 |
20 Sep 2011 07:58:05 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 51,840 | 158,529 | 3.0580 |
19 Sep 2011 09:23:20 | 1122757 | 13353987 | hadcm3n_t72m_1940_40_007445152_0 | 25,920 | 78,818 | 3.0408 |
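
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The following is a minimal sketch that checks this reading against three rows copied from the table; the names `TRICKLES`, `seconds_per_timestep`, and `check_rows` are illustrative and not part of any CPDN or BOINC tool, and the tolerance is an assumption chosen to absorb four-decimal rounding.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
# Only three of the thirteen rows are included to keep the sketch short.
TRICKLES = [
    (25_920, 78_818, 3.0408),
    (129_600, 376_326, 2.9038),
    (336_960, 1_015_851, 3.0148),
]


def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by cumulative timesteps."""
    return cpu_time_sec / timestep


def check_rows(rows, tol: float = 1e-4) -> bool:
    """Return True if every reported average matches the ratio within tol.

    tol is an assumed allowance for the table's four-decimal rounding.
    """
    return all(
        abs(seconds_per_timestep(cpu, ts) - avg) < tol
        for ts, cpu, avg in rows
    )


if __name__ == "__main__":
    for ts, cpu, avg in TRICKLES:
        ratio = seconds_per_timestep(cpu, ts)
        print(f"{ts:>8,} timesteps: {ratio:.4f} sec/TS (reported {avg})")
    print("all rows consistent:", check_rows(TRICKLES))
```

Under this reading, the value is a running average over the whole task rather than a per-interval rate, which would explain why it changes only slowly from one trickle to the next.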