Name | hadcm3n_3kjv_1940_40_008259836_2 |
Workunit | 8414960 |
Created | 27 Feb 2013, 17:19:31 UTC |
Sent | 27 Feb 2013, 17:19:35 UTC |
Report deadline | 30 May 2013, 0:46:46 UTC |
Received | 30 Nov 2013, 10:04:36 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1166383 |
Run time | 128 days 13 hours 22 min 45 sec |
CPU time | 117 days 2 hours 27 min 10 sec |
Validate state | Invalid |
Credit | 11,508.48 |
Device peak FLOPS | 0.76 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.64</core_client_version> <![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5348, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5732, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5272, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4896, iMonCtr=1
Model crash detected, will try to restart...
04:57:23 (5972): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
23:29:22 (6964): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
05:31:48 (2368): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
02:32:46 (4552): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6888, iMonCtr=1
Model crash detected, will try to restart...
04:00:33 (5292): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:32:04 (6236): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4220, iMonCtr=1
Model crash detected, will try to restart...
Ocean Restart file copy failed on 3kjvko.dag01l0
00:14:23 (3288): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5328, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
21:52:01 (4512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:30:02 (2352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
17:30:56 (264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:55:46 (6364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:25:41 (7928): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
00:02:38 (3748): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
00:40:30 (5940): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5892, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
10:44:47 (6052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6772, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
02:48:18 (5824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]> |

Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
29 Nov 2013 21:01:55 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 959,040 | 10,123,167 | 10.5555 |
04 Jul 2013 14:02:08 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 933,120 | 9,829,777 | 10.5343 |
02 Jul 2013 10:16:07 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 907,200 | 9,506,391 | 10.4788 |
25 Jun 2013 11:27:23 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 881,280 | 9,177,113 | 10.4134 |
21 Jun 2013 07:32:49 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 855,360 | 8,848,361 | 10.3446 |
17 Jun 2013 02:38:27 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 829,440 | 8,521,354 | 10.2736 |
12 Jun 2013 13:55:59 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 803,520 | 8,194,872 | 10.1987 |
08 Jun 2013 10:54:32 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 777,600 | 7,868,335 | 10.1187 |
04 Jun 2013 07:24:14 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 751,680 | 7,543,779 | 10.0359 |
31 May 2013 05:24:52 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 725,760 | 7,219,939 | 9.9481 |
27 May 2013 02:06:38 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 699,840 | 6,894,529 | 9.8516 |
22 May 2013 15:25:48 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 673,920 | 6,571,531 | 9.7512 |
18 May 2013 14:30:52 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 648,000 | 6,251,590 | 9.6475 |
14 May 2013 09:08:55 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 622,080 | 5,932,282 | 9.5362 |
10 May 2013 05:00:05 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 596,160 | 5,626,375 | 9.4377 |
06 May 2013 05:57:49 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 570,240 | 5,316,917 | 9.3240 |
02 May 2013 08:50:54 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 544,320 | 5,005,381 | 9.1957 |
28 Apr 2013 07:50:51 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 518,400 | 4,711,492 | 9.0885 |
24 Apr 2013 19:04:49 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 492,480 | 4,426,365 | 8.9879 |
20 Apr 2013 23:44:21 | 1166383 | 15642713 | hadcm3n_3kjv_1940_40_008259836_2 | 466,560 | 4,148,183 | 8.8910 |
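
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. That is an inference from the figures in the table, not something the page states. A minimal Python sketch that checks the relationship against three rows copied from the table (the function name `seconds_per_timestep` is hypothetical, chosen for illustration):

```python
# Rows copied from the trickle table above:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
trickles = [
    (959_040, 10_123_167, 10.5555),
    (933_120, 9_829_777, 10.5343),
    (466_560, 4_148_183, 8.8910),
]


def seconds_per_timestep(cpu_seconds, timestep):
    """Assumed definition of the Average column: CPU seconds per model timestep."""
    return cpu_seconds / timestep


for timestep, cpu_seconds, reported in trickles:
    computed = seconds_per_timestep(cpu_seconds, timestep)
    # Print computed and reported values side by side for comparison.
    print(f"{timestep:>9,d}  computed={computed:.4f}  reported={reported:.4f}")
```

Under this assumption, the computed values agree with the reported ones to four decimal places for the sampled rows. The average rises from about 8.9 to about 10.6 sec/TS over the run, so later timesteps cost more CPU time than earlier ones on this host.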