Name | hadcm3n_yl9o_1980_40_007858318_0 |
Workunit | 8013430 |
Created | 5 Apr 2012, 18:19:38 UTC |
Sent | 5 Apr 2012, 18:27:35 UTC |
Report deadline | 6 Jul 2012, 1:54:46 UTC |
Received | 2 Jul 2012, 22:24:02 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1183189 |
Run time | 16 days 20 hours 46 min 5 sec |
CPU time | 16 days 4 hours 37 min 11 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 2.31 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> 16:27:01 (19328): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:52:39 (5236): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:18:20 (18480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:51:50 (5780): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 23:50:41 (7884): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:49:32 (15360): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:48:33 (20948): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:26:23 (2504): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:25:22 (10068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:24:11 (13936): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:23:02 (16976): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:21:52 (18264): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 
03:20:50 (8680): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:05:55 (5816): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3056, iMonCtr=1 Model crash detected, will try to restart... 13:07:08 (5912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:06:06 (7608): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:50:44 (6016): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 02:22:56 (5896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:21:53 (14700): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=23468, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 12:38:25 (844): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:37:18 (8120): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=14264, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 18:37:21 (1152): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:20:05 (1336): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5504, iMonCtr=1 Model crash detected, will try to restart... 
02:03:19 (2740): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:02:18 (6748): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4988, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4988, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5188, iMonCtr=1 Model crash detected, will try to restart... 11:44:05 (4288): No heartbeat from core client for 30 sec - exiting 11:44:06 (4288): No heartbeat from core client for 30 sec - exiting 11:44:07 (4288): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1996, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4688, iMonCtr=1 Model crash detected, will try to restart... 18:24:02 (5216): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:07:21 (4588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:07:22 (4588): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 15:16:34 (5300): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:15:23 (11008): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5336, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5548, iMonCtr=1 Model crash detected, will try to restart... CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5628, iMonCtr=1 Model crash detected, will try to restart... 02:53:01 (4960): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9304, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5420, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4536, iMonCtr=1 Model crash detected, will try to restart... 00:42:09 (852): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:41:06 (9200): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=17152, iMonCtr=1 Model crash detected, will try to restart... 18:30:50 (4616): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 03:25:26 (3340): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4904, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> |
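
The stderr output above repeats a small number of message types many times. As a rough aid for reading it, the following is a minimal sketch that tallies those message types in a piece of stderr text. The category names and the `summarize_stderr` helper are hypothetical and are not part of BOINC or the CPDN application. The patterns are copied from the messages visible above, and the sample string is the tail end of this task's stderr.

```python
import re
from collections import Counter

# Patterns copied from messages that appear in the stderr text above.
# The category names on the left are illustrative labels, not BOINC terms.
PATTERNS = {
    "heartbeat_lost": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "suspend_request": re.compile(r"Suspend request from BOINC"),
    "model_crash": re.compile(r"Model crash detected, will try to restart"),
    "signal_received": re.compile(r"Signal \d+ received"),
}


def summarize_stderr(text: str) -> Counter:
    """Count how often each known message type occurs in the stderr text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(text))
    return counts


# Tail of the stderr output shown above, used here as sample input.
sample = (
    "03:25:26 (3340): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=4904, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Signal 11 received, exiting... Called boinc_finish"
)

if __name__ == "__main__":
    for name, count in summarize_stderr(sample).items():
        print(f"{name}: {count}")
```

Running the same function over the full stderr text would give a quick view of how many heartbeat timeouts and restart attempts preceded the final `Signal 11 received` line.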
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
02 Jul 2012 22:26:08 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 777,600 | 1,399,024 | 1.7992 |
02 Jul 2012 19:25:07 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 751,680 | 1,357,492 | 1.8059 |
27 Jun 2012 08:32:47 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 725,760 | 1,311,112 | 1.8065 |
25 Jun 2012 07:43:41 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 699,840 | 1,260,037 | 1.8005 |
23 Jun 2012 22:54:17 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 673,920 | 1,207,870 | 1.7923 |
20 Jun 2012 09:03:55 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 648,000 | 1,159,547 | 1.7894 |
15 Jun 2012 22:09:29 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 622,080 | 1,112,423 | 1.7882 |
04 Jun 2012 23:08:48 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 596,160 | 1,069,152 | 1.7934 |
03 Jun 2012 16:24:41 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 570,240 | 1,027,930 | 1.8026 |
01 Jun 2012 01:17:28 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 544,320 | 984,247 | 1.8082 |
29 May 2012 02:26:14 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 518,400 | 939,449 | 1.8122 |
26 May 2012 00:18:19 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 492,480 | 893,365 | 1.8140 |
17 May 2012 09:41:23 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 466,560 | 846,402 | 1.8141 |
16 May 2012 09:07:41 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 440,640 | 800,383 | 1.8164 |
14 May 2012 00:43:12 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 414,720 | 752,080 | 1.8135 |
12 May 2012 18:38:00 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 388,800 | 704,328 | 1.8115 |
06 May 2012 02:20:10 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 362,880 | 656,459 | 1.8090 |
03 May 2012 00:36:28 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 336,960 | 609,537 | 1.8089 |
02 May 2012 02:48:50 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 311,040 | 561,271 | 1.8045 |
29 Apr 2012 16:27:05 | 1183189 | 14366128 | hadcm3n_yl9o_1980_40_007858318_0 | 285,120 | 514,193 | 1.8034 |
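
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table above. The function name `seconds_per_timestep` is made up for illustration, and the formula is an inference from the table values, not something the page states.

```python
def seconds_per_timestep(cpu_seconds: int, timestep: int) -> float:
    """Average CPU seconds spent per model timestep, rounded to 4 decimals."""
    return round(cpu_seconds / timestep, 4)


# (timestep, CPU time in seconds, reported average) copied from the table above.
rows = [
    (777_600, 1_399_024, 1.7992),
    (751_680, 1_357_492, 1.8059),
    (285_120, 514_193, 1.8034),
]

if __name__ == "__main__":
    for timestep, cpu_seconds, reported in rows:
        computed = seconds_per_timestep(cpu_seconds, timestep)
        print(f"timestep={timestep:>7}  computed={computed:.4f}  reported={reported:.4f}")
```

In the rows shown, consecutive trickles are 25,920 timesteps apart, so the same division also gives an estimate of the CPU time needed between two trickles (roughly 25,920 × 1.8 ≈ 46,700 seconds on this host).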
©2024 cpdn.org