Name | hadcm3n_3bqs_2020_40_008365962_1 |
Workunit | 8516821 |
Created | 18 May 2013, 4:03:15 UTC |
Sent | 18 May 2013, 4:03:36 UTC |
Report deadline | 17 Aug 2013, 11:30:47 UTC |
Received | 24 Jun 2013, 1:15:44 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1103902 |
Run time | 36 days 19 hours 7 min 56 sec |
CPU time | 33 days 21 hours 14 min 47 sec |
Validate state | Invalid |
Credit | 11,197.44 |
Device peak FLOPS | 1.60 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> 19:50:32 (3308): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:50:34 (3308): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5100, iMonCtr=1 Model crash detected, will try to restart... 00:23:59 (2804): No heartbeat from core client for 30 sec - exiting 00:24:00 (2804): No heartbeat from core client for 30 sec - exiting 00:24:01 (2804): No heartbeat from core client for 30 sec - exiting 00:24:02 (2804): No heartbeat from core client for 30 sec - exiting 00:24:03 (2804): No heartbeat from core client for 30 sec - exiting 00:24:04 (2804): No heartbeat from core client for 30 sec - exiting 00:24:05 (2804): No heartbeat from core client for 30 sec - exiting 00:24:06 (2804): No heartbeat from core client for 30 sec - exiting 00:24:07 (2804): No heartbeat from core client for 30 sec - exiting 00:24:08 (2804): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:24:09 (2804): No heartbeat from core client for 30 sec - exiting 00:30:55 (4380): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:30:58 (4380): No heartbeat from core client for 30 sec - exiting 00:50:25 (5260): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
00:50:26 (5260): No heartbeat from core client for 30 sec - exiting 00:50:27 (5260): No heartbeat from core client for 30 sec - exiting 00:50:28 (5260): No heartbeat from core client for 30 sec - exiting 00:50:29 (5260): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold 05:35:56 (896): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:35:58 (896): No heartbeat from core client for 30 sec - exiting 05:35:59 (896): No heartbeat from core client for 30 sec - exiting 05:38:19 (6064): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:58:52 (5836): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:58:53 (5836): No heartbeat from core client for 30 sec - exiting 10:58:54 (5836): No heartbeat from core client for 30 sec - exiting 11:05:09 (4712): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:12:04 (2312): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 00:12:05 (2312): No heartbeat from core client for 30 sec - exiting 00:12:06 (2312): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 00:22:17 (996): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Atmos Hold Restart file rename failed on atmos_restart.hold Ocean Restart file copy failed on 3bqsko.dam3b40 20:35:25 (3480): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:35:26 (3480): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 23:13:04 (6084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:13:06 (6084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... Ocean Restart file copy failed on 3bqsko.dan03l0 20:25:30 (2836): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:25:31 (2836): No heartbeat from core client for 30 sec - exiting 20:25:32 (2836): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 22:51:08 (2096): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:51:09 (2096): No heartbeat from core client for 30 sec - exiting 22:51:10 (2096): No heartbeat from core client for 30 sec - exiting 18:45:41 (1892): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:45:43 (1892): No heartbeat from core client for 30 sec - exiting Ocean Restart file copy failed on 3bqsko.dan57s0 Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Ocean Restart file copy failed on 3bqsko.dao3ar0 Suspended CPDN Monitor - Suspend request from BOINC... Ocean Restart file copy failed on 3bqsko.dao61c0 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2924, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 13:06:16 (4392): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:06:18 (4392): No heartbeat from core client for 30 sec - exiting 13:06:19 (4392): No heartbeat from core client for 30 sec - exiting 23:55:46 (1436): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 Sorry, too 
many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
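
The Stderr field above repeats a handful of message types many times. A minimal sketch of how they could be tallied is shown here; the category labels and the `summarize_stderr` helper are hypothetical names chosen for illustration, and the substrings are copied from the log text rather than taken from any BOINC or CPDN specification.

```python
# Message fragments copied from the Stderr field; the labels are illustrative only.
PATTERNS = {
    "heartbeat_lost": "No heartbeat from core client",
    "restart_file_failed": "Restart file",
    "buffin_io_error": "BUFFIN: C I/O Error",
    "model_crashed": "Model crashed:",
}


def summarize_stderr(text: str) -> dict:
    """Count non-overlapping occurrences of each known fragment in the log text."""
    return {label: text.count(fragment) for label, fragment in PATTERNS.items()}


if __name__ == "__main__":
    # A short excerpt assembled from messages that appear in the log above.
    sample = (
        "19:50:32 (3308): No heartbeat from core client for 30 sec - exiting "
        "Atmos Hold Restart file rename failed on atmos_restart.hold "
        "BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 "
        "Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048"
    )
    for label, count in summarize_stderr(sample).items():
        print(f"{label}: {count}")
```

Plain substring counting is used because the log has lost its original line breaks, so line-based parsing would not be reliable here.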
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
23 Jun 2013 22:59:59 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 933,120 | 2,927,670 | 3.1375 |
22 Jun 2013 22:32:13 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 907,200 | 2,845,581 | 3.1367 |
21 Jun 2013 20:04:22 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 881,280 | 2,757,515 | 3.1290 |
20 Jun 2013 17:38:05 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 855,360 | 2,669,001 | 3.1203 |
19 Jun 2013 14:59:08 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 829,440 | 2,579,207 | 3.1096 |
18 Jun 2013 12:02:12 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 803,520 | 2,488,598 | 3.0971 |
17 Jun 2013 10:36:43 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 777,600 | 2,404,998 | 3.0928 |
16 Jun 2013 09:36:12 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 751,680 | 2,321,668 | 3.0886 |
15 Jun 2013 09:24:54 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 725,760 | 2,241,050 | 3.0879 |
14 Jun 2013 08:52:33 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 699,840 | 2,159,244 | 3.0853 |
13 Jun 2013 08:08:34 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 673,920 | 2,077,159 | 3.0822 |
12 Jun 2013 06:55:11 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 648,000 | 1,993,975 | 3.0771 |
11 Jun 2013 05:36:58 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 622,080 | 1,909,969 | 3.0703 |
10 Jun 2013 04:20:58 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 596,160 | 1,825,877 | 3.0627 |
09 Jun 2013 02:27:10 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 570,240 | 1,739,730 | 3.0509 |
07 Jun 2013 23:54:27 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 544,320 | 1,651,760 | 3.0345 |
06 Jun 2013 21:50:09 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 518,400 | 1,564,747 | 3.0184 |
05 Jun 2013 20:44:24 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 492,480 | 1,480,973 | 3.0072 |
04 Jun 2013 19:47:49 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 466,560 | 1,397,893 | 2.9962 |
03 Jun 2013 19:19:59 | 1103902 | 15788802 | hadcm3n_3bqs_2020_40_008365962_1 | 440,640 | 1,315,842 | 2.9862 |
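
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The page does not state this, so treat it as an assumption. The sketch below recomputes the value for three rows of the table; `seconds_per_timestep` is a hypothetical helper name.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds per model timestep (assumed definition of the column)."""
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return cpu_time_sec / timestep


if __name__ == "__main__":
    # (timestep, cumulative CPU time in seconds) taken from the trickle table.
    rows = [
        (933_120, 2_927_670),
        (907_200, 2_845_581),
        (440_640, 1_315_842),
    ]
    for timestep, cpu_time in rows:
        print(timestep, round(seconds_per_timestep(cpu_time, timestep), 4))
```

For these three rows the recomputed values agree with the listed averages to four decimal places, which is consistent with the assumed definition.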
©2024 climateprediction.net