Name | hadcm3n_7anx_1980_40_008425104_1 |
Workunit | 8575960 |
Created | 13 Jan 2014, 23:13:22 UTC |
Sent | 13 Jan 2014, 23:13:42 UTC |
Report deadline | 15 Apr 2014, 6:40:53 UTC |
Received | 20 Feb 2014, 12:44:18 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | -226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS |
Computer ID | 1070092 |
Run time | 27 days 14 hours 50 min 42 sec |
CPU time | 19 days 16 hours 32 min 11 sec |
Validate state | Invalid |
Credit | 11,197.44 |
Device peak FLOPS | 2.67 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.2.28</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 13:12:32 (4136): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:12:34 (4136): No heartbeat from core client for 30 sec - exiting 13:12:35 (4136): No heartbeat from core client for 30 sec - exiting 13:12:36 (4136): No heartbeat from core client for 30 sec - exiting 13:12:37 (4136): No heartbeat from core client for 30 sec - exiting 13:12:38 (4136): No heartbeat from core client for 30 sec - exiting 13:12:39 (4136): No heartbeat from core client for 30 sec - exiting 13:12:40 (4136): No heartbeat from core client for 30 sec - exiting 13:12:41 (4136): No heartbeat from core client for 30 sec - exiting 13:12:42 (4136): No heartbeat from core client for 30 sec - exiting 13:12:43 (4136): No heartbeat from core client for 30 sec - exiting 13:12:44 (4136): No heartbeat from core client for 30 sec - exiting 13:12:45 (4136): No heartbeat from core client for 30 sec - exiting 13:12:46 (4136): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 
12:05:07 (10344): No heartbeat from core client for 30 sec - exiting 12:05:08 (10344): No heartbeat from core client for 30 sec - exiting 12:05:09 (10344): No heartbeat from core client for 30 sec - exiting 12:05:11 (10344): No heartbeat from core client for 30 sec - exiting 12:05:13 (10344): No heartbeat from core client for 30 sec - exiting 12:05:14 (10344): No heartbeat from core client for 30 sec - exiting 12:05:16 (10344): No heartbeat from core client for 30 sec - exiting 12:05:20 (10344): No heartbeat from core client for 30 sec - exiting 12:05:22 (10344): No heartbeat from core client for 30 sec - exiting 12:05:23 (10344): No heartbeat from core client for 30 sec - exiting 12:05:24 (10344): No heartbeat from core client for 30 sec - exiting 12:05:25 (10344): No heartbeat from core client for 30 sec - exiting 12:05:26 (10344): No heartbeat from core client for 30 sec - exiting 12:05:27 (10344): No heartbeat from core client for 30 sec - exiting 12:05:28 (10344): No heartbeat from core client for 30 sec - exiting 12:05:29 (10344): No heartbeat from core client for 30 sec - exiting 12:05:30 (10344): No heartbeat from core client for 30 sec - exiting BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/7anxko.pjj3c10 12:10:25 (5176): No heartbeat from core client for 30 sec - exiting 12:10:26 (5176): No heartbeat from core client for 30 sec - exiting 12:10:27 (5176): No heartbeat from core client for 30 sec - exiting 12:10:28 (5176): No heartbeat from core client for 30 sec - exiting 12:10:29 (5176): No heartbeat from core client for 30 sec - exiting 12:10:30 (5176): No heartbeat from core client for 30 sec - exiting 12:10:31 (5176): No heartbeat from core client for 30 sec - exiting 12:10:32 (5176): No heartbeat from core client for 30 sec - exiting 12:10:33 (5176): No heartbeat from core client for 30 sec - exiting 12:10:34 (5176): No heartbeat from core client for 30 sec - exiting 12:10:35 (5176): No heartbeat from 
core client for 30 sec - exiting 12:10:36 (5176): No heartbeat from core client for 30 sec - exiting 12:10:37 (5176): No heartbeat from core client for 30 sec - exiting 12:10:38 (5176): No heartbeat from core client for 30 sec - exiting 12:10:39 (5176): No heartbeat from core client for 30 sec - exiting 12:10:40 (5176): No heartbeat from core client for 30 sec - exiting 12:10:41 (5176): No heartbeat from core client for 30 sec - exiting 12:10:42 (5176): No heartbeat from core client for 30 sec - exiting 12:10:43 (5176): No heartbeat from core client for 30 sec - exiting 12:10:44 (5176): No heartbeat from core client for 30 sec - exiting 12:10:45 (5176): No heartbeat from core client for 30 sec - exiting 12:10:46 (5176): No heartbeat from core client for 30 sec - exiting 12:10:47 (5176): No heartbeat from core client for 30 sec - exiting 12:10:48 (5176): No heartbeat from core client for 30 sec - exiting 12:10:49 (5176): No heartbeat from core client for 30 sec - exiting 12:10:50 (5176): No heartbeat from core client for 30 sec - exiting 12:10:51 (5176): No heartbeat from core client for 30 sec - exiting 12:10:52 (5176): No heartbeat from core client for 30 sec - exiting 12:10:53 (5176): No heartbeat from core client for 30 sec - exiting 12:10:55 (5176): No heartbeat from core client for 30 sec - exiting 12:10:56 (5176): No heartbeat from core client for 30 sec - exiting 12:10:57 (5176): No heartbeat from core client for 30 sec - exiting 12:10:58 (5176): No heartbeat from core client for 30 sec - exiting 12:10:59 (5176): No heartbeat from core client for 30 sec - exiting 12:11:00 (5176): No heartbeat from core client for 30 sec - exiting 12:11:01 (5176): No heartbeat from core client for 30 sec - exiting 12:11:02 (5176): No heartbeat from core client for 30 sec - exiting 12:11:03 (5176): No heartbeat from core client for 30 sec - exiting 12:11:04 (5176): No heartbeat from core client for 30 sec - exiting 12:11:05 (5176): No heartbeat from core client for 30 sec - 
exiting 12:11:06 (5176): No heartbeat from core client for 30 sec - exiting 12:11:07 (5176): No heartbeat from core client for 30 sec - exiting 12:11:08 (5176): No heartbeat from core client for 30 sec - exiting 12:11:09 (5176): No heartbeat from core client for 30 sec - exiting 12:11:10 (5176): No heartbeat from core client for 30 sec - exiting 12:11:11 (5176): No heartbeat from core client for 30 sec - exiting 12:11:12 (5176): No heartbeat from core client for 30 sec - exiting 12:11:13 (5176): No heartbeat from core client for 30 sec - exiting 12:11:14 (5176): No heartbeat from core client for 30 sec - exiting 12:11:15 (5176): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/7anxko.pjj3c10 Error converting file to netcdf: dataout/7anxko.pij3c10 Error converting file to netcdf: dataout/7anxko.pfj3c10 Error converting file to netcdf: dataout/7anxka.phj3c10 Error converting file to netcdf: dataout/7anxka.pgj3c10 Error converting file to netcdf: dataout/7anxka.pej3c10 Error converting file to netcdf: dataout/7anxka.pdj3c10 09:31:49 (5336): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
09:31:54 (5336): No heartbeat from core client for 30 sec - exiting 09:31:55 (5336): No heartbeat from core client for 30 sec - exiting 09:31:56 (5336): No heartbeat from core client for 30 sec - exiting 09:31:57 (5336): No heartbeat from core client for 30 sec - exiting 09:31:58 (5336): No heartbeat from core client for 30 sec - exiting 09:31:59 (5336): No heartbeat from core client for 30 sec - exiting Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9024, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5048, iMonCtr=1 Model crash detected, will try to restart... 12:41:34 (4716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:41:38 (4716): No heartbeat from core client for 30 sec - exiting 12:41:39 (4716): No heartbeat from core client for 30 sec - exiting 12:41:40 (4716): No heartbeat from core client for 30 sec - exiting 12:41:41 (4716): No heartbeat from core client for 30 sec - exiting 12:41:42 (4716): No heartbeat from core client for 30 sec - exiting 12:41:43 (4716): No heartbeat from core client for 30 sec - exiting 12:41:44 (4716): No heartbeat from core client for 30 sec - exiting 12:41:45 (4716): No heartbeat from core client for 30 sec - exiting 12:41:46 (4716): No heartbeat from core client for 30 sec - exiting 12:41:47 (4716): No heartbeat from core client for 30 sec - exiting Atmos Hold Restart file rename failed on atmos_restart.hold Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4404, iMonCtr=1 Model crash detected, will try to restart... 03:33:04 (1048): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 
CPDN Monitor - Quit request from BOINC... 12:25:17 (182856): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... </stderr_txt> ]]> |
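The Exit status row gives the code both as a signed decimal (-226) and as hex (0xFFFFFF1E). The hex form matches the 32-bit two's-complement representation of the decimal value. A minimal Python sketch of that conversion follows; the helper name `to_uint32_hex` is hypothetical and not part of BOINC.

```python
def to_uint32_hex(code: int) -> str:
    """Format a signed exit code as its unsigned 32-bit two's-complement hex."""
    # Masking with 0xFFFFFFFF maps a negative Python int onto the
    # unsigned 32-bit value that shares the same bit pattern.
    return f"0x{code & 0xFFFFFFFF:08X}"


exit_status = -226  # value from the Exit status row
exit_status_hex = to_uint32_hex(exit_status)
print(exit_status_hex)  # 0xFFFFFF1E
```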
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
20 Feb 2014 06:26:22 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 933,120 | 1,685,704 | 1.8065 |
18 Feb 2014 12:03:23 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 907,200 | 1,636,422 | 1.8038 |
18 Feb 2014 12:01:46 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 881,280 | 1,587,523 | 1.8014 |
16 Feb 2014 01:57:38 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 855,360 | 1,539,977 | 1.8004 |
14 Feb 2014 11:36:22 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 829,440 | 1,490,896 | 1.7975 |
13 Feb 2014 17:47:40 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 803,520 | 1,442,872 | 1.7957 |
12 Feb 2014 20:00:22 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 777,600 | 1,399,181 | 1.7994 |
12 Feb 2014 03:30:43 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 751,680 | 1,354,637 | 1.8021 |
11 Feb 2014 13:36:52 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 725,760 | 1,312,323 | 1.8082 |
11 Feb 2014 00:19:26 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 699,840 | 1,269,942 | 1.8146 |
10 Feb 2014 10:32:30 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 673,920 | 1,227,714 | 1.8218 |
09 Feb 2014 18:23:50 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 648,000 | 1,180,598 | 1.8219 |
09 Feb 2014 07:05:20 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 622,080 | 1,139,299 | 1.8314 |
08 Feb 2014 18:27:20 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 596,160 | 1,097,950 | 1.8417 |
08 Feb 2014 02:09:25 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 570,240 | 1,050,989 | 1.8431 |
07 Feb 2014 20:58:16 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 544,320 | 1,003,910 | 1.8443 |
06 Feb 2014 16:36:06 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 518,400 | 957,213 | 1.8465 |
05 Feb 2014 11:16:22 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 492,480 | 910,910 | 1.8496 |
04 Feb 2014 18:32:58 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 466,560 | 864,398 | 1.8527 |
04 Feb 2014 00:52:37 | 1070092 | 16217785 | hadcm3n_7anx_1980_40_008425104_1 | 440,640 | 817,886 | 1.8561 |
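
The Average (sec/TS) column appears to be CPU Time (sec) divided by Timestep, rounded to four decimals. That reading is an inference from the figures, not something the page states. A minimal Python sketch checking it against the newest and oldest rows above follows; the function name is hypothetical.

```python
def average_sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Return CPU seconds per model timestep, rounded to 4 decimals."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, cpu_time_sec, average reported in the table)
rows = [
    (933_120, 1_685_704, 1.8065),  # 20 Feb 2014 06:26:22
    (440_640, 817_886, 1.8561),    # 04 Feb 2014 00:52:37
]

computed = [average_sec_per_timestep(cpu, ts) for ts, cpu, _ in rows]
for (ts, cpu, reported), value in zip(rows, computed):
    # Print the recomputed value next to the reported one for comparison.
    print(f"timestep={ts} computed={value} reported={reported}")
```

Successive trickles in the table are 25,920 timesteps apart, so the same function can be applied to the difference between two rows to estimate the per-timestep cost over just that interval.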
©2024 cpdn.org