Name | hadcm3n_t1g3_1940_40_007311209_2 |
Workunit | 7508639 |
Created | 4 Jul 2011, 11:06:52 UTC |
Sent | 4 Jul 2011, 12:05:40 UTC |
Report deadline | 3 Oct 2011, 19:32:51 UTC |
Received | 29 Aug 2011, 9:46:19 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1404986 |
Run time | 38 days 9 hours 5 min 38 sec |
CPU time | 26 days 9 hours 33 min 41 sec |
Validate state | Invalid |
Credit | 4,354.56 |
Device peak FLOPS | 1.33 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.6.20</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... 09:44:42 (1488): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:44:43 (1488): No heartbeat from core client for 30 sec - exiting 09:44:44 (1488): No heartbeat from core client for 30 sec - exiting 12:09:58 (2404): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:09:59 (2404): No heartbeat from core client for 30 sec - exiting 12:10:00 (2404): No heartbeat from core client for 30 sec - exiting 12:45:01 (3340): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:45:02 (3340): No heartbeat from core client for 30 sec - exiting 19:39:18 (1692): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:39:20 (1692): No heartbeat from core client for 30 sec - exiting 19:39:21 (1692): No heartbeat from core client for 30 sec - exiting 20:14:15 (4020): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:14:16 (4020): No heartbeat from core client for 30 sec - exiting 20:14:17 (4020): No heartbeat from core client for 30 sec - exiting 20:14:18 (4020): No heartbeat from core client for 30 sec - exiting 20:14:19 (4020): No heartbeat from core client for 30 sec - exiting 20:14:20 (4020): No heartbeat from core client for 30 sec - exiting 20:14:21 (4020): No heartbeat from core client for 30 sec - exiting 21:00:47 (2552): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 21:00:48 (2552): No heartbeat from core client for 30 sec - exiting 22:05:53 (1084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
22:05:55 (1084): No heartbeat from core client for 30 sec - exiting 22:05:56 (1084): No heartbeat from core client for 30 sec - exiting 06:17:53 (1820): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:17:55 (1820): No heartbeat from core client for 30 sec - exiting 06:32:55 (2184): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:32:57 (2184): No heartbeat from core client for 30 sec - exiting 06:32:58 (2184): No heartbeat from core client for 30 sec - exiting 06:32:59 (2184): No heartbeat from core client for 30 sec - exiting 07:46:55 (1464): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:46:56 (1464): No heartbeat from core client for 30 sec - exiting 11:42:23 (520): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 12:48:25 (1016): No heartbeat from core client for 30 sec - exiting 12:48:27 (1016): No heartbeat from core client for 30 sec - exiting 12:48:28 (1016): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/t1g3ko.pje1c10 Error converting file to netcdf: dataout/t1g3ko.pie1c10 Error converting file to netcdf: dataout/t1g3ko.pfe1c10 Error converting file to netcdf: dataout/t1g3ka.phe1c10 Error converting file to netcdf: dataout/t1g3ka.pge1c10 Error converting file to netcdf: dataout/t1g3ka.pee1c10 Error converting file to netcdf: dataout/t1g3ka.pde1c10 13:28:28 (3204): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:28:29 (3204): No heartbeat from core client for 30 sec - exiting 13:28:31 (3204): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 16:52:44 (2440): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:13:43 (2640): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:58:38 (2340): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 14:58:40 (2340): No heartbeat from core client for 30 sec - exiting 14:58:41 (2340): No heartbeat from core client for 30 sec - exiting 23:00:45 (1768): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 00:57:19 (3652): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 15:36:56 (3992): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
15:36:58 (3992): No heartbeat from core client for 30 sec - exiting 15:36:59 (3992): No heartbeat from core client for 30 sec - exiting 15:37:00 (3992): No heartbeat from core client for 30 sec - exiting 15:37:01 (3992): No heartbeat from core client for 30 sec - exiting 15:37:02 (3992): No heartbeat from core client for 30 sec - exiting 15:37:03 (3992): No heartbeat from core client for 30 sec - exiting 15:37:04 (3992): No heartbeat from core client for 30 sec - exiting 15:37:05 (3992): No heartbeat from core client for 30 sec - exiting 15:37:06 (3992): No heartbeat from core client for 30 sec - exiting 15:37:07 (3992): No heartbeat from core client for 30 sec - exiting 15:37:08 (3992): No heartbeat from core client for 30 sec - exiting 15:37:09 (3992): No heartbeat from core client for 30 sec - exiting 15:37:10 (3992): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 18:02:31 (3588): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:02:33 (3588): No heartbeat from core client for 30 sec - exiting 18:02:34 (3588): No heartbeat from core client for 30 sec - exiting 18:02:35 (3588): No heartbeat from core client for 30 sec - exiting 18:02:36 (3588): No heartbeat from core client for 30 sec - exiting 18:02:37 (3588): No heartbeat from core client for 30 sec - exiting 18:02:38 (3588): No heartbeat from core client for 30 sec - exiting 18:02:39 (3588): No heartbeat from core client for 30 sec - exiting 18:02:40 (3588): No heartbeat from core client for 30 sec - exiting 18:16:40 (2900): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:16:42 (2900): No heartbeat from core client for 30 sec - exiting 18:26:47 (2056): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
18:26:49 (2056): No heartbeat from core client for 30 sec - exiting 18:26:50 (2056): No heartbeat from core client for 30 sec - exiting 18:26:51 (2056): No heartbeat from core client for 30 sec - exiting 18:26:52 (2056): No heartbeat from core client for 30 sec - exiting Suspended CPDN Monitor - Suspend request from BOINC... 01:11:05 (1168): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:11:07 (1168): No heartbeat from core client for 30 sec - exiting 01:11:08 (1168): No heartbeat from core client for 30 sec - exiting 01:39:25 (1792): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 01:39:27 (1792): No heartbeat from core client for 30 sec - exiting 01:39:28 (1792): No heartbeat from core client for 30 sec - exiting 01:39:29 (1792): No heartbeat from core client for 30 sec - exiting 06:11:22 (1052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:11:25 (1052): No heartbeat from core client for 30 sec - exiting 06:11:26 (1052): No heartbeat from core client for 30 sec - exiting 06:11:27 (1052): No heartbeat from core client for 30 sec - exiting 06:11:28 (1052): No heartbeat from core client for 30 sec - exiting 08:07:02 (3236): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:07:05 (3236): No heartbeat from core client for 30 sec - exiting 08:07:06 (3236): No heartbeat from core client for 30 sec - exiting 08:07:07 (3236): No heartbeat from core client for 30 sec - exiting 08:07:08 (3236): No heartbeat from core client for 30 sec - exiting 08:07:09 (3236): No heartbeat from core client for 30 sec - exiting 08:07:10 (3236): No heartbeat from core client for 30 sec - exiting 13:07:00 (2740): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
13:07:02 (2740): No heartbeat from core client for 30 sec - exiting 13:07:03 (2740): No heartbeat from core client for 30 sec - exiting 13:07:04 (2740): No heartbeat from core client for 30 sec - exiting 13:07:05 (2740): No heartbeat from core client for 30 sec - exiting 13:07:06 (2740): No heartbeat from core client for 30 sec - exiting 13:07:07 (2740): No heartbeat from core client for 30 sec - exiting 13:07:08 (2740): No heartbeat from core client for 30 sec - exiting Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |

Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
20 Aug 2011 02:48:34 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 362,880 | 2,205,474 | 6.0777 |
17 Aug 2011 11:53:12 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 336,960 | 2,038,797 | 6.0506 |
14 Aug 2011 23:26:49 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 311,040 | 1,868,102 | 6.0060 |
12 Aug 2011 10:13:10 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 285,120 | 1,704,263 | 5.9774 |
09 Aug 2011 05:02:34 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 259,200 | 1,529,980 | 5.9027 |
06 Aug 2011 14:05:40 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 233,280 | 1,391,663 | 5.9656 |
03 Aug 2011 16:06:11 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 207,360 | 1,217,017 | 5.8691 |
01 Aug 2011 05:45:53 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 181,440 | 1,081,242 | 5.9592 |
27 Jul 2011 21:25:41 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 155,520 | 919,045 | 5.9095 |
26 Jul 2011 00:03:38 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 129,600 | 781,295 | 6.0285 |
25 Jul 2011 21:59:07 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 103,680 | 634,331 | 6.1182 |
25 Jul 2011 17:24:06 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 77,760 | 486,328 | 6.2542 |
25 Jul 2011 16:03:25 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 51,840 | 331,915 | 6.4027 |
09 Jul 2011 05:14:03 | 1008295 | 13060187 | hadcm3n_t1g3_1940_40_007311209_2 | 25,920 | 192,417 | 7.4235 |
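
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep reached. This is inferred from the numbers in the table, not from any CPDN documentation. The sketch below checks that assumption against the rows above and also derives a per-interval rate between consecutive trickles. The names `TRICKLES`, `cumulative_average`, and `interval_rates` are hypothetical and exist only for this illustration.

```python
# Rows copied from the trickle table, oldest first:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS)
TRICKLES = [
    (25_920, 192_417, 7.4235),
    (51_840, 331_915, 6.4027),
    (77_760, 486_328, 6.2542),
    (103_680, 634_331, 6.1182),
    (129_600, 781_295, 6.0285),
    (155_520, 919_045, 5.9095),
    (181_440, 1_081_242, 5.9592),
    (207_360, 1_217_017, 5.8691),
    (233_280, 1_391_663, 5.9656),
    (259_200, 1_529_980, 5.9027),
    (285_120, 1_704_263, 5.9774),
    (311_040, 1_868_102, 6.0060),
    (336_960, 2_038_797, 6.0506),
    (362_880, 2_205_474, 6.0777),
]


def cumulative_average(timestep, cpu_seconds):
    """Assumed definition of the Average column: total CPU seconds per timestep."""
    return cpu_seconds / timestep


def interval_rates(rows):
    """CPU seconds per timestep within each trickle interval.

    Each rate uses the difference between a row and the previous one
    (the first row is measured from timestep 0 and 0 CPU seconds).
    """
    rates = []
    prev_ts, prev_cpu = 0, 0
    for ts, cpu, _reported in rows:
        rates.append((cpu - prev_cpu) / (ts - prev_ts))
        prev_ts, prev_cpu = ts, cpu
    return rates


if __name__ == "__main__":
    for (ts, cpu, reported), rate in zip(TRICKLES, interval_rates(TRICKLES)):
        computed = cumulative_average(ts, cpu)
        print(f"{ts:>8,}  reported={reported:.4f}  "
              f"computed={computed:.4f}  interval={rate:.4f}")
```

The per-interval rate is more sensitive to short-term changes on the host than the cumulative average, because the cumulative figure smooths over the whole run so far.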
©2024 cpdn.org