Name | hadcm3n_n7i8_1920_40_008377982_3 |
Workunit | 8528841 |
Created | 30 Aug 2013, 17:49:04 UTC |
Sent | 30 Aug 2013, 18:29:34 UTC |
Report deadline | 30 Nov 2013, 1:56:45 UTC |
Received | 30 Sep 2013, 16:17:34 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1169946 |
Run time | 5 days 14 hours 24 min 7 sec |
CPU time | 5 days 2 hours 49 min 11 sec |
Validate state | Invalid |
Credit | 4,043.52 |
Device peak FLOPS | 3.28 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6240, iMonCtr=1 Model crash detected, will try to restart... 22:15:46 (3988): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6096, iMonCtr=1 Model crash detected, will try to restart... 22:15:29 (4696): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:54:45 (6988): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1868, iMonCtr=1 Model crash detected, will try to restart... 10:02:52 (4288): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:07:28 (5956): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:43:58 (6272): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 21:38:46 (5148): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:21:43 (5456): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 13:45:57 (5676): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:01:27 (6048): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:02:50 (3448): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2428, iMonCtr=1 Model crash detected, will try to restart... 21:55:05 (5268): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 09:10:10 (6024): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 16:27:46 (5204): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... 10:42:49 (5568): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:13:14 (5380): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:07:23 (576): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:38:00 (2092): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:43:00 (4132): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 22:07:20 (5152): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:42:45 (5084): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:54:02 (4752): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:30:50 (6936): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:57:49 (4188): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:39:47 (5548): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:55:29 (5220): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:59:58 (3228): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
22:57:27 (5792): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4568, iMonCtr=1 Model crash detected, will try to restart... 18:30:27 (6088): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5944, iMonCtr=1 Model crash detected, will try to restart... 22:03:24 (4364): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6840, iMonCtr=1 Model crash detected, will try to restart... 13:52:12 (4012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 19:46:05 (3316): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6152, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 08:06:16 (5280): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]> |
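The stderr dump above is long and repetitive, so a per-message tally can make it easier to see what dominated the run. The following is a minimal sketch, not part of the BOINC or CPDN tooling: the names `PATTERNS` and `tally_messages` are hypothetical, the search strings are copied from the log text, and the `excerpt` is a short hand-assembled sample of lines from this log rather than the full output.

```python
from collections import Counter

# Substrings copied verbatim from the stderr text above.
# The category names on the left are illustrative labels only.
PATTERNS = {
    "no_heartbeat": "No heartbeat from core client",
    "model_restart": "Model crash detected, will try to restart",
    "buffin_io_error": "BUFFIN: C I/O Error feof",
    "model_crashed": "Model crashed:",
}


def tally_messages(stderr_text):
    """Count non-overlapping occurrences of each known message in the text."""
    return Counter(
        {name: stderr_text.count(needle) for name, needle in PATTERNS.items()}
    )


# A short sample assembled from messages in the log, not the whole log.
excerpt = (
    "22:15:46 (3988): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=6096, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 "
    "BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 "
    "Model crashed: STWORK : I/O error - PP fixed length header "
    "tmp/pipe_dummy 2048"
)

counts = tally_messages(excerpt)
for name, count in counts.items():
    print(f"{name}: {count}")
```

Pasting the full contents of the `<stderr_txt>` element into `tally_messages` in place of `excerpt` would give the totals for the whole task.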
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
29 Sep 2013 03:01:50 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 336,960 | 418,941 | 1.2433 |
28 Sep 2013 17:07:34 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 311,040 | 385,640 | 1.2398 |
25 Sep 2013 09:21:53 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 285,120 | 353,799 | 1.2409 |
21 Sep 2013 22:57:21 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 259,200 | 321,968 | 1.2422 |
19 Sep 2013 10:52:50 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 233,280 | 289,723 | 1.2420 |
17 Sep 2013 17:27:55 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 207,360 | 257,200 | 1.2404 |
15 Sep 2013 02:42:22 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 181,440 | 225,093 | 1.2406 |
14 Sep 2013 02:50:37 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 155,520 | 192,264 | 1.2363 |
12 Sep 2013 03:42:25 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 129,600 | 160,567 | 1.2389 |
11 Sep 2013 17:03:19 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 103,680 | 128,275 | 1.2372 |
09 Sep 2013 02:41:42 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 77,760 | 97,056 | 1.2481 |
08 Sep 2013 16:24:14 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 51,840 | 65,200 | 1.2577 |
04 Sep 2013 04:29:48 | 1169946 | 15988184 | hadcm3n_n7i8_1920_40_008377982_3 | 25,920 | 33,860 | 1.3063 |
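
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. The sketch below checks that reading against three rows copied from the table. The function name `seconds_per_timestep` is hypothetical, and the formula is inferred from the numbers rather than taken from CPDN documentation.

```python
# (timestep, cpu_time_sec, reported_average) copied from the trickle table.
trickles = [
    (336_960, 418_941, 1.2433),
    (311_040, 385_640, 1.2398),
    (25_920, 33_860, 1.3063),
]


def seconds_per_timestep(cpu_time_sec, timestep):
    """Assumed definition: cumulative CPU seconds per model timestep."""
    return cpu_time_sec / timestep


for timestep, cpu_time_sec, reported in trickles:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    # The table shows four decimal places, so compare at that precision.
    matches = round(computed, 4) == reported
    print(f"TS {timestep:>7,}: computed {computed:.4f}, "
          f"reported {reported:.4f}, match={matches}")
```

Under this reading, the fall from about 1.31 to about 1.24 seconds per timestep means the first 25,920 timesteps cost slightly more CPU time each than later ones.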
©2024 cpdn.org