Name | hadcm3n_o4oy_1940_40_007266050_2 |
Workunit | 7464290 |
Created | 11 Jun 2011, 13:49:35 UTC |
Sent | 11 Jun 2011, 13:49:42 UTC |
Report deadline | 10 Sep 2011, 21:16:53 UTC |
Received | 18 Sep 2011, 10:28:55 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1012620 |
Run time | 19 days 1 hours 31 min 16 sec |
CPU time | 13 days 9 hours 1 min 19 sec |
Validate state | Invalid |
Credit | 6,220.80 |
Device peak FLOPS | 2.26 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version>
<![CDATA[
<message> - exit code 193 (0xc1) </message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2576, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4300, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
13:16:31 (5916): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:23:53 (4512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
09:58:15 (4352): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CBUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
Ocean Restart file copy failed on o4oyko.dae73f0
Suspended CPDN Monitor - Suspend request from BOINC...
20:16:18 (4568): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5984, iMonCtr=1
Model crash detected, will try to restart...
12:03:56 (1900): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:03:57 (1900): No heartbeat from core client for 30 sec - exiting
12:03:58 (1900): No heartbeat from core client for 30 sec - exiting
12:03:59 (1900): No heartbeat from core client for 30 sec - exiting
12:04:00 (1900): No heartbeat from core client for 30 sec - exiting
12:04:01 (1900): No heartbeat from core client for 30 sec - exiting
12:04:02 (1900): No heartbeat from core client for 30 sec - exiting
12:04:03 (1900): No heartbeat from core client for 30 sec - exiting
12:04:04 (1900): No heartbeat from core client for 30 sec - exiting
12:04:05 (1900): No heartbeat from core client for 30 sec - exiting
12:04:07 (1900): No heartbeat from core client for 30 sec - exiting
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4516, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6040, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5364, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5880, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2716, iMonCtr=1
Model crash detected, will try to restart...
11:24:52 (4652): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:05:02 (5168): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
17:05:03 (5168): No heartbeat from core client for 30 sec - exiting
17:05:04 (5168): No heartbeat from core client for 30 sec - exiting
17:05:05 (5168): No heartbeat from core client for 30 sec - exiting
17:05:06 (5168): No heartbeat from core client for 30 sec - exiting
17:05:07 (5168): No heartbeat from core client for 30 sec - exiting
17:05:08 (5168): No heartbeat from core client for 30 sec - exiting
17:05:09 (5168): No heartbeat from core client for 30 sec - exiting
17:05:10 (5168): No heartbeat from core client for 30 sec - exiting
17:05:11 (5168): No heartbeat from core client for 30 sec - exiting
17:05:12 (5168): No heartbeat from core client for 30 sec - exiting
17:05:13 (5168): No heartbeat from core client for 30 sec - exiting
Ocean Restart file copy failed on o4oyko.daf19g0
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4980, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2344, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4924, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITHEAD: I/O error tmp/pipe_dummy 2048
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1208, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4104, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4448, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5572, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5412, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5544, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5612, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5088, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5624, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4352, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5164, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2824, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2824, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
18 Sep 2011 10:28:35 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 518,400 | 1,155,656 | 2.2293 |
16 Sep 2011 07:14:29 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 492,480 | 1,093,383 | 2.2202 |
10 Sep 2011 08:00:13 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 466,560 | 1,028,324 | 2.2041 |
05 Sep 2011 12:01:44 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 440,640 | 965,067 | 2.1901 |
02 Sep 2011 15:15:03 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 414,720 | 898,846 | 2.1674 |
29 Aug 2011 13:51:18 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 388,800 | 834,760 | 2.1470 |
25 Aug 2011 08:57:10 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 362,880 | 770,704 | 2.1239 |
24 Aug 2011 16:33:12 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 336,960 | 742,768 | 2.2043 |
20 Aug 2011 14:06:15 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 311,040 | 678,726 | 2.1821 |
13 Aug 2011 11:06:21 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 285,120 | 614,939 | 2.1568 |
06 Aug 2011 10:31:09 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 259,200 | 552,502 | 2.1316 |
04 Aug 2011 05:58:58 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 233,280 | 489,179 | 2.0970 |
02 Aug 2011 14:53:56 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 207,360 | 428,052 | 2.0643 |
02 Aug 2011 14:53:56 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 181,440 | 364,373 | 2.0082 |
25 Jul 2011 15:27:19 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 155,520 | 300,880 | 1.9347 |
09 Jul 2011 07:45:02 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 129,600 | 243,181 | 1.8764 |
05 Jul 2011 07:33:24 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 103,680 | 191,627 | 1.8483 |
05 Jul 2011 07:33:24 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 77,760 | 180,202 | 2.3174 |
17 Jun 2011 15:13:03 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 51,840 | 117,836 | 2.2731 |
15 Jun 2011 10:12:10 | 1012620 | 12972236 | hadcm3n_o4oy_1940_40_007266050_2 | 25,920 | 60,161 | 2.3210 |
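
The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against two rows of the table. The function names `parse_count` and `seconds_per_timestep` are hypothetical helpers introduced for illustration, not part of BOINC or CPDN.

```python
def parse_count(text: str) -> int:
    """Convert a thousands-separated figure such as '1,155,656' to an int."""
    return int(text.replace(",", ""))


def seconds_per_timestep(cpu_seconds: int, timesteps: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals.

    Assumption: the table's Average column is computed this way; the
    page itself does not state the formula.
    """
    return round(cpu_seconds / timesteps, 4)


# Latest trickle (18 Sep 2011) and first trickle (15 Jun 2011) from the table.
latest = seconds_per_timestep(parse_count("1,155,656"), parse_count("518,400"))
first = seconds_per_timestep(parse_count("60,161"), parse_count("25,920"))
print(latest, first)
```

Both values agree with the Average figures listed for those rows, which supports the cumulative interpretation.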
©2024 cpdn.org