Name | hadam3p_saf_0uj3_1974_1_006872743_1 |
Workunit | 7076059 |
Created | 2 Apr 2012, 13:26:46 UTC |
Sent | 2 Apr 2012, 13:26:52 UTC |
Report deadline | 15 Mar 2013, 18:46:52 UTC |
Received | 11 Apr 2012, 19:08:06 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1206904 |
Run time | 4 days 2 hours 17 min 1 sec |
CPU time | 3 days 2 hours 3 min 11 sec |
Validate state | Invalid |
Credit | 1,122.82 |
Device peak FLOPS | 2.34 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Southern Africa v6.09 windows_intelx86 |
Stderr | <core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7272, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7320, selfPID=5408, iMonCtr=1
Model crash detected, will try to restart...
10:05:33 (4848): No heartbeat from core client for 30 sec - exiting
10:05:34 (4848): No heartbeat from core client for 30 sec - exiting
10:05:35 (4848): No heartbeat from core client for 30 sec - exiting
10:05:36 (4848): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=880, selfPID=5744, iMonCtr=1
Model crash detected, will try to restart...
15:27:30 (5052): No heartbeat from core client for 30 sec - exiting
15:27:31 (5052): No heartbeat from core client for 30 sec - exiting
15:27:32 (5052): No heartbeat from core client for 30 sec - exiting
15:27:33 (5052): No heartbeat from core client for 30 sec - exiting
15:27:34 (5052): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:28:20 (6708): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Regional Worker:: Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7084, iMonCtr=2
Model crash detected, will try to restart...
16:41:35 (4664): No heartbeat from core client for 30 sec - exiting
16:41:37 (4664): No heartbeat from core client for 30 sec - exiting
16:41:38 (4664): No heartbeat from core client for 30 sec - exiting
16:41:39 (4664): No heartbeat from core client for 30 sec - exiting
16:41:40 (4664): No heartbeat from core client for 30 sec - exiting
16:41:41 (4664): No heartbeat from core client for 30 sec - exiting
16:41:42 (4664): No heartbeat from core client for 30 sec - exiting
16:41:43 (4664): No heartbeat from core client for 30 sec - exiting
16:41:44 (4664): No heartbeat from core client for 30 sec - exiting
16:41:45 (4664): No heartbeat from core client for 30 sec - exiting
16:41:46 (4664): No heartbeat from core client for 30 sec - exiting
16:41:47 (4664): No heartbeat from core client for 30 sec - exiting
16:41:49 (4664): No heartbeat from core client for 30 sec - exiting
16:41:50 (4664): No heartbeat from core client for 30 sec - exiting
16:41:51 (4664): No heartbeat from core client for 30 sec - exiting
16:41:52 (4664): No heartbeat from core client for 30 sec - exiting
16:41:53 (4664): No heartbeat from core client for 30 sec - exiting
16:41:54 (4664): No heartbeat from core client for 30 sec - exiting
16:41:55 (4664): No heartbeat from core client for 30 sec - exiting
16:41:56 (4664): No heartbeat from core client for 30 sec - exiting
16:41:57 (4664): No heartbeat from core client for 30 sec - exiting
16:41:58 (4664): No heartbeat from core client for 30 sec - exiting
16:41:59 (4664): No heartbeat from core client for 30 sec - exiting
16:42:01 (4664): No heartbeat from core client for 30 sec - exiting
16:42:02 (4664): No heartbeat from core client for 30 sec - exiting
16:42:03 (4664): No heartbeat from core client for 30 sec - exiting
16:42:04 (4664): No heartbeat from core client for 30 sec - exiting
16:42:05 (4664): No heartbeat from core client for 30 sec - exiting
16:42:06 (4664): No heartbeat from core client for 30 sec - exiting
16:42:07 (4664): No heartbeat from core client for 30 sec - exiting
16:42:08 (4664): No heartbeat from core client for 30 sec - exiting
16:42:09 (4664): No heartbeat from core client for 30 sec - exiting
16:42:10 (4664): No heartbeat from core client for 30 sec - exiting
16:42:11 (4664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6672, iMonCtr=2
Model crash detected, will try to restart...
15:00:48 (1432): No heartbeat from core client for 30 sec - exiting
15:00:49 (1432): No heartbeat from core client for 30 sec - exiting
15:00:51 (1432): No heartbeat from core client for 30 sec - exiting
15:00:52 (1432): No heartbeat from core client for 30 sec - exiting
15:00:53 (1432): No heartbeat from core client for 30 sec - exiting
15:00:54 (1432): No heartbeat from core client for 30 sec - exiting
15:00:55 (1432): No heartbeat from core client for 30 sec - exiting
15:00:56 (1432): No heartbeat from core client for 30 sec - exiting
15:00:57 (1432): No heartbeat from core client for 30 sec - exiting
15:00:58 (1432): No heartbeat from core client for 30 sec - exiting
15:00:59 (1432): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6808, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5856, iMonCtr=2
Model crash detected, will try to restart...
07:54:33 (4008): No heartbeat from core client for 30 sec - exiting
07:54:35 (4008): No heartbeat from core client for 30 sec - exiting
07:54:36 (4008): No heartbeat from core client for 30 sec - exiting
07:54:37 (4008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7164, selfPID=6276, iMonCtr=1
Model crash detected, will try to restart...
11:14:12 (5240): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
11:14:14 (5240): No heartbeat from core client for 30 sec - exiting
11:14:15 (5240): No heartbeat from core client for 30 sec - exiting
11:14:16 (5240): No heartbeat from core client for 30 sec - exiting
11:14:17 (5240): No heartbeat from core client for 30 sec - exiting
11:14:18 (5240): No heartbeat from core client for 30 sec - exiting
11:14:19 (5240): No heartbeat from core client for 30 sec - exiting
11:14:20 (5240): No heartbeat from core client for 30 sec - exiting
11:14:21 (5240): No heartbeat from core client for 30 sec - exiting
11:14:23 (5240): No heartbeat from core client for 30 sec - exiting
RCM: BUFFIN : Read Failed: No such file or directory
RCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
RCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
GCM: BUFFIN : Read Failed: Result too large
GCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
GCM : BUFFIN: C I/O Error feof - Unit 62 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/xaakm.pipe_dummy 2048
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_saf_0uj3_1974_1_006872743_1_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]> |
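The stderr log for this result is long and repetitive. As a reading aid, here is a minimal Python sketch that tallies three things: the "No heartbeat from core client" lines per process ID, the "Model crash detected, will try to restart..." notices, and the file/error-code pairs in the upload-failure `<message>` section. This is not CPDN or BOINC tooling. The function name `summarize_stderr` and the regular expressions are illustrative assumptions, written only from the line formats visible in this log. The script does not try to interpret what error code -161 means. The patterns match anywhere in the text, so they work whether or not the log keeps its line breaks.

```python
import re
from collections import Counter

# Hypothetical patterns, inferred from the line formats in this stderr log only.
# e.g. "16:41:35 (4664): No heartbeat from core client for 30 sec - exiting"
HEARTBEAT_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) \((\d+)\): No heartbeat from core client")
# e.g. "Model crash detected, will try to restart..."
RESTART_RE = re.compile(r"Model crash detected, will try to restart")
# e.g. "<file_name>...zip</file_name> <error_code>-161</error_code>"
XFER_RE = re.compile(r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)</error_code>")


def summarize_stderr(text: str) -> dict:
    """Count heartbeat-loss lines per PID, restart notices, and upload errors."""
    heartbeats = Counter(pid for _, pid in HEARTBEAT_RE.findall(text))
    return {
        "heartbeat_losses_by_pid": dict(heartbeats),
        "restart_attempts": len(RESTART_RE.findall(text)),
        "upload_errors": XFER_RE.findall(text),  # list of (file_name, error_code)
    }


# Abridged excerpt of this result's stderr, flattened onto one line as on the page.
excerpt = (
    "Model crash detected, will try to restart... "
    "10:05:33 (4848): No heartbeat from core client for 30 sec - exiting "
    "10:05:34 (4848): No heartbeat from core client for 30 sec - exiting "
    "<file_xfer_error> <file_name>hadam3p_saf_0uj3_1974_1_006872743_1_7.zip</file_name> "
    "<error_code>-161</error_code> </file_xfer_error>"
)
summary = summarize_stderr(excerpt)
print(summary)
```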
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
10 Apr 2012 16:02:48 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 69,216 | 230,398 | 3.3287 |
09 Apr 2012 18:23:06 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 57,696 | 192,515 | 3.3367 |
08 Apr 2012 16:43:38 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 46,176 | 154,312 | 3.3418 |
05 Apr 2012 19:15:37 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 34,656 | 115,782 | 3.3409 |
04 Apr 2012 18:15:28 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 23,136 | 77,527 | 3.3509 |
03 Apr 2012 14:30:31 | 1206904 | 14346241 | hadam3p_saf_0uj3_1974_1_006872743_1 | 11,616 | 39,009 | 3.3582 |
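In the trickle table, each "Average (sec/TS)" value equals the CPU time divided by the timestep, rounded to four decimals. The timestep also advances by the same amount between consecutive trickles. The minimal Python sketch below checks both points against the six rows of the table. The helper name `sec_per_timestep` is hypothetical, and the rounding to four decimals is an assumption made to match the displayed values.

```python
# Rows of the "Latest Trickles Received" table:
# (timestep, CPU time in seconds, reported average sec/TS)
TRICKLES = [
    (69_216, 230_398, 3.3287),
    (57_696, 192_515, 3.3367),
    (46_176, 154_312, 3.3418),
    (34_656, 115_782, 3.3409),
    (23_136, 77_527, 3.3509),
    (11_616, 39_009, 3.3582),
]


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds per model timestep, rounded to 4 decimals like the table."""
    return round(cpu_seconds / timestep, 4)


for timestep, cpu, reported in TRICKLES:
    computed = sec_per_timestep(timestep, cpu)
    print(f"{timestep:>6} {cpu:>7} computed {computed:.4f} reported {reported:.4f}")

# Differences between consecutive trickle timesteps (sorted ascending).
steps = sorted(t for t, _, _ in TRICKLES)
increments = {b - a for a, b in zip(steps, steps[1:])}
print(increments)  # {11520}
```

With this data every computed average matches the reported one, and the only increment is 11,520 timesteps per trickle.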
©2024 cpdn.org