Name | hadam3p_pnw_ysbq_1997_1_006882174_1 |
Workunit | 7085490 |
Created | 23 Apr 2012, 11:06:46 UTC |
Sent | 24 Apr 2012, 20:34:03 UTC |
Report deadline | 7 Apr 2013, 1:54:03 UTC |
Received | 2 May 2012, 16:58:44 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1213995 |
Run time | 1 days 14 hours 20 min 26 sec |
CPU time | 1 days 11 hours 44 min 36 sec |
Validate state | Invalid |
Credit | 753.03 |
Device peak FLOPS | 2.83 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v6.09 windows_intelx86 |
Stderr | <core_client_version>7.0.25</core_client_version> <![CDATA[ <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... 22:25:41 (2864): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3916, selfPID=3916, iMonCtr=2 CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2624, selfPID=4140, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3744, selfPID=3744, iMonCtr=2 CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 20:33:28 (3288): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:36:40 (4296): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:49:28 (3520): Can't acquire lockfile (32) - waiting 35s 20:49:43 (3068): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:51:48 (3224): Can't acquire lockfile (32) - waiting 35s 20:52:00 (3412): Can't acquire lockfile (32) - waiting 35s 20:52:20 (3520): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
20:52:23 (3224): Can't acquire lockfile (32) - exiting 20:52:23 (3224): Error: The process cannot access the file because it is being used by another process. (0x20) 20:52:35 (3412): Can't acquire lockfile (32) - exiting 20:52:35 (3412): Error: The process cannot access the file because it is being used by another process. (0x20) 20:52:49 (3160): Can't acquire lockfile (32) - waiting 35s 20:53:12 (3480): Can't acquire lockfile (32) - waiting 35s 20:53:21 (2540): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 20:53:47 (3480): Can't acquire lockfile (32) - exiting 20:53:47 (3480): Error: The process cannot access the file because it is being used by another process. (0x20) 20:54:19 (3160): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4316, selfPID=3040, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Regional yearly means requires 12 input files got 3 Model crashed: READHIST: End of file in READ from history file for namelist NLIHISTO tmp/xaakg.pipe_dummy 2048 Leaving CPDN_Main::Monitor... 
Regional yearly means requires 12 input files got 3 zip error: Could not create output file (was replacing the original zip file) Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
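
The `<message>` block at the end of the stderr output lists one `<file_xfer_error>` entry per failed upload (files `_4.zip` through `_12.zip`, all with error code -161). A minimal sketch of pulling those pairs out of the text, assuming the entries keep the layout shown above; the helper name `parse_upload_failures` and the shortened two-entry sample are illustrative and not part of BOINC or CPDN. The page does not say what code -161 means, so the sketch only extracts it.

```python
import re

# Shortened sample: the first two entries copied from the <message> block above.
message = (
    "<file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_4.zip</file_name> "
    "<error_code>-161</error_code> </file_xfer_error> "
    "<file_xfer_error> <file_name>hadam3p_pnw_ysbq_1997_1_006882174_1_5.zip</file_name> "
    "<error_code>-161</error_code> </file_xfer_error>"
)

# One match per <file_xfer_error> element; whitespace between tags is optional.
PATTERN = re.compile(
    r"<file_xfer_error>\s*<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*</file_xfer_error>"
)

def parse_upload_failures(text):
    """Return (file_name, error_code) pairs in the order they appear (hypothetical helper)."""
    return [(m.group("name"), int(m.group("code"))) for m in PATTERN.finditer(text)]

failures = parse_upload_failures(message)
for name, code in failures:
    print(f"{name}: error {code}")
```

Run against the full message text, the same pattern should yield nine pairs, one per listed zip file.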
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
29 Apr 2012 21:24:55 | 1213995 | 14576210 | hadam3p_pnw_ysbq_1997_1_006882174_1 | 34,656 | 97,952 | 2.8264 |
27 Apr 2012 14:29:13 | 1213995 | 14576210 | hadam3p_pnw_ysbq_1997_1_006882174_1 | 23,136 | 63,674 | 2.7522 |
26 Apr 2012 20:25:19 | 1213995 | 14576210 | hadam3p_pnw_ysbq_1997_1_006882174_1 | 11,616 | 30,184 | 2.5985 |
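
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count, rounded to four decimal places. The sketch below rechecks that against the three rows above; the function name `average_sec_per_timestep` is illustrative, and the formula is inferred from the numbers rather than stated on the page.

```python
# Rows copied from the trickle table: (timestep, CPU time in seconds, reported average).
trickles = [
    (34656, 97952, 2.8264),
    (23136, 63674, 2.7522),
    (11616, 30184, 2.5985),
]

def average_sec_per_timestep(timestep, cpu_seconds):
    """Cumulative CPU seconds per model timestep, rounded to 4 decimals (inferred formula)."""
    return round(cpu_seconds / timestep, 4)

computed = [average_sec_per_timestep(ts, cpu) for ts, cpu, _ in trickles]

for (ts, cpu, reported), value in zip(trickles, computed):
    print(f"{ts:>6} timesteps  {cpu:>6} s  computed {value:.4f}  reported {reported:.4f}")
```

All three computed values match the reported column, so the rising average reflects a higher CPU cost per timestep as the run progressed.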
©2024 cpdn.org