Name | hadam3p_eu_wkgj_1978_1_006850667_0 |
Workunit | 7053983 |
Created | 18 Nov 2010, 18:25:05 UTC |
Sent | 19 Mar 2011, 3:36:07 UTC |
Report deadline | 29 Feb 2012, 8:56:07 UTC |
Received | 7 Apr 2011, 0:05:37 UTC |
Server state | Over |
Outcome | Didn't need |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 1044542 |
Run time | 9 days 5 hours 13 min 52 sec |
CPU time | 5 days 23 hours 57 min 40 sec |
Validate state | Invalid |
Credit | 1,591.48 |
Device peak FLOPS | 1.99 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Europe v6.08 windows_intelx86 |
Stderr | <core_client_version>6.10.18</core_client_version> <![CDATA[ <stderr_txt> Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1564, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5360, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4900, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1360, iMonCtr=2 Leaving CPDN_Main::Monitor... 07:47:56 (4804): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4332, iMonCtr=2 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2248, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2396, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5928, selfPID=2944, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5132, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=156, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2080, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5372, selfPID=5088, iMonCtr=1 Model crash detected, will try to restart... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4888, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=172, selfPID=2012, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5892, selfPID=172, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1780, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4880, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5240, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3400, iMonCtr=2 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7316, iMonCtr=2 Model crash detected, will try to restart... lobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5260, iMonCtr=2 Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5524, selfPID=2732, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4088, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5448, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4488, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3408, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5420, selfPID=672, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4592, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7600, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5464, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6064, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5824, selfPID=5328, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
07:51:39 (5916): No heartbeat from core client for 30 sec - exiting 07:51:43 (5916): No heartbeat from core client for 30 sec - exiting 07:51:44 (5916): No heartbeat from core client for 30 sec - exiting 07:51:45 (5916): No heartbeat from core client for 30 sec - exiting 07:51:46 (5916): No heartbeat from core client for 30 sec - exiting 07:51:47 (5916): No heartbeat from core client for 30 sec - exiting 07:51:48 (5916): No heartbeat from core client for 30 sec - exiting 07:51:49 (5916): No heartbeat from core client for 30 sec - exiting 07:51:50 (5916): No heartbeat from core client for 30 sec - exiting 07:51:51 (5916): No heartbeat from core client for 30 sec - exiting 07:51:52 (5916): No heartbeat from core client for 30 sec - exiting 07:51:53 (5916): No heartbeat from core client for 30 sec - exiting 07:51:54 (5916): No heartbeat from core client for 30 sec - exiting 07:51:55 (5916): No heartbeat from core client for 30 sec - exiting 07:51:56 (5916): No heartbeat from core client for 30 sec - exiting 07:51:57 (5916): No heartbeat from core client for 30 sec - exiting 07:51:58 (5916): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4188, selfPID=5708, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6072, selfPID=3520, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1300, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4484, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5884, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4280, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4288, selfPID=2112, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4324, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4568, selfPID=4524, iMonCtr=1 Model crash detected, will try to restart... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4800, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4624, selfPID=3336, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2664, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2120, selfPID=2328, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5748, selfPID=5748, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5956, selfPID=1100, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4832, iMonCtr=2 Model crash detected, will try to restart... 
Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2188, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6368, selfPID=6368, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9696, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4944, iMonCtr=2 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5532, selfPID=5144, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5016, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3416, selfPID=3580, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 16:01:58 (3300): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5952, iMonCtr=2 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4664, selfPID=868, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
03:09:41 (868): called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>hadam3p_eu_wkgj_1978_1_006850667_0_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_wkgj_1978_1_006850667_0_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_wkgj_1978_1_006850667_0_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_wkgj_1978_1_006850667_0_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
04 Apr 2011 08:07:50 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 92,256 | 473,043 | 5.1275 |
02 Apr 2011 15:24:31 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 80,736 | 415,928 | 5.1517 |
31 Mar 2011 07:29:38 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 69,216 | 355,151 | 5.1311 |
29 Mar 2011 10:29:32 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 57,696 | 294,976 | 5.1126 |
27 Mar 2011 23:24:18 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 46,177 | 235,635 | 5.1029 |
27 Mar 2011 17:26:17 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 46,176 | 234,858 | 5.0861 |
25 Mar 2011 18:03:26 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 34,656 | 177,457 | 5.1205 |
23 Mar 2011 15:51:34 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 23,136 | 119,463 | 5.1635 |
21 Mar 2011 15:51:03 | 1044542 | 12120458 | hadam3p_eu_wkgj_1978_1_006850667_0 | 11,616 | 60,402 | 5.1999 |
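
The Average (sec/TS) column in the trickle table above appears to be the cumulative CPU time divided by the timestep count. The page does not state this; it is an assumption that happens to match the listed rows. A minimal sketch that recomputes the value for three rows copied from the table (the function name `sec_per_timestep` is made up for illustration):

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_average)
TRICKLES = [
    (92256, 473043, 5.1275),
    (80736, 415928, 5.1517),
    (11616, 60402, 5.1999),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed formula: cumulative CPU seconds divided by timesteps completed."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_time_sec, reported in TRICKLES:
        computed = sec_per_timestep(cpu_time_sec, timestep)
        # Print both values side by side so any mismatch is easy to spot.
        print(f"timestep={timestep:>6}  computed={computed:.4f}  reported={reported:.4f}")
```

If the assumption holds, the computed and reported values agree to four decimal places for each row.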
©2024 cpdn.org