Name | hadam3p_pnw_zr1j_1966_1_007013567_1 |
Workunit | 7216883 |
Created | 20 Jan 2011, 0:21:46 UTC |
Sent | 20 Jan 2011, 2:20:32 UTC |
Report deadline | 2 Jan 2012, 7:40:32 UTC |
Received | 26 Jan 2011, 9:34:06 UTC |
Server state | Over |
Outcome | No reply |
Client state | Compute error |
Exit status | 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT |
Computer ID | 1118002 |
Run time | 5 days 7 hours 42 min 8 sec |
CPU time | 3 days 22 hours 20 min 34 sec |
Validate state | Invalid |
Credit | 1,503.98 |
Device peak FLOPS | 1.97 GFLOPS |
Application version | UK Met Office HadAM3P-HadRM3P Pacific North West v6.08 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Got ack for job that's till active </message> <stderr_txt> 11:33:43 (4912): No heartbeat from core client for 30 sec - exiting 11:33:44 (4912): No heartbeat from core client for 30 sec - exiting 11:33:45 (4912): No heartbeat from core client for 30 sec - exiting 11:33:47 (4912): No heartbeat from core client for 30 sec - exiting 11:33:48 (4912): No heartbeat from core client for 30 sec - exiting 11:33:49 (4912): No heartbeat from core client for 30 sec - exiting 11:33:50 (4912): No heartbeat from core client for 30 sec - exiting 11:33:51 (4912): No heartbeat from core client for 30 sec - exiting 11:33:52 (4912): No heartbeat from core client for 30 sec - exiting 11:33:53 (4912): No heartbeat from core client for 30 sec - exiting 11:33:54 (4912): No heartbeat from core client for 30 sec - exiting 11:33:55 (4912): No heartbeat from core client for 30 sec - exiting 11:33:56 (4912): No heartbeat from core client for 30 sec - exiting 11:33:57 (4912): No heartbeat from core client for 30 sec - exiting 11:33:59 (4912): No heartbeat from core client for 30 sec - exiting 11:34:00 (4912): No heartbeat from core client for 30 sec - exiting 11:34:01 (4912): No heartbeat from core client for 30 sec - exiting 11:34:02 (4912): No heartbeat from core client for 30 sec - exiting 11:34:03 (4912): No heartbeat from core client for 30 sec - exiting 11:34:04 (4912): No heartbeat from core client for 30 sec - exiting 11:34:05 (4912): No heartbeat from core client for 30 sec - exiting 11:34:06 (4912): No heartbeat from core client for 30 sec - exiting 11:34:07 (4912): No heartbeat from core client for 30 sec - exiting 11:34:08 (4912): No heartbeat from core client for 30 sec - exiting 11:34:10 (4912): No heartbeat from core client for 30 sec - exiting 11:34:11 (4912): No heartbeat from core client for 30 sec - exiting 11:34:12 (4912): No heartbeat from core client for 30 sec - exiting 11:34:13 (4912): No 
heartbeat from core client for 30 sec - exiting 11:34:14 (4912): No heartbeat from core client for 30 sec - exiting 11:34:15 (4912): No heartbeat from core client for 30 sec - exiting 11:34:16 (4912): No heartbeat from core client for 30 sec - exiting 11:34:17 (4912): No heartbeat from core client for 30 sec - exiting 11:34:18 (4912): No heartbeat from core client for 30 sec - exiting 11:34:19 (4912): No heartbeat from core client for 30 sec - exiting 11:34:20 (4912): No heartbeat from core client for 30 sec - exiting 11:34:22 (4912): No heartbeat from core client for 30 sec - exiting 11:34:23 (4912): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5956, selfPID=4048, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 1 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7700, selfPID=5764, iMonCtr=1 Model crash detected, will try to restart... Glontroller:: CPDN process is sot running, exiting, bRetVal = 1, checkPID=0, selfPID=5912, iMonCtr=2 300, iMonCtr=2 tected, will try to restart... Leaving CPDN_Main::Monitor... 18:11:55 (5152): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3216, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 4 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6668, selfPID=4464, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=2 Mode l crash detected, will try to restart... Leaving CPDN_Main::Monitor... 
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8160, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6196, selfPID=4532, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... BUFFOUT: C I/O Error - Return code = 32 Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 6 zip error: Output file write failure (write error on zip file) 04:33:06 (6088): called boinc_finish </stderr_txt> ]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
25 Jan 2011 18:31:20 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 69,216 | 307,823 | 4.4473 |
24 Jan 2011 22:59:34 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 57,696 | 257,336 | 4.4602 |
24 Jan 2011 02:24:14 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 46,176 | 205,821 | 4.4573 |
23 Jan 2011 11:19:12 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 34,656 | 154,138 | 4.4477 |
22 Jan 2011 10:03:24 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 23,136 | 103,428 | 4.4704 |
21 Jan 2011 14:43:45 | 1118002 | 12506947 | hadam3p_pnw_zr1j_1966_1_007013567_1 | 11,616 | 52,773 | 4.5431 |
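
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the cumulative timestep count. That relationship is inferred from the numbers in the table, not stated by the page, so the following is only a minimal sketch that checks it. The names `TRICKLES` and `sec_per_timestep` are hypothetical and exist only for this illustration.

```python
# Rows copied from the trickle table: (timestep, cpu_time_sec, reported_avg).
TRICKLES = [
    (69216, 307823, 4.4473),
    (57696, 257336, 4.4602),
    (46176, 205821, 4.4573),
    (34656, 154138, 4.4477),
    (23136, 103428, 4.4704),
    (11616, 52773, 4.5431),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Assumed formula: cumulative CPU seconds per cumulative timestep."""
    return round(cpu_time_sec / timestep, 4)


if __name__ == "__main__":
    for timestep, cpu_sec, reported in TRICKLES:
        computed = sec_per_timestep(cpu_sec, timestep)
        # Allow for rounding in the published four-decimal figure.
        status = "ok" if abs(computed - reported) < 1e-4 else "mismatch"
        print(f"{timestep:>6} {computed:.4f} {reported:.4f} {status}")
```

Under this assumption, every row should agree with the published figure to four decimal places, which suggests the host ran at a steady rate of roughly 4.45 CPU seconds per timestep between trickles.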