Name | hadcm3n_yhsf_1900_40_007515946_3 |
Workunit | 7713421 |
Created | 23 Nov 2011, 14:42:54 UTC |
Sent | 23 Nov 2011, 14:48:17 UTC |
Report deadline | 22 Feb 2012, 22:15:28 UTC |
Received | 26 Jan 2012, 19:15:19 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 496610 |
Run time | 61 days 4 hours 56 min 31 sec |
CPU time | 19 days 17 hours 37 min 6 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 1.36 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt>
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
15:10:51 (2532): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
15:10:52 (2532): No heartbeat from core client for 30 sec - exiting
15:10:53 (2532): No heartbeat from core client for 30 sec - exiting
15:10:54 (2532): No heartbeat from core client for 30 sec - exiting
15:10:55 (2532): No heartbeat from core client for 30 sec - exiting
15:10:56 (2532): No heartbeat from core client for 30 sec - exiting
15:10:57 (2532): No heartbeat from core client for 30 sec - exiting
15:10:58 (2532): No heartbeat from core client for 30 sec - exiting
15:11:00 (2532): No heartbeat from core client for 30 sec - exiting
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1456, iMonCtr=1
Model crash detected, will try to restart...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1456, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5400, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
19:46:19 (5904): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
19:46:21 (5904): No heartbeat from core client for 30 sec - exiting
19:46:22 (5904): No heartbeat from core client for 30 sec - exiting
19:46:23 (5904): No heartbeat from core client for 30 sec - exiting
19:46:24 (5904): No heartbeat from core client for 30 sec - exiting
19:46:25 (5904): No heartbeat from core client for 30 sec - exiting
19:46:26 (5904): No heartbeat from core client for 30 sec - exiting
19:46:27 (5904): No heartbeat from core client for 30 sec - exiting
19:46:28 (5904): No heartbeat from core client for 30 sec - exiting
19:46:29 (5904): No heartbeat from core client for 30 sec - exiting
19:46:30 (5904): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5180, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2176, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3832, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3508, iMonCtr=1
Model crash detected, will try to restart...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3520, iMonCtr=1
Model crash detected, will try to restart...
No Process Handle
Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1748, selfPID=1748, iMonCtr=1
forrtl: The process cannot access the file because another process has locked a portion of the file.
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3392, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: The process cannot access the file because another process has locked a portion of the file.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1512, iMonCtr=1
Model crash detected, will try to restart...
Ocean Restart file copy failed on yhsfko.dac82q0
22:25:57 (1512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
22:25:58 (1512): No heartbeat from core client for 30 sec - exiting
22:25:59 (1512): No heartbeat from core client for 30 sec - exiting
22:26:00 (1512): No heartbeat from core client for 30 sec - exiting
22:26:01 (1512): No heartbeat from core client for 30 sec - exiting
22:26:02 (1512): No heartbeat from core client for 30 sec - exiting
22:26:03 (1512): No heartbeat from core client for 30 sec - exiting
22:26:04 (1512): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - Quit request from BOINC...
BUFFOUT: C I/O Error - Return code = 32
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA tmp/pipe_dummy 2048
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Ocean Restart file copy failed on yhsfko.dad0450
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt> ]]> |
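The stderr output above repeats a small number of message types many times. As a rough aid for reading it, the following is a minimal sketch of how such a log could be tallied by message type. The labels in `PATTERNS` and the function name `tally` are hypothetical names chosen for this illustration and are not part of BOINC or CPDN; only the matched message fragments come from the log. The sample text is a short excerpt, not the full log.

```python
import re
from collections import Counter

# Hypothetical labels (our own) mapped to fragments that appear in the stderr text.
PATTERNS = {
    "no_heartbeat": re.compile(r"No heartbeat from core client for 30 sec - exiting"),
    "forrtl_error": re.compile(r"forrtl:"),
    "restart_attempt": re.compile(r"Model crash detected, will try to restart"),
    "buffout_error": re.compile(r"BUFFOUT: C I/O Error - Return code = \d+"),
    "restart_copy_failed": re.compile(r"Ocean Restart file copy failed"),
    "signal_received": re.compile(r"Signal \d+ received"),
}


def tally(stderr_text: str) -> Counter:
    """Count occurrences of each known message type in the whole text.

    Matching runs over the full string rather than line by line, so it
    works whether or not the log still has its line breaks.
    """
    return Counter(
        {label: len(pattern.findall(stderr_text)) for label, pattern in PATTERNS.items()}
    )


# Short excerpt copied from the log above, for demonstration only.
SAMPLE = (
    "15:10:51 (2532): No heartbeat from core client for 30 sec - exiting\n"
    "15:10:52 (2532): No heartbeat from core client for 30 sec - exiting\n"
    "Model crash detected, will try to restart...\n"
    "BUFFOUT: C I/O Error - Return code = 32\n"
    "Signal 11 received, exiting...\n"
)

counts = tally(SAMPLE)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Pasting the full stderr text into `tally` in place of `SAMPLE` would give the counts for the whole task.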
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
26 Jan 2012 19:17:45 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 777,600 | 1,704,991 | 2.1926 |
25 Jan 2012 14:32:01 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 751,680 | 1,623,732 | 2.1601 |
24 Jan 2012 03:02:55 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 725,760 | 1,512,295 | 2.0837 |
22 Jan 2012 13:17:05 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 699,840 | 1,424,041 | 2.0348 |
21 Jan 2012 05:04:45 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 673,920 | 1,313,610 | 1.9492 |
19 Jan 2012 11:42:23 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 648,000 | 1,358,503 | 2.0965 |
18 Jan 2012 03:12:50 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 622,080 | 1,248,205 | 2.0065 |
16 Jan 2012 16:52:00 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 596,160 | 1,174,669 | 1.9704 |
14 Jan 2012 07:41:56 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 570,240 | 1,263,766 | 2.2162 |
12 Jan 2012 22:10:15 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 544,320 | 1,155,170 | 2.1222 |
11 Jan 2012 12:04:53 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 518,400 | 1,050,329 | 2.0261 |
09 Jan 2012 18:00:54 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 492,480 | 1,070,231 | 2.1731 |
08 Jan 2012 06:19:24 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 466,560 | 1,119,499 | 2.3995 |
06 Jan 2012 19:03:12 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 440,640 | 1,010,052 | 2.2922 |
04 Jan 2012 19:28:22 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 414,720 | 973,513 | 2.3474 |
03 Jan 2012 08:51:17 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 388,800 | 863,042 | 2.2198 |
18 Dec 2011 00:20:51 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 362,880 | 1,001,270 | 2.7592 |
16 Dec 2011 14:08:15 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 336,960 | 890,738 | 2.6435 |
15 Dec 2011 04:38:15 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 311,040 | 780,582 | 2.5096 |
13 Dec 2011 08:00:34 | 496610 | 13656390 | hadcm3n_yhsf_1900_40_007515946_3 | 285,120 | 1,030,823 | 3.6154 |
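The Average (sec/TS) column appears to be the CPU Time (sec) value divided by the Timestep value in the same row. The sketch below checks that assumption against three rows copied from the table. The function name `sec_per_timestep` and the `ROWS` list are hypothetical names used only for this illustration.

```python
# Rows copied from the trickle table:
# (timestep, cpu_time_sec, reported_average_sec_per_ts)
ROWS = [
    (777_600, 1_704_991, 2.1926),
    (751_680, 1_623_732, 2.1601),
    (285_120, 1_030_823, 3.6154),
]


def sec_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Average CPU seconds spent per model timestep."""
    return cpu_time_sec / timestep


# Compare the computed value with the reported one, allowing for
# the table's rounding to four decimal places.
results = []
for timestep, cpu_time_sec, reported in ROWS:
    computed = sec_per_timestep(cpu_time_sec, timestep)
    matches = abs(computed - reported) < 5e-5
    results.append(matches)
    print(f"timestep {timestep}: computed {computed:.4f}, reported {reported}, match={matches}")
```

For these three rows the computed ratio agrees with the reported average to four decimal places, which is consistent with the assumption but does not confirm how the site actually derives the column.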