Name | hadcm3n_yjv0_1940_40_008239239_4 |
Workunit | 8394363 |
Created | 5 Nov 2012, 20:48:59 UTC |
Sent | 5 Nov 2012, 20:49:11 UTC |
Report deadline | 5 Feb 2013, 4:16:22 UTC |
Received | 23 Jan 2013, 14:33:26 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 193 (0x000000C1) EXIT_SIGNAL |
Computer ID | 1236037 |
Run time | 16 days 5 hours 29 min 8 sec |
CPU time | 15 days 18 hours 32 min 16 sec |
Validate state | Invalid |
Credit | 9,331.20 |
Device peak FLOPS | 2.55 GFLOPS |
Application version | UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86 |
Stderr | <core_client_version>7.0.42</core_client_version> <![CDATA[ <message> - exit code 193 (0xc1) </message> <stderr_txt> Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6080, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6052, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5952, iMonCtr=1 Model crash detected, will try to restart... Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Atmos Hold Restart file rename failed on atmos_restart.hold Ocean Restart file copy failed on yjv0ko.dae8470 Ocean Restart file copy failed on yjv0ko.dae8480 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6124, iMonCtr=1 Model crash detected, will try to restart... 09:43:40 (5780): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5908, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1244, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5764, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6072, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5952, iMonCtr=1 Model crash detected, will try to restart... 
21:15:12 (5732): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1316, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4556, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5800, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5688, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5776, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6064, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5744, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5708, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... 16:01:04 (5984): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:45:29 (6004): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5644, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5492, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6024, iMonCtr=1 Model crash detected, will try to restart... Ocean Restart file copy failed on yjv0ko.dag65c0 Ocean Restart file copy failed on yjv0ko.dag65d0 Ocean Restart file copy failed on yjv0ko.dag65e0 Ocean Restart file copy failed on yjv0ko.dag65f0 Ocean Restart file copy failed on yjv0ko.dag65g0 Ocean Restart file copy failed on yjv0ko.dag65h0 Ocean Restart file copy failed on yjv0ko.dag65i0 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5804, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5908, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6028, iMonCtr=1 Model crash detected, will try to restart... 10:02:56 (6072): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 15:15:26 (7108): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6372, iMonCtr=1 Model crash detected, will try to restart... Signal 11 received, exiting... Called boinc_finish </stderr_txt> ]]> |
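The stderr text above repeats a handful of message types many times. As a rough aid for reading it, the sketch below tallies those message types with plain substring counts. The helper name `tally_events`, the `EVENT_MARKERS` table, and the category labels are hypothetical and are not part of BOINC or CPDN; the sample string is a short excerpt assembled from messages in the log, not the full output.

```python
# Hypothetical helper for summarising a CPDN stderr dump; not a BOINC/CPDN API.
# Each marker is a substring copied from the messages in the log above.
EVENT_MARKERS = {
    "model_crash": "Model crash detected, will try to restart...",
    "restart_file_failure": "Restart file",  # matches both Atmos and Ocean failures
    "lost_heartbeat": "No heartbeat from core client",
    "suspend_request": "Suspend request from BOINC",
    "fatal_signal": "Signal 11 received",
}


def tally_events(stderr_text: str) -> dict:
    """Count non-overlapping occurrences of each marker in the stderr text."""
    return {name: stderr_text.count(marker) for name, marker in EVENT_MARKERS.items()}


# Short excerpt built from messages in the log (not the complete stderr).
SAMPLE = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=6080, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Atmos Hold Restart file rename failed on atmos_restart.hold "
    "Ocean Restart file copy failed on yjv0ko.dae8470 "
    "09:43:40 (5780): No heartbeat from core client for 30 sec - exiting "
    "Suspended CPDN Monitor - Suspend request from BOINC... "
    "Signal 11 received, exiting... Called boinc_finish"
)

if __name__ == "__main__":
    for name, count in tally_events(SAMPLE).items():
        print(f"{name}: {count}")
```

Passing the full stderr text instead of `SAMPLE` gives per-category totals for the whole run.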
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
22 Jan 2013 14:35:33 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 777,600 | 1,362,730 | 1.7525 |
18 Jan 2013 18:35:14 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 751,680 | 1,315,758 | 1.7504 |
17 Jan 2013 16:45:38 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 725,760 | 1,268,315 | 1.7476 |
15 Jan 2013 20:27:22 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 699,840 | 1,221,156 | 1.7449 |
14 Jan 2013 18:47:37 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 673,920 | 1,175,412 | 1.7441 |
11 Jan 2013 17:16:06 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 648,000 | 1,129,046 | 1.7424 |
04 Jan 2013 22:36:14 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 622,080 | 1,083,109 | 1.7411 |
03 Jan 2013 16:46:09 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 596,160 | 1,038,003 | 1.7411 |
31 Dec 2012 19:51:22 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 570,240 | 992,830 | 1.7411 |
27 Dec 2012 21:46:35 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 544,320 | 947,934 | 1.7415 |
21 Dec 2012 17:57:24 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 518,400 | 902,588 | 1.7411 |
19 Dec 2012 20:21:55 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 492,480 | 857,293 | 1.7408 |
17 Dec 2012 20:23:28 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 466,560 | 812,009 | 1.7404 |
13 Dec 2012 22:07:22 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 440,640 | 766,808 | 1.7402 |
13 Dec 2012 17:46:23 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 414,720 | 721,537 | 1.7398 |
07 Dec 2012 19:51:57 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 388,800 | 676,207 | 1.7392 |
04 Dec 2012 20:34:45 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 362,880 | 631,245 | 1.7395 |
03 Dec 2012 18:35:28 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 336,960 | 585,984 | 1.7390 |
29 Nov 2012 22:14:23 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 311,040 | 540,644 | 1.7382 |
28 Nov 2012 18:56:15 | 1236037 | 15427607 | hadcm3n_yjv0_1940_40_008239239_4 | 285,120 | 495,528 | 1.7380 |
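The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count, rounded to four decimal places. The sketch below checks that reading against three rows copied from the table; the function name `seconds_per_timestep` is hypothetical, and the rounding rule is an assumption inferred from the displayed values.

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds per model timestep, rounded to 4 places (assumed)."""
    return round(cpu_time_sec / timestep, 4)


# (timestep, CPU time in seconds, average shown in the table)
ROWS = [
    (777_600, 1_362_730, 1.7525),
    (751_680, 1_315_758, 1.7504),
    (285_120, 495_528, 1.7380),
]

if __name__ == "__main__":
    for timestep, cpu_time, shown in ROWS:
        computed = seconds_per_timestep(cpu_time, timestep)
        print(f"timestep={timestep:,} computed={computed:.4f} table={shown:.4f}")
```

For these three rows the computed value matches the table entry.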