Name | famous_unv5_799_200_006663772_3 |
Workunit | 6867144 |
Created | 10 Jun 2010, 15:31:03 UTC |
Sent | 3 Jul 2010, 18:50:41 UTC |
Report deadline | 3 Oct 2010, 2:17:52 UTC |
Received | 16 Jul 2010, 17:33:49 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 22 (0x00000016) Unknown error code |
Computer ID | 1085414 |
Run time | 8 days 7 hours 8 min 37 sec |
CPU time | 7 days 18 hours 9 min 11 sec |
Validate state | Invalid |
Credit | 1,667.69 |
Device peak FLOPS | 0.74 GFLOPS |
Application version | UK Met Office FAMOUS v6.11 i686-pc-linux-gnu |
Stderr | <core_client_version>6.10.17</core_client_version> <![CDATA[ <message> process exited with code 22 (0x16, -234) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1404, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 3 received, exiting... (716): called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=20174, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=20174, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... (1274): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1291): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1311): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1329): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
(1349): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1369): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1385): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... (1385): No heartbeat from core client for 30 sec - exiting Signal 3 received, exiting... (1411): called boinc_finish Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 3 received, exiting... (1451): called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7956, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7956, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7996, selfPID=7996, iMonCtr=1 Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9204, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9204, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Signal 3 received, exiting... 
(15472): called boinc_finish Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7643, iMonCtr=1 Model crash detected, will try to restart... Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7643, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7643, iMonCtr=1 Model crash detected, will try to restart... Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=17276, selfPID=17276, iMonCtr=1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 60 - Return code = 1 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 61 - Return code = 1 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 68 - Return code = 1 BUFFIN: Read Failed: No such file or directory BUFFIN: C I/O Error feof - Unit 69 - Return code = 1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... 
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... BUFFIN: Read Failed: Inappropriate ioctl for device BUFFIN: C I/O Error feof - Unit 60 - Return code = 1 BUFFIN: Read Failed: Invalid argument BUFFIN: C I/O Error feof - Unit 61 - Return code = 1 BUFFIN: Read Failed: Invalid argument BUFFIN: C I/O Error feof - Unit 68 - Return code = 1 BUFFIN: Read Failed: Invalid argument BUFFIN: C I/O Error feof - Unit 69 - Return code = 1 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=18043, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( (18043): called boinc_finish </stderr_txt> ]]> |
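The Run time and CPU time fields above can be compared once both are converted to seconds. The following is a minimal sketch, not part of BOINC or the CPDN software: the helper `duration_to_seconds` and the variable `cpu_fraction` are hypothetical names, and the parser assumes the exact `N days N hours N min N sec` wording shown on this page (singular forms such as `1 day` are not handled).

```python
import re

# Seconds per unit, keyed by the unit words used on this result page.
UNIT_SECONDS = {"days": 86400, "hours": 3600, "min": 60, "sec": 1}

def duration_to_seconds(text):
    """Convert a string such as '8 days 7 hours 8 min 37 sec' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(days|hours|min|sec)", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total

# Values copied from the Run time and CPU time rows above.
run_time = duration_to_seconds("8 days 7 hours 8 min 37 sec")
cpu_time = duration_to_seconds("7 days 18 hours 9 min 11 sec")

# Share of wall-clock run time that was spent on the CPU.
cpu_fraction = cpu_time / run_time

print(run_time, cpu_time, round(cpu_fraction, 3))
```

With the values from this result, the run time is 716,917 seconds and the CPU time is 670,151 seconds, so the task was on the CPU for roughly 93% of its run time.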
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
16 Jul 2010 16:44:13 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 505,466 | 669,906 | 1.3253 |
16 Jul 2010 11:58:21 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 496,106 | 656,747 | 1.3238 |
16 Jul 2010 05:35:54 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 486,746 | 643,924 | 1.3229 |
16 Jul 2010 01:58:37 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 477,386 | 631,738 | 1.3233 |
15 Jul 2010 23:37:08 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 468,026 | 620,554 | 1.3259 |
15 Jul 2010 02:13:58 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 458,666 | 609,004 | 1.3278 |
14 Jul 2010 18:48:19 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 449,306 | 595,684 | 1.3258 |
14 Jul 2010 15:23:20 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 439,946 | 584,054 | 1.3276 |
14 Jul 2010 11:30:26 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 430,586 | 572,497 | 1.3296 |
14 Jul 2010 08:15:24 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 421,226 | 561,173 | 1.3322 |
14 Jul 2010 05:07:26 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 411,866 | 550,301 | 1.3361 |
14 Jul 2010 02:01:18 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 402,506 | 539,445 | 1.3402 |
13 Jul 2010 23:51:55 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 393,146 | 528,600 | 1.3445 |
13 Jul 2010 19:36:14 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 383,786 | 517,423 | 1.3482 |
13 Jul 2010 16:23:31 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 374,426 | 506,071 | 1.3516 |
13 Jul 2010 12:59:01 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 365,066 | 494,637 | 1.3549 |
13 Jul 2010 09:30:59 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 355,706 | 482,850 | 1.3574 |
13 Jul 2010 06:05:34 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 346,346 | 471,497 | 1.3613 |
13 Jul 2010 02:53:15 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 336,986 | 460,496 | 1.3665 |
12 Jul 2010 23:41:57 | 1085414 | 11569721 | famous_unv5_799_200_006663772_3 | 327,626 | 449,558 | 1.3722 |
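The Average (sec/TS) column appears to be the cumulative CPU time divided by the cumulative timestep count. That reading is an assumption inferred from the numbers, not something the page states. The sketch below checks it against a few rows copied from the table; the function name `average_sec_per_timestep` and the variable `trickle_interval` are hypothetical.

```python
# (timestep, cpu_time_sec, reported_avg) copied from a few rows of the trickle table.
trickles = [
    (505_466, 669_906, 1.3253),
    (496_106, 656_747, 1.3238),
    (486_746, 643_924, 1.3229),
    (327_626, 449_558, 1.3722),
]

def average_sec_per_timestep(cpu_time_sec, timestep):
    """Cumulative CPU seconds divided by cumulative timesteps (assumed definition)."""
    return cpu_time_sec / timestep

for timestep, cpu_time_sec, reported in trickles:
    computed = average_sec_per_timestep(cpu_time_sec, timestep)
    # The page shows four decimal places, so allow for rounding.
    matches = abs(computed - reported) < 1e-4
    print(f"{timestep:>7} {computed:.4f} {reported:.4f} {matches}")

# Timestep gap between the two newest trickles listed above.
trickle_interval = trickles[0][0] - trickles[1][0]
print(trickle_interval)
```

Under this assumption the computed values agree with the reported column to four decimal places for the sampled rows, and the two newest trickles are 9,360 timesteps apart.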