Task 14032424

Name	hadcm3n_yerr_1980_40_007742911_2
Workunit	7898019
Created	30 Jan 2012, 14:29:38 UTC
Sent	30 Jan 2012, 14:35:00 UTC
Report deadline	30 Apr 2012, 22:02:11 UTC
Received	21 Apr 2012, 11:27:41 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	193 (0x000000C1) EXIT_SIGNAL
Computer ID	985115
Run time	18 days 18 hours 51 min 23 sec
CPU time	18 days 7 hours 28 min 7 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	1.93 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
- exit code 193 (0xc1)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3324, iMonCtr=1
Model crash detected, will try to restart...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/yerrko.pji3c10
Error converting file to netcdf: dataout/yerrko.pii3c10
Error converting file to netcdf: dataout/yerrko.pfi3c10
Error converting file to netcdf: dataout/yerrka.phi3c10
Error converting file to netcdf: dataout/yerrka.pgi3c10
Error converting file to netcdf: dataout/yerrka.pei3c10
Error converting file to netcdf: dataout/yerrka.pdi3c10
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended
CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended
CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
08:48:54 (5812): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
Suspended
CPDN Monitor - Suspend request from BOINC...
Suspended
CPDN Monitor - Suspend request from BOINC...
CSuspended
CPDN Monitor - Suspend request from BOINC...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x778F5EAB read attempt to address 0x4088F978
Engaging BOINC Windows Runtime Debugger...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77DCA5D5 read attempt to address 0xFFFFFFF8
Engaging BOINC Windows Runtime Debugger...
Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_yerr_1980_40_007742911/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish
</stderr_txt>
]]>
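The stderr above is dominated by a handful of repeated messages. A minimal Python sketch for tallying them, assuming each message can be recognised by the literal text seen in this log; the `PATTERNS` table and the `tally` helper are hypothetical, chosen only for illustration, and the excerpt is abbreviated:

```python
import re

# Abbreviated excerpt of the stderr text above; in practice, load the
# full <stderr_txt> contents instead.
STDERR_EXCERPT = (
    "CPDN Monitor - Quit request from BOINC... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3324, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=3324, iMonCtr=1 "
    "Model crash detected, will try to restart... "
    "BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 "
    "Error converting file to netcdf: dataout/yerrko.pji3c10 "
    "Signal 11 received, exiting... Called boinc_finish"
)

# Hypothetical message categories, keyed to literal text seen in this log.
PATTERNS = {
    "quit_request": r"CPDN Monitor - Quit request from BOINC",
    "suspend_request": r"CPDN Monitor - Suspend request from BOINC",
    "model_restart": r"Model crash detected, will try to restart",
    "buffin_eof": r"BUFFIN: C I/O Error feof",
    "netcdf_error": r"Error converting file to netcdf",
    "access_violation": r"Access Violation \(0xc0000005\)",
    "signal_11": r"Signal 11 received",
}


def tally(text: str) -> dict[str, int]:
    """Count how often each known message occurs in a stderr dump."""
    return {name: len(re.findall(pat, text)) for name, pat in PATTERNS.items()}


for name, n in tally(STDERR_EXCERPT).items():
    print(f"{name:>16}: {n}")
```

Run against the full log, the same counts give a quick summary of how often the controller tried to restart the model before the final access violations.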

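The Run time and CPU time fields are given as day/hour/minute/second strings. A minimal Python sketch converts them to seconds so the share of wall-clock time actually spent on the CPU can be checked; the `to_seconds` helper is hypothetical and assumes only the unit words that appear on this page:

```python
import re

# Seconds per unit word, assuming only the units used on this page.
UNITS = {"day": 86_400, "days": 86_400, "hour": 3_600, "hours": 3_600, "min": 60, "sec": 1}


def to_seconds(duration: str) -> int:
    """Convert a string such as '18 days 7 hours 28 min 7 sec' to seconds."""
    return sum(
        int(value) * UNITS[unit]
        for value, unit in re.findall(r"(\d+)\s+(days?|hours?|min|sec)", duration)
    )


run = to_seconds("18 days 18 hours 51 min 23 sec")
cpu = to_seconds("18 days 7 hours 28 min 7 sec")
print(f"run={run:,} s  cpu={cpu:,} s  cpu/run={cpu / run:.1%}")
```

This gives 1,623,083 s of run time and 1,582,087 s of CPU time, so the task kept the CPU busy for roughly 97.5% of the time it was running. The last trickle in the table below reports 1,555,692 CPU seconds, about 26,400 s fewer than the final total.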
Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
20 Apr 2012 18:46:19	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	777,600	1,555,692	2.0006
19 Apr 2012 18:58:19	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	751,680	1,502,860	1.9993
18 Apr 2012 13:29:18	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	725,760	1,452,375	2.0012
16 Apr 2012 19:57:54	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	699,840	1,402,010	2.0033
15 Apr 2012 15:51:44	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	673,920	1,351,463	2.0054
14 Apr 2012 17:46:20	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	648,000	1,300,419	2.0068
13 Apr 2012 21:02:13	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	622,080	1,249,776	2.0090
11 Apr 2012 22:38:11	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	596,160	1,198,367	2.0101
11 Apr 2012 08:27:43	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	570,240	1,147,540	2.0124
10 Apr 2012 12:12:20	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	544,320	1,096,120	2.0137
08 Apr 2012 22:17:52	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	518,400	1,046,141	2.0180
07 Apr 2012 22:01:49	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	492,480	995,496	2.0214
07 Apr 2012 00:53:31	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	466,560	944,208	2.0238
05 Apr 2012 18:23:32	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	440,640	892,512	2.0255
04 Apr 2012 12:05:49	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	414,720	841,662	2.0295
02 Apr 2012 11:46:35	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	388,800	789,730	2.0312
28 Mar 2012 18:14:31	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	362,880	738,549	2.0352
21 Mar 2012 19:27:16	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	336,960	686,146	2.0363
20 Mar 2012 21:05:15	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	311,040	635,332	2.0426
20 Mar 2012 00:09:13	985115	14032424	hadcm3n_yerr_1980_40_007742911_2	285,120	584,712	2.0508
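The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count (for example, 1,555,692 / 777,600 ≈ 2.0006). That reading is inferred from the numbers, not documented on the page. A short Python sketch checks it for every row and also derives the CPU cost per timestep between consecutive trickles, which are 25,920 timesteps apart here; the helper names are hypothetical:

```python
# Rows from the "Latest Trickles Received" table, newest first:
# (timestep, cumulative CPU time in seconds, reported average in sec/TS).
TRICKLES = [
    (777_600, 1_555_692, 2.0006), (751_680, 1_502_860, 1.9993),
    (725_760, 1_452_375, 2.0012), (699_840, 1_402_010, 2.0033),
    (673_920, 1_351_463, 2.0054), (648_000, 1_300_419, 2.0068),
    (622_080, 1_249_776, 2.0090), (596_160, 1_198_367, 2.0101),
    (570_240, 1_147_540, 2.0124), (544_320, 1_096_120, 2.0137),
    (518_400, 1_046_141, 2.0180), (492_480, 995_496, 2.0214),
    (466_560, 944_208, 2.0238), (440_640, 892_512, 2.0255),
    (414_720, 841_662, 2.0295), (388_800, 789_730, 2.0312),
    (362_880, 738_549, 2.0352), (336_960, 686_146, 2.0363),
    (311_040, 635_332, 2.0426), (285_120, 584_712, 2.0508),
]


def mismatched_rows(rows, tol=1e-4):
    """Rows whose reported average differs from cpu / timestep by more than tol."""
    return [r for r in rows if abs(r[1] / r[0] - r[2]) > tol]


def interval_rates(rows):
    """CPU seconds per timestep between consecutive trickles, oldest interval first."""
    ordered = sorted(rows)  # tuples sort by timestep first
    return [
        (t1, (c1 - c0) / (t1 - t0))
        for (t0, c0, _), (t1, c1, _) in zip(ordered, ordered[1:])
    ]


print("mismatched rows:", mismatched_rows(TRICKLES))
for ts, rate in interval_rates(TRICKLES):
    print(f"interval ending at timestep {ts:>7,}: {rate:.4f} sec/TS")
```

Because the Average column is cumulative, it smooths out changes in speed; the per-interval rates show how fast the model was running between individual trickles.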