Task 13389049

Name	hadcm3n_o79r_1940_40_007448670_4
Workunit	7646173
Created	15 Sep 2011, 7:17:53 UTC
Sent	15 Sep 2011, 7:24:47 UTC
Report deadline	15 Dec 2011, 14:51:58 UTC
Received	12 Mar 2012, 21:09:12 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Computer ID	1072792
Run time	34 days 9 hours 51 min 41 sec
CPU time	34 days 9 hours 51 min 41 sec
Validate state	Invalid
Credit	12,441.60
Device peak FLOPS	2.02 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=564, iMonCtr=1
Model crash detected, will try to restart...
09:36:43 (1564): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:40:32 (540): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:28:24 (1808): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
11:03:41 (2032): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
BUFFIN: C I/O Error feof - Unit 63 - Return code = 16
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 65 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
Error converting file to netcdf: dataout/o79rko.pje5c10
Error converting file to netcdf: dataout/o79rko.pie5c10
Error converting file to netcdf: dataout/o79rko.pfe5c10
Error converting file to netcdf: dataout/o79rka.phe5c10
Error converting file to netcdf: dataout/o79rka.pge5c10
Error converting file to netcdf: dataout/o79rka.pee5c10
Error converting file to netcdf: dataout/o79rka.pde5c10
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2568, iMonCtr=1
Model crash detected, will try to restart...
10:44:26 (1992): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
12:29:53 (1664): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=964, iMonCtr=1
Model crash detected, will try to restart...
09:17:17 (972): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3376, iMonCtr=1
Model crash detected, will try to restart...
10:40:52 (1888): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
C11:07:07 (1336): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CCPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4020, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1992, iMonCtr=1
Model crash detected, will try to restart...
10:56:07 (1708): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3368, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2896, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1656, iMonCtr=1
Model crash detected, will try to restart...
12:07:07 (1160): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
12:51:09 (2380): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
11:51:06 (1640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
12:46:06 (3800): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:56:07 (1488): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
12:43:14 (1372): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
12:18:18 (1312): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
12:40:05 (1284): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2900, iMonCtr=1
Model crash detected, will try to restart...
12:33:53 (3896): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1680, iMonCtr=1
Model crash detected, will try to restart...
10:43:29 (1804): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2340, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1172, iMonCtr=1
Model crash detected, will try to restart...
10:14:21 (1296): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CCPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1868, iMonCtr=1
Model crash detected, will try to restart...
12:32:26 (1824): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1464, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
09:26:00 (1556): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:30:21 (1560): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
12:36:42 (2016): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x76F17373 read attempt to address 0xFFFFFFF8
Engaging BOINC Windows Runtime Debugger...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x77A73AB3 read attempt to address 0x00000000
Engaging BOINC Windows Runtime Debugger...
</stderr_txt>
]]>
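
The exit status is listed both as a signed decimal (-1073741819) and as a hexadecimal code (0xC0000005). A minimal sketch of how the two forms relate, assuming the decimal value is a 32-bit two's-complement integer; the function name `exit_status_to_hex` is hypothetical and not part of BOINC:

```python
def exit_status_to_hex(status: int) -> str:
    """Render a signed 32-bit exit status as an unsigned hex code."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{status & 0xFFFFFFFF:08X}"


print(exit_status_to_hex(-1073741819))
```

Masking keeps only the low 32 bits, so the negative decimal shown in the Exit status row maps back to the code that Windows names STATUS_ACCESS_VIOLATION.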

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
12 Mar 2012 06:48:50	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	1,036,800	2,973,161	2.8676
09 Mar 2012 14:49:25	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	1,010,880	2,881,921	2.8509
04 Mar 2012 13:18:23	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	984,960	2,802,958	2.8458
27 Feb 2012 09:27:26	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	959,040	2,718,452	2.8346
24 Feb 2012 17:39:57	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	933,120	2,652,183	2.8423
19 Feb 2012 12:48:24	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	907,200	2,586,215	2.8508
17 Feb 2012 14:11:40	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	881,280	2,512,934	2.8515
12 Feb 2012 13:13:22	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	855,360	2,441,200	2.8540
05 Feb 2012 15:31:44	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	829,440	2,370,419	2.8579
01 Feb 2012 19:00:06	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	803,520	2,296,822	2.8585
28 Jan 2012 18:35:14	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	777,600	2,225,283	2.8617
22 Jan 2012 12:11:51	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	751,680	2,154,100	2.8657
15 Jan 2012 15:41:52	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	725,760	2,082,555	2.8695
11 Jan 2012 09:04:01	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	699,840	2,008,956	2.8706
07 Jan 2012 14:53:30	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	673,920	1,935,818	2.8725
01 Jan 2012 14:08:35	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	648,000	1,864,415	2.8772
27 Dec 2011 15:28:59	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	622,080	1,793,156	2.8825
25 Dec 2011 11:09:26	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	596,160	1,714,595	2.8761
15 Dec 2011 16:33:01	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	570,240	1,643,635	2.8824
13 Dec 2011 14:17:56	1072792	13389049	hadcm3n_o79r_1940_40_007448670_4	544,320	1,566,701	2.8783
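
The Average (sec/TS) column appears to be the cumulative CPU time divided by the timestep count. A short sketch that checks this against three rows copied from the table above; the function name `seconds_per_timestep` is hypothetical, and the relationship is inferred from the column values rather than from project documentation:

```python
def seconds_per_timestep(cpu_time_sec: int, timestep: int) -> float:
    """Cumulative CPU seconds divided by the number of completed timesteps."""
    return cpu_time_sec / timestep


# (timestep, CPU time in seconds, reported average) taken from the trickle table.
rows = [
    (1_036_800, 2_973_161, 2.8676),
    (1_010_880, 2_881_921, 2.8509),
    (544_320, 1_566_701, 2.8783),
]

for timestep, cpu_time_sec, reported in rows:
    computed = seconds_per_timestep(cpu_time_sec, timestep)
    print(f"{timestep:>9,}  computed={computed:.4f}  reported={reported:.4f}")
```

Consecutive trickles in the table are 25,920 timesteps apart, so the same division applied to the difference between two rows would give the average for a single trickle interval.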