Task 15770943

Name	hadcm3n_4hom_1940_40_008311069_1
Workunit	8462204
Created	10 May 2013, 10:49:14 UTC
Sent	10 May 2013, 10:49:32 UTC
Report deadline	9 Aug 2013, 18:16:43 UTC
Received	4 Jun 2013, 20:21:07 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1275849
Run time	15 days 19 hours 48 min 44 sec
CPU time	15 days 3 hours 17 min 47 sec
Validate state	Invalid
Credit	5,287.68
Device peak FLOPS	2.75 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.64</core_client_version> <![CDATA[ <message> The device does not recognize the command. (0x16) - exit code 22 (0x16) </message> <stderr_txt> CPDN Monitor - Quit request from BOINC... BUFFIN: C I/O Error feof - Unit 63 - Return code = 16 BUFFIN: C I/O Error feof - Unit 64 - Return code = 16 BUFFIN: C I/O Error feof - Unit 65 - Return code = 16 BUFFIN: C I/O Error feof - Unit 66 - Return code = 16 BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 BUFFIN: C I/O Error feof - Unit 68 - Return code = 16 BUFFIN: C I/O Error feof - Unit 69 - Return code = 16 Error converting file to netcdf: dataout/4homko.pje2c10 Error converting file to netcdf: dataout/4homko.pie2c10 Error converting file to netcdf: dataout/4homko.pfe2c10 Error converting file to netcdf: dataout/4homka.phe2c10 Error converting file to netcdf: dataout/4homka.pge2c10 Error converting file to netcdf: dataout/4homka.pee2c10 Error converting file to netcdf: dataout/4homka.pde2c10 CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 17:41:12 (1604): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 17:41:13 (1604): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 01:42:23 (6716): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3692, iMonCtr=1 Model crash detected, will try to restart... 
19:22:06 (2784): No heartbeat from core client for 30 sec - exiting 19:22:07 (2784): No heartbeat from core client for 30 sec - exiting 19:22:08 (2784): No heartbeat from core client for 30 sec - exiting 19:22:09 (2784): No heartbeat from core client for 30 sec - exiting 19:22:10 (2784): No heartbeat from core client for 30 sec - exiting 19:22:11 (2784): No heartbeat from core client for 30 sec - exiting 19:22:12 (2784): No heartbeat from core client for 30 sec - exiting 19:22:13 (2784): No heartbeat from core client for 30 sec - exiting 19:22:14 (2784): No heartbeat from core client for 30 sec - exiting 19:22:16 (2784): No heartbeat from core client for 30 sec - exiting 19:22:17 (2784): No heartbeat from core client for 30 sec - exiting 19:22:18 (2784): No heartbeat from core client for 30 sec - exiting 19:22:19 (2784): No heartbeat from core client for 30 sec - exiting 19:22:20 (2784): No heartbeat from core client for 30 sec - exiting 19:22:21 (2784): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... 08:03:26 (7368): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:33:46 (4656): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7260, iMonCtr=1 Model crash detected, will try to restart... CPDN Monitor - Quit request from BOINC... 08:33:16 (3232): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:33:34 (1964): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 11:32:51 (7632): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
11:32:52 (7632): No heartbeat from core client for 30 sec - exiting 11:32:53 (7632): No heartbeat from core client for 30 sec - exiting 11:32:54 (7632): No heartbeat from core client for 30 sec - exiting 15:35:51 (8052): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 18:36:25 (9920): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 07:54:53 (5448): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... 
08:08:37 (7512): No heartbeat from core client for 30 sec - exiting 08:08:38 (7512): No heartbeat from core client for 30 sec - exiting 08:08:39 (7512): No heartbeat from core client for 30 sec - exiting 08:08:40 (7512): No heartbeat from core client for 30 sec - exiting 08:08:41 (7512): No heartbeat from core client for 30 sec - exiting 08:08:42 (7512): No heartbeat from core client for 30 sec - exiting 08:08:43 (7512): No heartbeat from core client for 30 sec - exiting 08:08:44 (7512): No heartbeat from core client for 30 sec - exiting 08:08:45 (7512): No heartbeat from core client for 30 sec - exiting 08:08:46 (7512): No heartbeat from core client for 30 sec - exiting 08:08:47 (7512): No heartbeat from core client for 30 sec - exiting 08:08:48 (7512): No heartbeat from core client for 30 sec - exiting 08:08:49 (7512): No heartbeat from core client for 30 sec - exiting 08:08:50 (7512): No heartbeat from core client for 30 sec - exiting 08:08:51 (7512): No heartbeat from core client for 30 sec - exiting 08:08:52 (7512): No heartbeat from core client for 30 sec - exiting 08:08:53 (7512): No heartbeat from core client for 30 sec - exiting 08:08:54 (7512): No heartbeat from core client for 30 sec - exiting 08:08:55 (7512): No heartbeat from core client for 30 sec - exiting 08:08:56 (7512): No heartbeat from core client for 30 sec - exiting 08:08:57 (7512): No heartbeat from core client for 30 sec - exiting 08:08:58 (7512): No heartbeat from core client for 30 sec - exiting 08:08:59 (7512): No heartbeat from core client for 30 sec - exiting 08:09:00 (7512): No heartbeat from core client for 30 sec - exiting 08:09:01 (7512): No heartbeat from core client for 30 sec - exiting 08:09:02 (7512): No heartbeat from core client for 30 sec - exiting 08:09:03 (7512): No heartbeat from core client for 30 sec - exiting 08:09:04 (7512): No heartbeat from core client for 30 sec - exiting 08:09:05 (7512): No heartbeat from core client for 30 sec - exiting 08:09:06 (7512): No 
heartbeat from core client for 30 sec - exiting 08:09:07 (7512): No heartbeat from core client for 30 sec - exiting 08:09:08 (7512): No heartbeat from core client for 30 sec - exiting 08:09:09 (7512): No heartbeat from core client for 30 sec - exiting 08:09:11 (7512): No heartbeat from core client for 30 sec - exiting 08:09:12 (7512): No heartbeat from core client for 30 sec - exiting 08:09:13 (7512): No heartbeat from core client for 30 sec - exiting 08:09:14 (7512): No heartbeat from core client for 30 sec - exiting 08:09:15 (7512): No heartbeat from core client for 30 sec - exiting 08:09:16 (7512): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 08:09:17 (7512): No heartbeat from core client for 30 sec - exiting 08:09:18 (7512): No heartbeat from core client for 30 sec - exiting Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Signal 22 received, exiting... 
Called boinc_finish Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7968, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... CPDN Monitor - Quit request from BOINC... 02:59:36 (11984): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 02:59:39 (11984): No heartbeat from core client for 30 sec - exiting 02:59:40 (11984): No heartbeat from core client for 30 sec - exiting 07:59:36 (16284): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:59:36 (17036): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 10:59:38 (17036): No heartbeat from core client for 30 sec - exiting 10:59:39 (17036): No heartbeat from core client for 30 sec - exiting 10:59:40 (17036): No heartbeat from core client for 30 sec - exiting 13:59:31 (15964): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 05:59:34 (17676): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 06:59:37 (20176): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 07:59:32 (19012): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 
23:59:32 (20224): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:59:34 (20224): No heartbeat from core client for 30 sec - exiting 23:59:35 (20224): No heartbeat from core client for 30 sec - exiting CPDN Monitor - Quit request from BOINC... 18:05:54 (6644): No heartbeat from core client for 30 sec - exiting CPDN Monitor - No 'heartbeat' from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5860, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... 
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Suspended CPDN Monitor - Suspend request from BOINC... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... 
cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/atmos_restart.day after 11 attempts cpdnmonitor: cannot open input file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_4hom_1940_40_008311069/dataout/ocean_restart.day after 11 attempts Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5888, iMonCtr=1 Model crash detected, will try to restart... Sorry, too many model crashes! :-( Called boinc_finish </stderr_txt> ]]>
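
The stderr output above is a long run of a few repeating message types, so a per-type count can be easier to read than the raw text. The following is a minimal sketch, not part of the original task page: the `PATTERNS` names and the `tally` helper are hypothetical, and the `sample` string is a shortened excerpt assembled from messages that appear in the log, not the full output.

```python
from collections import Counter

# Hypothetical labels mapped to substrings that appear verbatim in the stderr text.
PATTERNS = {
    "quit_request": "CPDN Monitor - Quit request from BOINC",
    "no_heartbeat": "No heartbeat from core client for 30 sec - exiting",
    "model_crash": "Model crash detected, will try to restart",
    "too_many_crashes": "Sorry, too many model crashes!",
    "restart_file_missing": "cannot open input file",
}

def tally(stderr_text):
    """Count occurrences of each known message type in a stderr dump."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    normalized = " ".join(stderr_text.split())
    return Counter({name: normalized.count(needle) for name, needle in PATTERNS.items()})

# Shortened excerpt assembled from messages in the log above (assumption: not the full text).
sample = (
    "CPDN Monitor - Quit request from BOINC... "
    "17:41:12 (1604): No heartbeat from core client for 30 sec - exiting "
    "CPDN Monitor - No 'heartbeat' from BOINC... "
    "17:41:13 (1604): No\nheartbeat from core client for 30 sec - exiting "
    "Model crash detected, will try to restart... "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

counts = tally(sample)
for name, n in counts.items():
    print(f"{name}: {n}")
```

Run against the full stderr text, a tally like this would show how often the model lost the client heartbeat compared with how often it crashed and restarted.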

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
03 Jun 2013 09:54:32	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	440,640	1,283,386	2.9125
22 May 2013 09:33:38	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	414,720	606,957	1.4635
21 May 2013 12:59:33	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	388,800	569,371	1.4644
20 May 2013 13:22:31	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	362,880	531,999	1.4660
19 May 2013 08:22:03	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	336,960	496,002	1.4720
18 May 2013 21:53:00	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	311,040	458,432	1.4739
18 May 2013 08:28:28	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	285,120	419,302	1.4706
17 May 2013 21:49:05	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	259,200	381,280	1.4710
17 May 2013 10:26:03	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	233,280	341,817	1.4653
16 May 2013 23:46:46	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	207,360	303,642	1.4643
15 May 2013 15:07:45	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	181,440	264,893	1.4599
15 May 2013 05:04:14	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	155,520	227,142	1.4605
14 May 2013 10:04:14	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	129,600	189,675	1.4635
13 May 2013 01:24:47	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	103,680	151,943	1.4655
12 May 2013 13:42:45	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	77,760	113,249	1.4564
11 May 2013 08:58:08	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	51,840	74,916	1.4451
10 May 2013 21:31:32	1275849	15770943	hadcm3n_4hom_1940_40_008311069_1	25,920	37,446	1.4447
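
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep count. The sketch below checks that against a few rows copied from the table and also computes the rate between two consecutive trickles. The function names are hypothetical, and reading the final jump as recomputation after restarts is an assumption, not something the page states.

```python
# (timestep, cumulative CPU seconds) for selected rows of the trickle table.
trickles = [
    (25_920, 37_446),
    (51_840, 74_916),
    (414_720, 606_957),
    (440_640, 1_283_386),
]

def average_sec_per_ts(timestep, cpu_sec):
    """Cumulative average: total CPU seconds divided by total timesteps."""
    return cpu_sec / timestep

def incremental_sec_per_ts(prev, curr):
    """CPU seconds per timestep spent between two consecutive trickles."""
    (ts0, cpu0), (ts1, cpu1) = prev, curr
    return (cpu1 - cpu0) / (ts1 - ts0)

for ts, cpu in trickles:
    print(f"{ts:>8,} timesteps: {average_sec_per_ts(ts, cpu):.4f} sec/TS")

# Rate over the last 25,920 timesteps (22 May to 03 Jun trickles).
last_rate = incremental_sec_per_ts(trickles[2], trickles[3])
print(f"last interval: {last_rate:.2f} sec/TS")
```

The cumulative average stays near 1.46 sec/TS up to timestep 414,720, while the last interval alone costs roughly 26 sec/TS. That would be consistent with the repeated restarts recorded in the stderr output, though the page itself does not confirm the cause.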