Name | hadam4h_a286_201511_4_842_011907342_2 |
Workunit | 11907342 |
Created | 23 Dec 2019, 0:18:43 UTC |
Sent | 23 Dec 2019, 0:29:58 UTC |
Report deadline | 4 Dec 2020, 5:49:58 UTC |
Received | 15 Jan 2020, 2:52:41 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 12 (0x0000000C) Unknown error code |
Computer ID | 1492959 |
Run time | 12 days 0 hours 4 min 2 sec |
CPU time | 11 days 23 hours 15 min 33 sec |
Validate state | Invalid |
Credit | 13,636.74 |
Device peak FLOPS | 3.22 GFLOPS |
Application version | UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu |
Peak working set size | 1,364.38 MB |
Peak swap size | 1,385.77 MB |
Peak disk usage | 12.91 MB |
Stderr | <core_client_version>7.9.3</core_client_version>
<![CDATA[
<message>
process exited with code 12 (0xc, -244)</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
15:19:42 (16435): called boinc_finish(193)
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
BUFFOUT: Write Failed: No space left on device
BUFFOUT: C I/O Error - Return code = 1
Model crashed: WRITDUMP: BAD BUFFOUT OF DATA
tmp/xnnuj.pipe_dummy
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
16:38:47 (12201): called boinc_finish(193)
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
SIGSEGV: segmentation violation
Stack trace (17 frames):
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x80d4cf7]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f21090]
/lib32/libc.so.6(getenv+0x99)[0xf7ac75f9]
/lib32/libc.so.6(+0xae498)[0xf7b46498]
/lib32/libc.so.6(+0xae865)[0xf7b46865]
/lib32/libc.so.6(localtime_r+0x12)[0xf7b44dc2]
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu[0x80d01b2]
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu[0x80d0900]
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu[0x80d09f1]
linux-gate.so.1(__kernel_sigreturn+0x0)[0xf7f21090]
linux-gate.so.1(__kernel_vsyscall+0x9)[0xf7f21079]
/lib32/libc.so.6(nanosleep+0x4b)[0xf7b55f6b]
/lib32/libc.so.6(usleep+0x41)[0xf7b881a1]
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu[0x80e78a5]
../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu[0x80d2114]
/lib32/libpthread.so.0(+0x63a6)[0xf7eeb3a6]
/lib32/libc.so.6(clone+0x66)[0xf7b8f396]
Exiting...
OPEN: File Creation Failed: No such file or directory
OPEN: Unable to Open File dataout/a286ga.pbl6jan for Read/Write
Model crashed: STWORK : Error opening output PP file on unit 61
tmp/xnnuj.pipe_dummy
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.ihist after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.namelists after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/dataout/atmos_restart.day after 11 attempts
forrtl: Bad file descriptor
forrtl: severe (30): open failure, unit 6, file /proc/7116/fd/
Image PC Routine Line Source
hadam4_um_8.52_i6 083F6605 Unknown Unknown Unknown
hadam4_um_8.52_i6 0843E0A0 Unknown Unknown Unknown
hadam4_um_8.52_i6 081DF6C9 Unknown Unknown Unknown
hadam4_um_8.52_i6 0836E63F Unknown Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2230, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.ihist after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.namelists after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/dataout/atmos_restart.day after 11 attempts
forrtl: Bad file descriptor
forrtl: severe (30): open failure, unit 6, file /proc/7125/fd/
Image PC Routine Line Source
hadam4_um_8.52_i6 083F6605 Unknown Unknown Unknown
hadam4_um_8.52_i6 0843E0A0 Unknown Unknown Unknown
hadam4_um_8.52_i6 081DF6C9 Unknown Unknown Unknown
hadam4_um_8.52_i6 0836E63F Unknown Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2230, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.ihist after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.namelists after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/dataout/atmos_restart.day after 11 attempts
forrtl: Bad file descriptor
forrtl: severe (30): open failure, unit 6, file /proc/7135/fd/
Image PC Routine Line Source
hadam4_um_8.52_i6 083F6605 Unknown Unknown Unknown
hadam4_um_8.52_i6 0843E0A0 Unknown Unknown Unknown
hadam4_um_8.52_i6 081DF6C9 Unknown Unknown Unknown
hadam4_um_8.52_i6 0836E63F Unknown Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2230, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.ihist after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.namelists after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/dataout/atmos_restart.day after 11 attempts
forrtl: Bad file descriptor
forrtl: severe (30): open failure, unit 6, file /proc/7141/fd/
Image PC Routine Line Source
hadam4_um_8.52_i6 083F6605 Unknown Unknown Unknown
hadam4_um_8.52_i6 0843E0A0 Unknown Unknown Unknown
hadam4_um_8.52_i6 081DF6C9 Unknown Unknown Unknown
hadam4_um_8.52_i6 0836E63F Unknown Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2230, iMonCtr=1
Model crash detected, will try to restart...
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.ihist after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/jobs/xnnuj.namelists after 11 attempts
cpdnmonitor: cannot open input file /var/lib/boinc-client/projects/climateprediction.net/hadam4h_a286_201511_4_842_011907342/dataout/atmos_restart.day after 11 attempts
forrtl: Bad file descriptor
forrtl: severe (30): open failure, unit 6, file /proc/7151/fd/
Image PC Routine Line Source
hadam4_um_8.52_i6 083F6605 Unknown Unknown Unknown
hadam4_um_8.52_i6 0843E0A0 Unknown Unknown Unknown
hadam4_um_8.52_i6 081DF6C9 Unknown Unknown Unknown
hadam4_um_8.52_i6 0836E63F Unknown Unknown Unknown
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2230, iMonCtr=1
Model crash detected, will try to restart...
Sorry, too many model crashes! :-(
</stderr_txt>
]]> |
Latest Trickles Received

Time Sent (UTC) | Host ID | Result ID | Result Name | Timestep | CPU Time (sec) | Average (sec/TS) |
---|---|---|---|---|---|---|
11 Jan 2020 17:30:14 | 1492959 | 21861555 | hadam4h_a286_201511_4_842_011907342_2 | 17,483 | 741,399 | 42.4069 |
06 Jan 2020 21:05:25 | 1492959 | 21861555 | hadam4h_a286_201511_4_842_011907342_2 | 8,843 | 390,463 | 44.1550 |
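
The "Average (sec/TS)" column appears to be the cumulative CPU time divided by the timestep reached. The short Python sketch below checks that reading against the two trickle rows above. The names `trickles` and `seconds_per_timestep` are illustrative only and are not part of any CPDN or BOINC tooling.

```python
# Rows copied from the trickle table above: (timestep, cumulative CPU time in seconds).
trickles = [
    (17483, 741399),  # 11 Jan 2020 17:30:14
    (8843, 390463),   # 06 Jan 2020 21:05:25
]


def seconds_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Average CPU seconds spent per model timestep (assumed definition)."""
    return cpu_seconds / timestep


# Round to four decimals, matching the precision shown in the table.
averages = [round(seconds_per_timestep(ts, cpu), 4) for ts, cpu in trickles]

for (ts, cpu), avg in zip(trickles, averages):
    print(f"timestep {ts}: {cpu} s CPU -> {avg:.4f} sec/TS")
```

Under this assumption, both rows reproduce the table's values to four decimal places.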