Discussion:
DB2 Express-C 10.5 on Ubuntu 16.04: Frequent crashes
Joachim Tuchel
2017-11-23 12:45:34 UTC
We've been seeing frequent crashes of DB2 on Ubuntu Linux. By frequent I mean at least daily. Client applications get communication errors:

[SQLSTATE=40003 - [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]

And after a few attempts clients get

[SQLSTATE=08003 - [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003 [Native Error=-99999]

And a few attempts later they get "No Start Database Manager command was issued"

I can simply do a db2 start dbm and things start to work again, but I never know for how long.


I've tried to find some useful information in the db2diag.log. The first error is usually this:

EDUID : 111 EDUNAME: db2agent (instance)
FUNCTION: DB2 UDB, common communication, sqlccipcAgentExitList, probe:2
DATA #1 : Hexdump, 24 bytes
0x000000020457E784 : 6680 C901 0300 0000 0200 0000 FFFF FFFF f...............
0x000000020457E794 : 0200 0000 FBFF FFFF ........

2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
PID : 1633 TID : 140263188064000 PROC : db2sysc
INSTANCE: NODE : 000 DB :
APPHDL : 0-25 APPID: *LOCAL.DB2.171123033522
AUTHID : HOSTNAME:
EDUID : 81 EDUNAME: db2fw0 (KONTO1)
FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
MESSAGE : ZRC=0x8300002B=-2097151957
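A side note on reading the MESSAGE line: the two numbers are the same return code, once as unsigned 32-bit hex and once as a signed decimal. The sketch below only shows that arithmetic; the function names are made up for illustration and are not part of any DB2 tooling. If I remember correctly, db2diag -rc can look up what a given ZRC means, but check that against your installation.

```python
def zrc_to_signed(hex_text):
    """Interpret a 32-bit ZRC hex string as a signed (two's complement) integer."""
    value = int(hex_text, 16)
    if value >= 0x80000000:
        # The top bit is set, so the signed reading is negative.
        value -= 0x100000000
    return value


def signed_to_zrc(number):
    """Render a signed 32-bit return code in the hex form used by db2diag.log."""
    return "0x%08X" % (number & 0xFFFFFFFF)


if __name__ == "__main__":
    # Values taken from the db2diag.log excerpt above.
    print(zrc_to_signed("0x8300002B"))
    print(signed_to_zrc(-2097151957))
```

This is handy when one tool or search result reports the decimal form and another the hex form.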


Followed by hundreds of other errors, like for example this one:

MESSAGE : Unexpected OS error. This most likely means that resources have been
torn down from underneath the prefetcher. Terminating the prefetcher
to prevent infinite looping.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F91A8A67B48 _Z17sqlbpfRemoveFromQP12SQLB_pfQUEUEmPP14SQLB_pfRequestP16sqeLocalDatabasePP12SQLD_OBJ_HDLP12SQLB_GLOBALS + 0x3C8
[1] 0x00007F91A8940957 _Z26sqlbPFPrefetcherEntryPointP16sqbPrefetcherEdu + 0x197
[2] 0x00007F91A8940725 _ZN16sqbPrefetcherEdu6RunEDUEv + 0x25
[3] 0x00007F91AC0A7047 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
[4] 0x00007F91AB880181 sqloEDUEntry + 0x301
[5] 0x00007F91B2AF86BA /lib/x86_64-linux-gnu/libpthread.so.0 + 0x76BA
[6] 0x00007F91A650F3DD clone + 0x6D
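With hundreds of follow-up entries, it may help to pull out only the earliest Severe records, since the later ones are often just fallout from the first failure. The following is a minimal sketch, assuming every record starts with a timestamp header line like the one in the excerpt above; the function names and the default limit are hypothetical, and the real db2diag.log may contain record shapes this does not handle.

```python
import re

# A record header looks like:
# 2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})"
    r"\s+\S+\s+LEVEL:\s*(\w+)"
)


def split_records(text):
    """Split db2diag.log text into records, one per timestamp header."""
    records = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "timestamp": match.group(1),
                "level": match.group(2),
                "lines": [line],
            }
            records.append(current)
        elif current is not None and line.strip():
            # Non-empty lines belong to the most recent header.
            current["lines"].append(line)
    return records


def first_records(text, level="Severe", limit=5):
    """Return the first `limit` records of the given level, in file order."""
    hits = [r for r in split_records(text) if r["level"] == level]
    return hits[:limit]


if __name__ == "__main__":
    excerpt = """\
2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
PID : 1633 TID : 140263188064000 PROC : db2sysc
EDUID : 81 EDUNAME: db2fw0 (KONTO1)
FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
MESSAGE : ZRC=0x8300002B=-2097151957
"""
    for record in first_records(excerpt):
        print(record["timestamp"], record["level"])
        print("\n".join(record["lines"][1:]))
```

Whatever precedes the first Severe record (including operating system logs from the same minute) is usually the more interesting place to look.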




So what can I look for in order to find the cause of this problem? I've read about the Linux kernel parameters, but I think they are okay.

Any hints?


Joachim
Jeremy Rickard
2017-11-27 19:02:47 UTC
Google "sqloWaitEDUWaitPost" for a few possibilities?

Might not hurt to check the ulimit values against recommendations at https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.qb.server.doc/doc/r0052441.html
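One way to make that ulimit comparison less error-prone is to script it. This is a rough sketch using Python's standard resource module; it must be run as the instance owner so it sees the limits DB2 actually runs under. The numbers in REQUIRED are placeholders I made up, not IBM's recommendations, so replace them with the values from the page linked above.

```python
import resource

UNLIMITED = resource.RLIM_INFINITY

# Placeholder requirements: substitute the values from the IBM documentation.
REQUIRED = {
    "RLIMIT_DATA": UNLIMITED,
    "RLIMIT_NOFILE": 65536,
    "RLIMIT_FSIZE": UNLIMITED,
}


def soft_limits(names):
    """Read the current soft limit for each named resource."""
    return {name: resource.getrlimit(getattr(resource, name))[0] for name in names}


def satisfies(have, need):
    """True if the limit `have` is at least as generous as `need`."""
    if have == UNLIMITED:
        return True
    if need == UNLIMITED:
        return False
    return have >= need


def shortfalls(current, required):
    """Return {name: (have, need)} for every limit that is too low."""
    return {
        name: (current[name], need)
        for name, need in required.items()
        if not satisfies(current[name], need)
    }


if __name__ == "__main__":
    problems = shortfalls(soft_limits(REQUIRED), REQUIRED)
    for name, (have, need) in problems.items():
        print(f"{name}: have {have}, need {need}")
    if not problems:
        print("all checked limits meet the configured requirements")
```

The comparison treats "unlimited" as satisfying any requirement, which matters because the module reports it as a sentinel value rather than a large number.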

Is it a root or non-root installation?

Before you do the new 11.1 install, I suggest you also run db2prereqcheck, e.g. ./db2prereqcheck -v 11.1.2.2
Joachim Tuchel
2017-11-28 18:57:32 UTC
Jeremy,


I tried googling a few expressions, but not sqloWaitEDUWaitPost... Will do so tonight. Thanks for the suggestion.

The ulimit values seem fine.

I ran db2prereqcheck for 11.1 and it has nothing to complain about... In the meantime, I have upgraded to 11.1 in the hope that the situation improves. The upgrade process wasn't as smooth as expected, but all is up and running now. It's too early to tell if 11.1 magically removed the crashes...


Joachim
Joachim Tuchel
2017-11-28 18:58:23 UTC
Oh, and it is a root installation...
