Joachim Tuchel
2017-11-23 12:45:34 UTC
We've been seeing frequent crashes of DB2 on Ubuntu Linux, at least once a day. Client applications first get communication errors:
[SQLSTATE=40003 - [IBM][CLI Driver] SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]
After a few more attempts, clients get:
[SQLSTATE=08003 - [IBM][CLI Driver] CLI0106E Connection is closed. SQLSTATE=08003 [Native Error=-99999]
A few attempts later they get "No Start Database Manager command was issued".
I can simply run db2 start dbm and things work again, but I never know for how long.
I've tried to find some useful information in the db2diag.log. The first error is usually this:
EDUID : 111 EDUNAME: db2agent (instance)
FUNCTION: DB2 UDB, common communication, sqlccipcAgentExitList, probe:2
DATA #1 : Hexdump, 24 bytes
0x000000020457E784 : 6680 C901 0300 0000 0200 0000 FFFF FFFF f...............
0x000000020457E794 : 0200 0000 FBFF FFFF ........
2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)
PID : 1633 TID : 140263188064000 PROC : db2sysc
INSTANCE: NODE : 000 DB :
APPHDL : 0-25 APPID: *LOCAL.DB2.171123033522
AUTHID : HOSTNAME:
EDUID : 81 EDUNAME: db2fw0 (KONTO1)
FUNCTION: DB2 UDB, oper system services, sqloWaitEDUWaitPost, probe:100
MESSAGE : ZRC=0x8300002B=-2097151957
It is followed by hundreds of other errors, for example this one:
MESSAGE : Unexpected OS error. This most likely means that resources have been
torn down from underneath the prefetcher. Terminating the prefetcher
to prevent infinite looping.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
[0] 0x00007F91A8A67B48 _Z17sqlbpfRemoveFromQP12SQLB_pfQUEUEmPP14SQLB_pfRequestP16sqeLocalDatabasePP12SQLD_OBJ_HDLP12SQLB_GLOBALS + 0x3C8
[1] 0x00007F91A8940957 _Z26sqlbPFPrefetcherEntryPointP16sqbPrefetcherEdu + 0x197
[2] 0x00007F91A8940725 _ZN16sqbPrefetcherEdu6RunEDUEv + 0x25
[3] 0x00007F91AC0A7047 _ZN9sqzEDUObj9EDUDriverEv + 0xF7
[4] 0x00007F91AB880181 sqloEDUEntry + 0x301
[5] 0x00007F91B2AF86BA /lib/x86_64-linux-gnu/libpthread.so.0 + 0x76BA
[6] 0x00007F91A650F3DD clone + 0x6D
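Since the first Severe entry is buried under hundreds of follow-up errors, here is a minimal sketch of how the log could be narrowed down. It assumes every record starts with a header line like the `2017-11-23-13.14.47.472886+060 I8475107E2236 LEVEL: Severe (OS)` line above. The function names `split_records` and `severe_records` are made up for illustration and are not part of any DB2 tooling.

```python
import re

# Assumed header shape: "<timestamp><+/-tz> <record id> LEVEL: <level text>"
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})\s+\S+\s+LEVEL:\s*(.+?)\s*$"
)


def split_records(text):
    """Group log lines into records, one per header line (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "timestamp": match.group(1),
                "level": match.group(2),
                "lines": [line],
            }
            records.append(current)
        elif current is not None:
            # Lines before the first header are skipped; the rest belong
            # to the most recent record.
            current["lines"].append(line)
    return records


def severe_records(text):
    """Return only the records whose level starts with 'Severe'."""
    return [r for r in split_records(text) if r["level"].startswith("Severe")]


if __name__ == "__main__":
    import sys

    # Usage: python script.py /path/to/db2diag.log
    with open(sys.argv[1], errors="replace") as handle:
        found = severe_records(handle.read())
    if found:
        # Print the earliest Severe record in file order.
        print("\n".join(found[0]["lines"]))
```

Printing only the first Severe record should make it easier to see what happened right before the cascade started.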
So what should I look for in order to find the cause of this problem? I've read about the Linux kernel parameters, but I think they are okay.
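For anyone who wants to double-check the kernel parameters, this is a rough sketch of how the current IPC settings could be dumped. It assumes the usual Linux files under /proc/sys/kernel (shmmax, shmall, shmmni, msgmax, msgmnb, msgmni, sem). The helper name `read_kernel_params` is made up, and the script only reports values; what counts as "okay" still has to be compared against the IBM documentation for the DB2 version in use.

```python
from pathlib import Path

# IPC-related kernel parameters, assumed to be exposed as files
# under /proc/sys/kernel on Linux.
PARAMS = ["shmmax", "shmall", "shmmni", "msgmax", "msgmnb", "msgmni", "sem"]


def read_kernel_params(base="/proc/sys/kernel", names=PARAMS):
    """Return {name: list of value strings}, or None if a file is unreadable."""
    values = {}
    for name in names:
        path = Path(base) / name
        try:
            # "sem" holds four whitespace-separated numbers, the others one.
            values[name] = path.read_text().split()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_kernel_params().items():
        shown = " ".join(value) if value is not None else "(not readable)"
        print(f"kernel.{name} = {shown}")
```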
Any hints?
Joachim