How to find out what causes regular SQLSTATE=40003 / SQL1224N

Discussion:

(too old to reply)

Joachim Tuchel

2018-03-01 12:41:08 UTC

Hi there,

we've got this Ubuntu Linux machine (16.04) runnuing DB2 Express C 11.1. We see the following error almost daily, sometimes two to three times a day:

SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]

Not every connected user has this problem, others can continue their work and only realize the DB is a bit slower.

I am not asking you to tell me what the problem is, because I am quite sure it's just not that easy.

But how would I approach this problem from here? What are steps 1,2,3 etc. to get a better understanding of what is going on? Where to look, what to try, who to ask once I know more about the trouble? Where can I find documentation on how to narrow down on problems?

Things we've checked: the kernel parameters SHMM... look just like what's suggested in the manuals. The machine has lots of unused memory, the DB is far below 1 Gig in size, so it is not one of the easy things like not enaugh ram or such.

Thanks for tips!

Joachim

Troels Arvin

2018-03-01 20:47:09 UTC

Permalink

Hello,

Post by Joachim Tuchel
We see the following error almost daily, sometimes two to three times a

[...]

Has it ever worked well? If so: Is there a difference between then and
when the trouble started?

It sounds like your DB2 instance restarted. You need to dig into the
db2diag.log file to learn more on why DB2 got into trouble. In cases like
this, you need to set aside quite a bit of time to run through the
db2diag.log: It can be rather verbose when such things happen.

How about the operating system's system log (most likely in /var/log/
messages): Does it mention the OOM (out-of-memory) killer stepping in and
terminating a DB2-related process?

Have you configured DB2 to use huge pages? If so, memory handling can be
harder to get right (but supposedly, performance is also significantly
better, if your server has lots of memory). If you don't know whether you
you configured DB2 to use huge pages, then it does not (it requires
active work from the DBA).

--
Regards,
Troels Arvin

Joachim Tuchel

2018-03-02 07:55:04 UTC

Permalink

Troels (assuming that is your first name),

thanks for these pointers. I cannot actually tell if it worked better in general, because usage of the system has grown significantly over the last months since it was moved onto an Ubuntu 16.04 box.
The problem had been existing even before that move when it was still DB2 10.5 EXPC on a 14.04 Ubuntu box, but it clearly accours more often these days than a year ago. But don't think we are talking about hundreds of concurrent DB2 connections. There are hardly ever more than a maybe 20 open DB2 connections.

I will check db2diag.log. Most of the times I checked there wasn't much to find on the web. Most of the tips were about the Kernel parameters, and many discussion threads ended with something like "well, didn't actually help".

I will definitely take a look into the system logs.

The DB2 installation is pretty much untouched when it comes to configuration, so I'm quite sure there is no huge page support enabled.

Thanks for your comments and ideas. I have the feeling this is just the beginning of a journey and I'm afraid I will add a few posts to this thread in the coming weeks.

Joachim

Post by Troels Arvin
Hello,

Post by Joachim Tuchel
We see the following error almost daily, sometimes two to three times a

[...]
Has it ever worked well? If so: Is there a difference between then and
when the trouble started?
It sounds like your DB2 instance restarted. You need to dig into the
db2diag.log file to learn more on why DB2 got into trouble. In cases like
this, you need to set aside quite a bit of time to run through the
db2diag.log: It can be rather verbose when such things happen.
How about the operating system's system log (most likely in /var/log/
messages): Does it mention the OOM (out-of-memory) killer stepping in and
terminating a DB2-related process?
Have you configured DB2 to use huge pages? If so, memory handling can be
harder to get right (but supposedly, performance is also significantly
better, if your server has lots of memory). If you don't know whether you
you configured DB2 to use huge pages, then it does not (it requires
active work from the DBA).
--
Regards,
Troels Arvin

Mark Barinstein

2018-03-02 10:41:29 UTC

Permalink

Post by Joachim Tuchel
Hi there,
SQL1224N The database manager is not able to accept new requests, has terminated all requests in progress, or has terminated the specified request because of an error or a forced interrupt. SQLSTATE=55032 [Native Error=-1224]
Not every connected user has this problem, others can continue their work and only realize the DB is a bit slower.
I am not asking you to tell me what the problem is, because I am quite sure it's just not that easy.
But how would I approach this problem from here? What are steps 1,2,3 etc. to get a better understanding of what is going on? Where to look, what to try, who to ask once I know more about the trouble? Where can I find documentation on how to narrow down on problems?
Things we've checked: the kernel parameters SHMM... look just like what's suggested in the manuals. The machine has lots of unused memory, the DB is far below 1 Gig in size, so it is not one of the easy things like not enaugh ram or such.
Thanks for tips!
Joachim

Hi Joachim,

sql1224 often means, that the client connection was lost due to various reasons. This could be some network problems, inappropriate db2 client/server communication registry variables settings, manual/automatic 'force application' commands for given application handle.

Check the db2 registry variables available:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.regvars.doc/doc/r0005660.html

And db2diag.log contents on the server for the corresponding errors like described here:
http://www-01.ibm.com/support/docview.wss?uid=swg21613530

Regards,
Mark.

Jeremy Rickard

2018-03-02 20:06:14 UTC

Permalink

Post by Mark Barinstein
sql1224 often means, that the client connection was lost due to various reasons. This could be some network problems, inappropriate db2 client/server communication registry variables settings, manual/automatic 'force application' commands for given application handle.

Also NUM_LOG_SPAN if set would kill an application connection if a transaction spans the defined number of log extents. However default is 0 (not set).

Joachim, to cut down on searching you can use commands like:

db2diag -t 2018-03-01-21.30.59 | more

...to view the db2diag.log for the appropriate time, starting just before the connection.

Probably the most important fact to establish is whether or not the instance crashed. If it's just certain connections being terminated then that suggests a network issue or connection being forced.

Jeremy