Discussion:
Tivoli Directory Integrator Performance Issue
Add Reply
Tatsiana
2017-11-09 11:05:14 UTC
Reply
Permalink
Raw Message
Hi,
I've encountered unusual problem using Tivoli Directory Integrator.
Somehow I loose time during the connector processing...

For example, I have the performance statistics printing for some random entries.
2017-10-27 08:14:47,594 >>>>>>>>>>>>>>>> Before LookUp: LookUpBy (iteration number: 1060000)
2017-10-27 08:14:47,594 INFO [AssemblyLine.AssemblyLines/someAL] - >>>>>>>>>>>>>>>> Prepared SQL statement: SELECT *** FROM *** WHERE (*** OR ***) AND *** ORDER BY *** desc FETCH FIRST ROW ONLY WITH UR
2017-10-27 08:14:48,529 INFO [AssemblyLine.AssemblyLines/someAL] - >>>>>>>>>>>>>>>> After LookUp: LookUpBy (iteration number: 1060000)
2017-10-27 08:14:48,529 INFO [AssemblyLine.AssemblyLines/someAL] - >>>>>>>>>>>>>>>> LookUp (someTable) elapsed time = 1 seconds 935 ms;

According to TS I have 0,9sec processing entry, but when statistics is printing there is almost 2sec. Seems that I loose time somewhere on Default Success hook...Do you have any idea why could I have such a difference?
Surprisingly I have such situation not for all entries.

Brief description of AL:
I have 2mio datasets that I transfer from one DB2 database to another DB2 database that consists several tables and make some data modifications between.
The first 1mio dataset is processing fine, but starting from 1.2mio or so the performance falls down in a crucial manner (1 dataset per 10mins).

We've decided to run the process by portions: initialize the counter and skip first 1mio dataset, but the performance for the second mio is still very slow, the same as for the first run...

We've printed out even SQLs to check the data. But everything is fine there.
Seems that we have some TDI memory leak or so...

Could you advice something for the problem to solve?

Thanks in advance.
yn2000
2017-11-09 17:38:59 UTC
Reply
Permalink
Raw Message
I assumed that you already tune the TDI, especially the JVM size.
Considering that you said: "..We've decided to run the process by portions.." and get the same result, I would check on the DB2 performance instead of TDI performance. For example: If the DB2 bufferpool is too low and could only absorb 1 mio of data, then DB2 would need to access the rest of data from the physical drive, rather than from the bufferpool. It makes sense, isn't it?
Most of the time the DB2 was tuned for transaction processing rather than data warehouse processing and depending on how you write the SQL command, the request might not fits with what the DB2 was tuned for.
Rgds. YN.
Tatsiana
2017-11-10 08:57:08 UTC
Reply
Permalink
Raw Message
thanks a lot for the answer YN,
I will investigate that side of process, will try to change setting for bufferpool or something else.

But what is still strange for me, it's that time calculating while TDI hook (Defalut success) code takes 1sec, I assume it's not on the side of the server or am I wrong?
yn2000
2017-11-10 16:15:46 UTC
Reply
Permalink
Raw Message
I don't think TDI takes 1 second on anything (figure of speech), because the code was loaded into the JVM, so everything runs on memory. If you see some pause, that would probably the time that TDI receive the new data for further processing. I notice that pause when processing data from TDS or DB2, which I believe due to the paging system within the DB2 engine. Well, basically, be aware that DB2 has a paging system so that the client, such as TDI, is not chocking, which is a good feature.
Rgds. YN.
Tatsiana
2017-11-13 12:23:30 UTC
Reply
Permalink
Raw Message
I've checked DB2 and it has paging system...so I can assume that the problem is somewhere here with the memory allocation (sortheap, bufferpool..) Does it make sense?
Also I've launched db2top performance monitor while process working and detected that sort time sometimes has 100% of Resource Usage.
I have 2 SQLs with Order By clause in AL: when doing first SELECT and one that is executed several times for each entry in the middle of the process.

I suppose that the first SQL for data retrieving is executed only once, so ORDER BY shouldn't be a problem, or?

In any case, as I understand we should have had such a problem with performance earlier...but not for 1,05 and greater entries...
Tatsiana
2017-11-13 14:56:56 UTC
Reply
Permalink
Raw Message
in addition to all the previous stuff I got the SQL exception "message": "[ibm][db2][jcc][10120][10898] Invalid operation: result set is closed.",while get operation.
yn2000
2017-11-13 17:55:29 UTC
Reply
Permalink
Raw Message
Well, DB2 bufferpool is one of the most important DB2 tuning, but many other tuning parameters like SORTHEAP, DBHEAP, CATALOGCACHE_SZ, UTIL_HEAP_SZ, and others are important too. In addition, running db2 runstat regularly (unless it was set to automatic) is also important. In fact, if the DB2 is in Linux environment, OS setting of ulimit could be chocking too. So, I think you might need to discuss this matter with the DB2 DBA.

In regard to: SQL exception, I need more data to comments on it.
Rgds. YN.
Tatsiana
2017-11-16 13:12:26 UTC
Reply
Permalink
Raw Message
still has the same nasty issue with performance..=(
The last attempts we made were related to modifying SQL in order to fetch the entries from the input db starting from 1mio entry till the end. We wanted to check how the process handles the situation when new entries are added to db instead of updating...
And we encountered that even from the first entry the process is extremely slow...
So the problem with performance is not likely connected with buffer pool or other db parameters...
We even stop and start db2 to clean the buffer pool, but it didn't help.

Do you have any idea what it could be?
And correct me if I'm wrong...The first feed select SQL is running once. So the result set is formed and later on fetch the entry with cursor shouldn't take long?

Thanks in advance for your help.
Eddie Hartman
2017-11-17 09:31:02 UTC
Reply
Permalink
Raw Message
Post by Tatsiana
still has the same nasty issue with performance..=(
The last attempts we made were related to modifying SQL in order to fetch the entries from the input db starting from 1mio entry till the end. We wanted to check how the process handles the situation when new entries are added to db instead of updating...
And we encountered that even from the first entry the process is extremely slow...
So the problem with performance is not likely connected with buffer pool or other db parameters...
We even stop and start db2 to clean the buffer pool, but it didn't help.
Do you have any idea what it could be?
And correct me if I'm wrong...The first feed select SQL is running once. So the result set is formed and later on fetch the entry with cursor shouldn't take long?
Thanks in advance for your help.
You are right that the select is run once (during Iterator intialization) and TDI fetches a single row at a time from the result set. However, the driver you use could be pre-fetching and buffering. Note that TDI is pure Java, so any guidance you can find searching for Java JDBC and DB2 performance will apply to TDI as well. How is performance when you perform the query outside TDI - for example, DBViz or other tool?
Tatsiana
2017-11-17 13:47:44 UTC
Reply
Permalink
Raw Message
When executing SQL outside TDI (DBViz) everything is ok.
Eddie Hartman
2017-11-19 12:33:44 UTC
Reply
Permalink
Raw Message
Post by Tatsiana
When executing SQL outside TDI (DBViz) everything is ok.
If you debug the AL and step past the selectEntries call, is that where you see the delay?
Tatsiana
2017-11-20 07:43:01 UTC
Reply
Permalink
Raw Message
really, no, I've just analyzed the process by db2top utility and mentioned that the most cost Query is this first Read Select... but fetches is not taking much time as I can see in performance statistics for entries.

In performance statistics I've seen that simple LookUp costs 1sec or so instead of normal 1-10ms.
So maybe the problem is there, but I can't say definitely that this behavior is the same for all entries, because we've analyzed only some entries from the set due to initial set is too big.

Maybe time delay is somewhere between entries processing, because the entries we've analyzed were max processed in 30-40sec, but the average processing time is 10min/entry...
Tatsiana
2017-11-20 07:44:30 UTC
Reply
Permalink
Raw Message
and one more interesting point here. It is the only one problematic instance, the others are working fine=(
Eddie Hartman
2017-11-20 18:42:43 UTC
Reply
Permalink
Raw Message
Post by Tatsiana
and one more interesting point here. It is the only one problematic instance, the others are working fine=(
Have you tried disabling prepared statements? And are there any additional commands or session/statement settings that dbViz is using to issue your select statement. And is the SQL identical in both cases...

Questions, questions. Sorry to play ping-pong like this, Tatsiana.
yn2000
2017-11-20 19:31:31 UTC
Reply
Permalink
Raw Message
It's been a while I did not play table ping-pong? So, I am in. :-)
I still think that this is DB2 performance tuning issue, not TDI performance tuning issue, especially when Tatsiana said: "...It is the only one problematic instance..." which make me wonder whether DB2 statistic were up-to-date. Here is my reasoning: DB2 has a feature called query re-write. This feature is heavily rely on the statistical data within the DB2 tables. And the symptom of issue in one instance but not the other fits, such as because of the statistical data in that instance is off, compare to the other instance. So, running DB2 runstats or update statistics (sometimes I did both) regularly is required. Having said, I would open PMR on DB2 side, because there are many data specific tuning available in DB2 world. For example: Maybe IBM PMR would find that the tables were not indexed properly. Maybe IBM PMR found that the tuning parameters are off. Maybe IBM PMR can fine tune the statistic value to provide a better outcome against the SQL command that you are sending. All of these are environment specific conditions.
Rgds. YN.

Loading...