slow load tran

Mark A. Parsons

2007-03-08 05:00:52 UTC

There are a couple issues that could be causing you some problems:

-----------------

Your night time log dumps keep growing in size due to a long running transaction in the primary database:

Each time you successfully dump the transaction log the dataserver will truncate the log from the beginning of the log up to the oldest open transaction (user or replication defined).

Let's assume you have the following situation:

- a long running transaction (see master..syslogshold) is started at 23:05; assume it doesn't complete until after midnight

- you perform tran log dumps every 10 minutes on the 10's (ie, 23:00, 23:10, 23:20, 23:30, ...)

- the 23:10 log dump will complete and truncate all log activity up to the 23:05 timestamp; everything added to the transaction log after 23:05 will remain in the log; also note that new activity is being continually added to the log (ie, the log is growing in size)

- the 23:20 log dump will fire, dump the entire transaction log, but won't be able to truncate any portion of the log due to the transaction that's still open from 23:05; in the mean time the log continues to grow due to new activity; the actual log dump file may be larger than the previous log dump file

- the 23:30 log dump will fire, dump the entire transaction log, but won't be able to truncate any portion of the log due to the transaction that's still open from 23:05; the actual log dump file will be larger ... you've dumped exactly the same stuff as the 23:20 dump *plus* you've dumped all log activity that's occurred since 23:20

- each successive log dump will continue to dump all the same transactions as the previous log dump *plus* add on all new log activity since the last log dump ... this continues until that 23:05 transaction closes (commit or rollback) at which point the next tran log dump will dump the entire log (again) and then finally be able to truncate the transation log (up to the oldest open transaction)

In this scenario you are dumping (and loading) a lot of the same transactions over and over and over again. Also note that due to the open nature of the 23:05 transaction you'll find that a good bit of your load activity may be rolled back if simply because you have incomplete transactions in the dump file you're loading.

Net result is that you could eat up a lot of resources (dump time, disk space, load time, recovery time) by repeatedly processing the same set of transactions.

In this scenario you have 2 options ... continue doing things the way you do them now ... modify your tran log dump logic to only fire when you see a sizeable portion of committed transaction activity in the log (yeah, little tougher to do than it sounds; you basically have to measure the amount of space between the beginning of the log and oldest transaction pointer).

-----------------

You're obviously doing a lot more processing during the night (3.2GB dump file) than during the day (100MB dump file). ("Duh, Mark!" ?)

The performance degradation you notice for rollback's actually makes sense.

If you ever run a long transaction in your primary database you'll usually notice that a ROLLBACK takes longer than a COMMIT because the ROLLBACK actually has to undo a good bit of activity while the COMMIT just has to close out the transaction.

(For the nitpickers: yeah, yeah, yeah, there's a bit more to consider in terms of commit/rollback costs but generally speaking there is an additional cost for rolling back a transaction.)

During the recovery process (the application of transactions during the load of your log dump file) your transactions are played forward in the same order in which they were created on the primary database. COMMITs are processed relatively quickly while ROLLBACKs take a relatively longer period of time to process.

NOTE: Actual COMMIT and ROLLBACK processing times really depends on how much of the transaction still resides in memory (aka, data cache). If any part of a transaction has been flushed to disk (as is common for really long running transactions on a very active dataserver) then both the COMMIT and ROLLBACK could incur expensive physical io's to read transactional activity back into data cache.

Where the above info may be applicable in your case is two fold:

1 - there are actually a large number of ROLLBACKs in the primary database; so minimizing the number of ROLLBACKs in the primary database would translate into fewer ROLLBACKs being applied at the secondary dataserver, thus reducing recovery time during the load of transaction log files

2 - during the COMMITs and ROLLBACKs (on either the primary or secondary dataserver) do you see a lot of physical io's by the COMMIT/ROLLBACK process? if so, you may want to look at either shortening the transactions or adding more memory to the affected data cache

-----------------

In regards to your question about how to 'measure' the number of COMMITs and ROLLBACKs ...

If you're running ASE 12.5.0.3 you should consider installing and configuring the MDA tables (if you haven't already done so).

The master..monProcessActivity table contains three columns which may be of interest to you: Transactions, Commits, Rollbacks

These 3 columns represent a rolling count (aka accumulators) of the number of transactions/commits/rollbacks peformed by the associated SPID.

What you could look at doing is summing up these 3 columns and base your 'dump transaction' timing on the numbers.

NOTE: Since these are accumulator columns you'll need some way to

a) take a new 'sum' snapshot of the columns

b) save the new snapshot (#temp table? permanent table?)

c) after an interval of time take a new 'sum' snapshot

d) compare old and new snapshots to come up with a delta count of operations (and trigger your 'dump transaction' once the delta crosses a threshold); obviously (?) you'll also need to keep a running account of *when* you perform a 'dump transaction' so that you can reset your threshold conditions

e) replace the old snapshot with the new snapshot

f) go to step c

I have the attached configuration on mystandby server from
my production server files genrated during day time re
around 100m
and during night after batch runs file sizes hit 3.2 Gb with
compression level3.Sizes mentioned are in compressed
form.During day time dumps are taken for every 5 min and
during night for every 10 min.
After the end of each load tran it is showing
"Redo pass of recovery has processed 64332 committed and 729
aborted".After loading the files it is showing around 60K
committed and 700+ aborted transactions.
1)if the # of aborted transactions are high then the time it
takes to apply that file is more.(In most cases: i think so)
2)I want to take dump tran within 10 min interval if the no
of trnx are 50K-60K committed transactions and no of
aborted transactions should be very less.
How can i do this finding the no of to be aborted
transactions
Hope my question is understandable (though a bit confusing).
Bottom line is i want to improve my load tran process,bcoz
my standby is laggind by 20Hrs today,three days back it was
in sync.NO other process is going on in standby except
sysmon for every 30min. interval
------------------------------------------------------------------------
[Named Cache:db_log_cache]
cache size = 3072M
cache status = log only
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[2K I/O Buffer Pool]
pool size = DEFAULT
wash size = DEFAULT
local async prefetch limit = DEFAULT
[4K I/O Buffer Pool]
pool size = 2048.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Named Cache:default data cache]
cache size = 13312M
cache status = default data cache
cache replacement policy = DEFAULT
local cache partition number = 64
[2K I/O Buffer Pool]
pool size = DEFAULT
wash size = DEFAULT
local async prefetch limit = DEFAULT
[4K I/O Buffer Pool]
pool size = 512.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[16K I/O Buffer Pool]
pool size = 7168.0000M
wash size = DEFAULT
local async prefetch limit = 30
[Named Cache:systems_cache]
cache size = 25M
cache status = mixed cache
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[2K I/O Buffer Pool]
pool size = DEFAULT
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Named Cache:tempdb2_cache]
cache size = 2500M
cache status = mixed cache
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[2K I/O Buffer Pool]
pool size = DEFAULT
wash size = DEFAULT
local async prefetch limit = DEFAULT
[4K I/O Buffer Pool]
pool size = 800.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[16K I/O Buffer Pool]
pool size = 1600.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Named Cache:tempdb_cache]
cache size = 4096M
cache status = mixed cache
cache replacement policy = DEFAULT
local cache partition number = DEFAULT
[2K I/O Buffer Pool]
pool size = DEFAULT
wash size = DEFAULT
local async prefetch limit = DEFAULT
[4K I/O Buffer Pool]
pool size = 800.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[8K I/O Buffer Pool]
pool size = 256.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[16K I/O Buffer Pool]
pool size = 1536.0000M
wash size = DEFAULT
local async prefetch limit = DEFAULT
[Meta-Data Caches]
number of open databases = 20
number of open objects = 6000
open object spinlock ratio = DEFAULT
number of open indexes = 4200
open index hash spinlock ratio = DEFAULT
open index spinlock ratio = DEFAULT
partition groups = DEFAULT
partition spinlock ratio = DEFAULT
[Disk I/O]
disk i/o structures = DEFAULT
number of large i/o buffers = DEFAULT
page utilization percent = DEFAULT
number of devices = 256
disable disk mirroring = DEFAULT
allow sql server async i/o = DEFAULT
[Network Communication]
default network packet size = 16384
max network packet size = 16384
[Parallel Query]
number of worker processes = 40
memory per worker process = DEFAULT
max parallel degree = DEFAULT
max scan parallel degree = DEFAULT
[Processors]
max online engines = 8
number of engines at startup = 8
statement cache size = DEFAULT
[SQL Server Administration]
procedure cache size = 1310720
default database size = DEFAULT
identity burning set factor = DEFAULT
[User Environment]
number of user connections = 4500
stack size = DEFAULT
[Lock Manager]
number of locks = 2000000
deadlock checking period = DEFAULT
lock spinlock ratio = DEFAULT