Audrey Won
2009-06-16 13:27:24 UTC
I tested it and it worked!
We use the default level, which is level 1. This level of compression works
very well for us. The compressed dump file is only 16% of the original
size, the production dump time increased by about 30%, and the test restore
time was reduced by 50%.
Thank you very much for your help, Mark.
Don't forget to check cpu utilization on the production machine, ie,
make sure you've got the extra cpu cycles to support sybmultbuf's
compression requirements there.
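For example, a quick-and-rough way to check that headroom on AIX while a
compressed dump is running (the interval/count values here are arbitrary,
adjust to taste):

# sample cpu usage every 5 seconds, 12 times, while the dump is running;
# the us/sy/id/wa columns show user, system, idle and i/o-wait percentages
vmstat 5 12

If the idle column is already near zero during your dump window, the extra
compression work is going to hurt.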
You'll also need to test which compression level you wish to use.
Most clients I've worked with use a compression level somewhere between 1
and 3:
1 : decent compression (25-45%), typically adding very little time (2-5%)
to the dump/load process
2,3 : slightly better compression (5-15% better than level 1), with a bump
up (5-15%) in total dump/load times
NOTE: The compression percentage usually comes down to an issue of the
types/volumes of data ... character data compresses quite well, while
numeric/binary data doesn't compress as well.
When going above compression=3 you'll typically see a quick drop-off in
benefits, ie, very little additional reduction in space usage and
ever-increasing times (50-400%) for the dump/load process.
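Fwiw, here's a rough sketch of what testing a couple of levels might look
like on 12.5.x using the compress:: API (the database name, dump directory
and server name below are just placeholders -- substitute your own):

# dump at compression level 1 (isql will prompt for the password)
isql -Usa -SPRODSERVER <<'EOF'
dump database proddb to "compress::1::/dumps/proddb_c1.dmp"
go
EOF

# repeat with compress::2:: / compress::3:: and compare the resulting
# file sizes and elapsed times before settling on a level

Time each run and compare the dump file sizes; that usually settles the
level question pretty quickly.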
Thank you VERY much for your detailed explanation of all the
dumping (loading) processes.
The dump file is made available to the test server through an NFS
mount. The servers do use a 'local' backupserver to do the dump
(load).
According to your explanation, when a 'local' backupserver loads from
dump device(s) accessed over a network (eg, NFS, SAN, etc), the
sybmultbuf process performs the decompression *after* the network is
accessed. So I think it will reduce network load. The CPU on our test
server is pretty much under-utilized, so I think doing the
decompression will not necessarily slow down the loading process.
I will also try your suggestion of copying the compressed file
to a local directory on the test server and then loading it.
Thanks again.
It's not clear (to me) from your description where the dump file(s)
reside and how you're getting the dump loaded into the test server (eg,
NFS mounted dump file, remote backupserver access, copy file and then
load with local backupserver), so fwiw ...
-------------------
First some background ...
When performing a 'local' database dump the backupserver will spawn 2
sybmultbuf processes for each dump stripe (one to read from the database
device(s), one to write to the dump device).
When performing a 'local' database load the backupserver will spawn 2
sybmultbuf processes for each dump stripe (one to read from the dump
device, one to write to the database device(s)).
When performing a 'remote' database dump (or load) via a remote
backupserver, the local backupserver will spawn 2 sybmultbuf processes
for each dump stripe (one to read/write the database device(s), one to
communicate with the remote backupserver's sybmultbuf process). The
remote backupserver will spawn a single sybmultbuf process for each dump
stripe; said sybmultbuf process then communicates with the local
backupserver's sybmultbuf process *and* performs the reads/writes of the
dump device.
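If you want to see this in action, just look for the sybmultbuf processes
while a dump or load is running, eg:

# list the sybmultbuf processes spawned by the backupserver; with a
# local backupserver you should see two of them per dump stripe
ps -ef | grep sybmultbuf | grep -v grep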
During compressed dumps/loads the sybmultbuf process that manages the
dump device will perform the actual compression. For a remote dump
(load) this means the full/uncompressed data stream is sent over the
network before (after) the remote sybmultbuf process performs the
compression (decompression) step.
NOTE #1: The sybmultbuf process performing the (de)compression tends to
be a cpu-intensive process. This can have a negative effect on other
processes running on an over-utilized machine.
NOTE #2: Compressed dumps/loads tend to take longer to perform than
uncompressed dumps/loads. The higher the compression level the longer
the dump/load typically takes to complete.
-------------------
Whether or not compressed dumps will help with your situation depends on
where the network comes into play in your dump/load scenarios.
If dumping/loading using only a 'local' backupserver then compression
could help reduce network traffic where the dump device(s) is accessed
over a network (eg, NFS, SAN, etc); in this case the sybmultbuf process
will perform the compression *before* the network is accessed.
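As a sketch of that case (server names, paths and the NFS mount point below
are made up): if the dump file sits on an NFS mount and the test machine's
local backupserver does the load, something like this keeps the
decompression on the test machine and only moves compressed bytes over the
wire:

# on the test server: load the NFS-mounted compressed dump file via the
# local backupserver; the compression level doesn't need to be repeated
# on the load
isql -Usa -STESTSERVER <<'EOF'
load database proddb from "compress::/nfs/dumps/proddb_c1.dmp"
go
online database proddb
go
EOF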
If dumping/loading using a 'remote' backupserver then compression may
not be of any benefit since the uncompressed data has to cross the
network before (after) the remote sybmultbuf process performs its
compression (for dumps) or decompression (for loads).
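For reference, the 'remote' scenario is the one where the dump/load command
names a remote backupserver via the at clause (server names below are
placeholders), eg:

# a load routed through a *remote* backupserver, identified by the 'at'
# clause; per the above, the (de)compression happens on the machine that
# holds the dump device, so the uncompressed stream crosses the network
isql -Usa -STESTSERVER <<'EOF'
load database proddb from "/dumps/proddb.dmp" at PROD_BACKUPSERVER
go
EOF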
If you perform a local, uncompressed dump on the production host, copy
the dump file(s) to a local directory on the test machine, and then
perform a local load on the test machine, then you may see some benefit.
Obviously (?) the copying of the dump file(s) will incur a network
overhead. Obviously (?) the network overhead could be reduced if you use
compression to reduce the size of the dump file(s).
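A minimal sketch of that copy-then-load route (hostnames, paths and the
choice of scp are assumptions -- use whatever copy mechanism you already
trust):

# on the test machine: pull the compressed dump over, then load it from
# local disk with the local backupserver
scp prodhost:/dumps/proddb_c1.dmp /local/dumps/

isql -Usa -STESTSERVER <<'EOF'
load database proddb from "compress::/local/dumps/proddb_c1.dmp"
go
online database proddb
go
EOF

The network cost is paid in the copy step, which is where the smaller
compressed file helps.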
-------------------
Depending on cpu resources (on production and test machines), network
capabilities, and your dump/load topology ... the compressed dumps may
or may not help. In a *worst-case* scenario you experience the same
network overhead you currently deal with *and* you incur additional cpu
overhead for the (de)compression *and* you negatively affect other
processes running on the machine where the (de)compression occurs.
So, will compression help you? yes ... no ... maybe ... *shrug* ...
We have a production server and a test server that are located at
different facilities. Every night we dump the production database
(170G) and load the dump to the test server. Both servers are ASE 12.5.4
and run on IBM AIX. The data load to the test server usually runs 8.5
hours. Recently, because of network saturation, the load can take more
than 18 hours on some days. I am wondering whether doing a compressed dump
and loading from the much smaller dump file would reduce the network
traffic volume. I couldn't figure out whether the data gets decompressed
remotely on the production server side or locally on the test server side.
I could manage to set up an environment to test this out, but I would like
to know the theory before even testing it.
Any help is appreciated.