DISASTER   RECOVERY   PREPAREDNESS

Mainly relating to Unix servers; the examples are from Sun Solaris systems.

Offsite Backups

It is a good idea to keep, if at all possible, a set of backups separate from your day-to-day backups.
Prudence suggests storing them in a different building altogether; some people keep
backups off campus.  How often you rotate them will differ between departments/units, but the important
point is that if your server room and the backups in it are destroyed, your clients will have lost weeks of
work instead of years.

Onsite Backups

For restoring both client and server files, some kind of daily archival system should be installed.  The
amount of data to back up will help determine the size and type of system you need.  I am familiar with
the amanda (Advanced Maryland Automatic Network Disk Archiver) utility and could help you with
planning, installing, and configuring it.

www.amanda.org
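
amanda runs are normally started from cron.  The crontab fragment below for the amanda backup user is a sketch only.  It assumes a configuration named DailySet1 and binaries under /usr/local/sbin, so adjust both for your site.  It checks the tapes and clients each weekday afternoon and runs the backup just after midnight.

```
# minute hour day-of-month month day-of-week command
# Weekday afternoons: check tapes/clients; -m mails a report only if problems are found
0 16 * * 1-5 /usr/local/sbin/amcheck -m DailySet1
# Tuesday-Saturday just after midnight: back up the previous working day
45 0 * * 2-6 /usr/local/sbin/amdump DailySet1
```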

Realtime readiness

Regarding servers: should something go wrong, whether a failed boot drive, a bad patch, or a security breach,
the recovery options are generally time-consuming (usually measured in hours or days).  If a patch has broken
your system, you can back it out, as long as you patched with the backout option and you know which
patch is the culprit.  If your boot drive has failed, you can replace it and then either reinstall from scratch or
use JumpStart.  In either case you may have to bring the system up to date from your backups.

With a small investment in hardware and time, you can recover in the time it takes to reboot your
machine.  Install a 2nd drive and periodically mirror the boot drive onto it at times you choose, so that a bad
change is not copied over until you decide.  In a crisis you can then simply boot from the 2nd drive.
I mirror every Sunday morning at 5am and before I do any patching (even though I patch using the backout option).
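
The Sunday 5am run can be handed to cron.  Below is a minimal sketch of root's crontab entry.  It assumes the script is installed as /mirror, as in the steps below.  Solaris cron numbers the days of the week from 0, with 0 meaning Sunday.

```
# minute hour day-of-month month day-of-week command
0 5 * * 0 /mirror
```

Before patching, run /mirror by hand rather than waiting for the scheduled run.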

1) Install a 2nd drive that can hold at least the file systems on the master drive.
    I recommend getting a different drive model or, if you want the same model,
    making certain the two drives come from different manufacturing lots (so they are
    less likely to fail at the same time).

2) Copy this script file to / on your master drive (it is run as /mirror below) and
    configure it to match your file systems.

3) Set DEBUG to 1 and run /mirror as a dry run to see what it would do.  If everything looks
    correct, set DEBUG back to 0 and run /mirror for real.

4) At the PROM level (the ok prompt) you can use the 'boot' command followed by a disk
    definition (device path).  However, I have found that the definition reported by
    the format command is not always the correct one.

    In format you can select a disk and then use the 'current' command to
    'describe the current disk'.  You'll get output like the following:

    format> current
    Current Disk = c0t1d0
    <IBM-DNES-318350Y-SA30 cyl 11199 alt 2 hd 10 sec 320>
    /pci@1f,0/pci@1/scsi@8/sd@1,0

    The definition /pci@1f,0/pci@1/scsi@8/sd@1,0 may not work for booting
    at the PROM level.  This particular example is from my Sun Fire V100, where booting
    at the PROM with this definition does NOT work.  Watching a subsequent
    normal boot carefully, I noticed the word 'disk' in place of 'sd', and
    the following definition does boot from the PROM level:

    boot  /pci@1f,0/pci@1/scsi@8/disk@1,0

    Taking this a step further: to reduce the stress level in a time of crisis, I create
    an nvalias on each of my servers, so that at the ok prompt I can simply type

    boot mirror

    To set the nvalias:    ok  nvalias mirror /pci@1f,0/pci@1/scsi@8/disk@1,0

5) Now you are ready to boot from your 2nd drive.  Time permitting, you should test this
    and make sure it works.  Once booted, run  df -k -F ufs  and check the target
    numbers in the device names (for example, t1 in /dev/dsk/c0t1d0s0) to make sure
    you really booted from the drive you expected.
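
The script referred to in steps 2 and 3 is not reproduced here.  Below is a HYPOTHETICAL sketch of what a /mirror-style script could look like; it is not the author's script.  The disk names (c0t0d0, c0t1d0), the slice list, and the /mnt mount point are all assumptions.  The sketch copies each UFS slice with ufsdump | ufsrestore and rewrites the copy's /etc/vfstab to point at the 2nd disk.  It then installs a boot block with installboot (SPARC).  With DEBUG=1 it only prints the commands, which matches the dry run in step 3.

```shell
#!/bin/sh
# HYPOTHETICAL /mirror-style sketch (not the original script). Solaris/SPARC assumed.
# Copies each UFS slice of the boot disk onto the same slice of the 2nd disk.

DEBUG=${DEBUG:-1}     # 1 = dry run (only print commands); 0 = really run them
SRC_DISK=c0t0d0       # assumed boot (master) disk
DST_DISK=c0t1d0       # assumed 2nd disk, already partitioned like the master
SLICES="0 3 4"        # assumed UFS slices to copy (0 = root); edit to suit
MNT=/mnt              # temporary mount point for the target slice

run() {
    # Dry run: print the command. Real run: execute it and stop on failure.
    if [ "$DEBUG" -eq 1 ]; then
        echo "$1"
    else
        sh -c "$1" || { echo "mirror: FAILED: $1" >&2; exit 1; }
    fi
}

mirror_slice() {
    n=$1
    # Build a fresh file system on the target slice (answering newfs's y/n prompt).
    run "echo y | newfs /dev/rdsk/${DST_DISK}s$n"
    run "mount /dev/dsk/${DST_DISK}s$n $MNT"
    # Stream a level 0 dump of the source slice straight into ufsrestore.
    run "ufsdump 0f - /dev/rdsk/${SRC_DISK}s$n | (cd $MNT && ufsrestore rf -)"
    run "rm -f $MNT/restoresymtable"
    if [ "$n" = 0 ]; then
        # Root slice: make the copy's vfstab mount the 2nd disk's slices.
        run "sed 's/$SRC_DISK/$DST_DISK/g' $MNT/etc/vfstab > $MNT/etc/vfstab.tmp && mv $MNT/etc/vfstab.tmp $MNT/etc/vfstab"
    fi
    run "umount $MNT"
}

for s in $SLICES; do
    mirror_slice "$s"
done
# Put a boot block on the 2nd disk's root slice so the PROM can boot from it.
run "installboot /usr/platform/\`uname -i\`/lib/fs/ufs/bootblk /dev/rdsk/${DST_DISK}s0"
```

Here DEBUG defaults to the safe dry-run setting but can be overridden from the environment, for example DEBUG=0 /mirror.  With DEBUG=0 the target slices are destroyed and rebuilt.  Copying a live file system can also catch files mid-change, so run it while the system is quiet.  If both drives have identical geometry, one common way to copy the slice layout is prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2.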

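The manual fix in step 4 (replacing 'sd' with 'disk') can be scripted.  Below is a HYPOTHETICAL helper, prom_path, which simply applies that substitution.  It only reflects what worked on the Sun Fire V100 described above, and other hardware may name the node differently, so always confirm the path at the ok prompt.  The helper also accepts the link target shown by ls -l /dev/dsk/c0t1d0s0, which points into /devices and ends in a slice letter such as :a.

```shell
#!/bin/sh
# HYPOTHETICAL helper: convert a Solaris device path into the form that booted
# at the ok prompt on a Sun Fire V100 ('sd@' replaced by 'disk@').

prom_path() {
    # Strip any leading ../../devices, any trailing slice letter (:a), then rename sd@.
    echo "$1" | sed -e 's|^.*/devices/|/|' -e 's|:[a-h]$||' -e 's|/sd@|/disk@|'
}

# Print the nvalias command to type at the ok prompt for the 2nd disk.
echo "nvalias mirror $(prom_path /pci@1f,0/pci@1/scsi@8/sd@1,0)"
```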

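The check in step 5 can also be automated.  Below is a minimal sketch that reads df -k / and reports which disk the root file system is on.  The expected disk name c0t1d0 is an assumption that matches the format example above.  The helper root_disk takes the device from the first field of df's second line, which holds for /dev/dsk names (long remote file system names can wrap onto a separate line).

```shell
#!/bin/sh
# HYPOTHETICAL check: after booting, confirm / is on the expected (mirror) disk.

root_disk() {
    # df output on stdin; /dev/dsk/c0t1d0s0 -> c0t1d0 (drop directory and slice)
    awk 'NR == 2 { print $1 }' | sed -e 's|^.*/||' -e 's|s[0-9]*$||'
}

EXPECTED=${EXPECTED:-c0t1d0}   # assumed name of the 2nd disk
actual=$(df -k / | root_disk)
if [ "$actual" = "$EXPECTED" ]; then
    echo "OK: / is on $actual"
else
    echo "WARNING: / is on $actual, expected $EXPECTED" >&2
fi
```
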
If you are interested in implementing this tool I will gladly help out.  Get in touch with me
so we can work something out.