I recently decided to test out a rollback strategy that we were thinking of using while upgrading our databases to Oracle 11g. We were using Oracle 10g at the time and we had the following data guard set-up:
- Primary DB, 10.2.0.4
- Physical standby DB 1, 10.2.0.4
- Physical standby DB 2, 10.2.0.4
NOTE: The operating system that we are running is Windows Server 2003, SP2. Although these instructions are for the Windows environment they will also work on any other environment, the only difference being some of the Windows specific configuration, such as Windows service creation.
The Plan For Upgrading Data Guard
The plan was to upgrade both data guard DBs to 11g at the same time as the primary. As part of the upgrade we needed to have a failback plan so that if the upgrade was to fail we would be able to restore the primary and standby databases back to a point in time before the upgrade took place. We didn’t want to have to rebuild the standby databases if the upgrade failed because that would be very time consuming and require a lot more work. The primary database rollback plan was relatively easy:
- Shutdown the database
- Take a “snap” of the LUNs at the SAN level
- Bring the database up and run in the upgrade
- Drop snap if upgrade successful
- Rollback to the snap if the upgrade failed
The problem that we faced with the standby databases was that we didn’t have the ability to snap the disks on the SANs that they were running on because the SAN technology was older and we didn’t have the licenses for it. We also had some questions about the configuration of data guard and how it would work:
- If we rollback the primary DB, how do we roll back the physical standby DBs if they have been applying logs during the upgrade?
- Can we stop the standby DBs applying logs, rollback the primary DB to the snap version and start shipping logs again?
- Will the physical standby DB pick up from where it left off if we remove the logs which were created while the upgrade was running?
- If the primary DB was at archivelog sequence 100 at the time of the snap and it creates 50 logs throughout the upgrade process and we snap it back will it continue from sequence 100 again?
We made educated guesses about exactly what we thought but the questions above should be able to give you some idea about the kinds of questions we were unsure of before we had completed any testing. The best way to prove anything is to actually test it out so I set out to test our theory. What was the theory? Well, we thought (I mean hoped… ) that if we cancelled the log shipping while the upgrade was running (DEFER the log archive destinations), rolled the primary DB back to the snap version of the disks, deleted all the logs which were produced while the upgrade was running and then enabled the log archive destinations again that the data guard physical standby databases would pick up from where they left off, none the wiser to the failed upgrade on the primary database.
What We are Going to Test
Here is a list of exactly what I will be walking through in the example below:
- Configure data guard on version 10.2.0.4 (although this will apply to 11g data guard, also)
- Create the backup of your primary DB in preparation to restore it to create your physical standby databases
- The restore process to set up your physical standby databases
- How to get log shipping working from the primary to the standby databases
- Configuring the rollback strategy above
- Simulation of a failed upgrade attempt
- Rolling back to the snap version of the database (at the SAN level)
- A test to see if it is all working as was the case prior to the upgrade
SAN Vs Flashback Database – Personal View…
I should point out that for this test I have worked with the storage team to get a snapshot of the disk(s) before any upgrade work is complete. Personally, I find that this is a much better solution for this kind of work than using Oracle’s flashback technology for several reasons:
- Flashback requires a large amount of space to store any changes
- I have encountered issues when using Oracle flashback technology while performing an upgrade which corrupted my guaranteed restore point. very bad!
- The SAN snapshot is very quick, clean and requires no configuration changes from an Oracle point of view – just a clean shutdown of the DB while the snapshot is taken
So, in this test I show you how to create a standby DB from your primary DB with all of the commands that I needed to use. Hopefully it is all self-explanatory for a knowledgeable Oracle DBA. If not, please ask any questions at the bottom of the page in the comments section or drop me an email and I will do my best to help.
Step-by-Step Guide to Creating and Configuring Data Guard Physical Standby
On your primary DB, shrink the size of DB as much as possible so that the backup, copy and restore to the standby database is quicker. I often use this little script to do it dynamically for a specific tablespace. It will attempt to decrease each tablespace by 500MB, but you can configure that to whatever you want it to be and do it for every tablespace by removing the “where tablespace_name =” part of the statement.
RMAN Level0 and Archive Log Mode
The first stage is to take an RMAN level 0 backup which you will use later to restore and create your physical standby database. I’m assuming that you don’t want to take the database down to complete the backup so hopefully your database is already running in ARCHIVELOG mode? You can check with this query:
If your primary database is not in ARCHIVELOG mode you can either take a cold backup, i.e. while the database is down, or put then database in ARCHIVELOG mode. To do this, configure your log_archive_dest_1 parameter and switch it on as follows:
Now check that the archived redo logs are appearing in the location specified by the log_archive_dest_1 parameter.
Force Database Logging
If you are creating a physical standby database it is a requirement that for there to be no unlogged transactions in your database, like direct path loading and transactions which do not generate REDO because your database is an identical physical copy of the primary therefore all transactions must be applied in the correct order. To enforce this you need to run the following command:
OK, so now we’re up and running in ARCHIVELOG mode and we are always logging all transactions in the REDO stream we can take a level 0 backup using the following commands:
The length of time that the backup takes will depend on the size of your database. If you are using the normal compression level for RMAN your backup will be around 20% of the size of your database.
Once the level 0 backup is complete you need to copy the files to the standby host ready for restoring to create the physical standby database.
Initialisation Parameter Configuration
Before carrying out the restore of the physical standby database we need to configure some additional parameters on the primary database for data guard to act as a primary database when in the data guard configuration. I have listed down below the parameters which I needed to change in order to configure the primary DB for Data Guard:
NOTE: I have also configured some of the parameters which are only required should the primary database become the standby database, in other words if a failover or switchover scenario were to occur.
NOTE: In the examples below TEST is the primary DB and TESTDG is the physical standby database
NOTE: The file locations for the Data Guard database will differ only by the name of the DB. For example, the path ‘D:ORADATATEST’ will become ‘D:ORADATATESTDG’.
Control File and PFILE Creation
Now that the initialisation parameters have been configured on the primary database it’s time to create a standby control file and the Data Guard database parameter file.
You should now take the initTEST.ora parameter file and modify the settings so that it can be used for the physical standby database.
OK, so we’ve configured all of the initialisation parameters for both the primary and physical standby databases, and we’ve added the appropriate parameters should we decide to fail or switch the roles of the primary and standby in the Data Guard configuration.
Windows Service, Listener, TNSNames, etc
The next step is to configure the Windows service on the server, the listeners, TNSNames files, password files and create an SPFILE for the standby database. Let’s take a look at how we are going to do that.
1. Add TNSNames entries to existing Oracle clients on primary and standby database servers
Following on using the TEST and TESTDG database names as we have done in the earlier parts of this article, here are the TNSNames entries for those DBs:
NOTE: I am using a DNS entry for the HOST parameter. If you are using DNS then ensure that the DNS name resolves correctly using the NSLOOKUP command from both servers.
NOTE: You need to make sure that SQLNet traffic is permitted in both directions so that log shipping can take place.
2. Add listener entries to the Data Guard DB server and the primary DB server
You can either create new listeners if you don’t have them installed already or just add the configuration to one you have already installed, it’s up to you. These are the entries that I have for the TEST and TESTDG DBs:
NOTE: Hopefully you will have noticed that in the configuration for the Data Guard listener the SID_NAME = TEST and not TESTDG. This is because if you remember the actual DB_NAME is the same as the primary (TEST) but the DB_UNIQUE_NAME is TESTDG.
3. Create Windows Service on Data Guard DB Server
We will be using the ORADIM utility to create the Windows service on the Data Guard DB server
NOTE: I’m specifying the full path to the oradim utility because the server is multi-homed and I want to avoid any potential confusion over which executable it uses. It’s always safest to explicitly define which executable/file/etc you want to use. Then you know exactly what you are going to get.
NOTE: You can alter the settings for this Windows service any time by opening up the registry settings (start->run->regedit HKEY_LOCAL_MACHINE->software->Oracle->your_oracle_home_name)
4. Create Password File for Data Guard DB
D:oracleproduct11.2.0dbhome_11203orapwd file=’D:ORADATATESTDGpwdTEST.ora’ password=change_on_install
5. Create an SPFILE for the Data Guard Database
Nearly there… We’re making good progress so far. We’ve done a lot of configuration, now it’s time to find out if it’s actually correct! This next stage is restoring the RMAN level0 backup which was taken from the primary database and copied over.
That’s it! It will take some time to restore the data files. The larger the DB the larger the restore time.
Now it’s time to ensure the primary DB is enabled to ship the REDO logs:
Hopefully you have configured everything correctly and it will work first time… For the majority of those who haven’t, here are some suggestions to help you out:
Possible Data Guard Configuration Issues
Ensure that you have done the following:
1. Added TNS entries on both the primary and standby DBs. If you have multiple homes, ensure you’ve changed the correct one
2. Check that entries have been added for the listeners on both DB servers
3. Check a SQL Plus log on from primary DB to standby and vice versa
4. Copy the password file from the primary and use it for the standby DB
5. Check network connectivity to the standby host on the correct port (1521 by default)
Now we should have the Data Guard database configured and be happy that the logs are shipping from the primary to the standby database – check this first by switching logs on the primary a few times. You can check that the logs are applying to the physical standby using this query:
The next stage of the experiment is to take snaps, simulate a failed upgrade and then test the rollback. That’s what I will walk you through now.
Testing Rollback Using Snaps
1. Switch logfile on primary
ALTER SYSTEM SWITCH LOGFILE;
2. Ensure applied to standby
8. Shutdown the DB (fake rollback required)
9. Revert back to snapped drive
Ask storage engineers to revert to snap drive
*** Remove any archivelogs created in the meantime ***
If you don’t do this there may be some kind of conflict when the database attempts to create other archivelogs with the same names. Also, you don’t want the archived redo logs shipping to your standby database so move/delete them to be on the safe-side.
10. Bring the database up
11. ENABLE archivelog 2 dest
12. Re-enable archivelog shipping by running this command on the Data Guard database
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
13. Check archivelogs are shipping and applying to standby DB
Now that you have completed all of these steps you should hopefully see that the archived redo logs are shipping once more to your data guard physical standby database. And, most importantly, this test has confirmed that my theory was correct and you can use snaps to rollback a failed upgrade on the primary database and allow the physical standby databases to continue recovering from where they left off before the upgrade.
I should point out here that it’s also possible if you wanted to run a long running upgrade into your primary database and not stop the log shipping that you could take snaps of the disks on the physical standby database at the same time. Doing it this way would mean that you have to restore the Data Guard database if the upgrade should fail but if it was successful it would already be up-to-date with the primary so no catching up of the REDO logs would be required.
Which option you decide to take is up to you and will depend on your specific requirements and what you deem is acceptable and necessary.