
8.0 - Updated on 2023-03-22 by Joan Huang

7.0 - Updated on 2022-03-15 by John Grover

6.0 - Updated on 2021-03-29 by Joan Huang

5.0 - Updated on 2019-10-29 by Chelsea Noble

Overview

This document describes the hosting strategy adopted by the Database Services team for hosting Microsoft SQL Server databases in the Amazon cloud (AWS). It describes the factors that affect the tenancy of a particular database as well as the guidelines for deploying, migrating, monitoring, backing up, and recovering Microsoft SQL Server databases in AWS.

 

Database Tenancy

General Considerations

The following are the major factors to take into consideration when migrating a SQL Server database to AWS or implementing a new SQL Server database in AWS: database feature requirements, database size, data sensitivity, availability requirements, compute requirements, and client/third-party needs. Each is described below.

On a case-by-case basis, an application's database will be evaluated against these criteria to determine the tenancy of the database.

To maximize our investment in the AWS infrastructure, the default implementation will be a multi-tenant SQL Server Standard Edition r6i EC2 instance. Standard Edition offers the features most often required by our application population, including basic implementations of the HA tools. Should licensing cost become an issue, Standard Edition offers significant savings over the Enterprise, BI, or Datacenter editions. The r6i EC2 type offers the balance of compute power (CPU) and memory appropriate for a well-performing SQL Server workload at the scale we anticipate implementing on an individual server.

Exceptions to the default implementation will be considered as described below.

Database Feature Requirements

The current preferred SQL Server install is SQL Server 2019 Standard Edition on an R6i AWS EC2 instance. Applications that require Enterprise edition features or a different version will be considered and discussed with clients.
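
When evaluating an existing database for migration, it helps to confirm the version and edition of the source instance first. A minimal T-SQL check:

SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('Edition')        AS Edition;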

Database Size

Applications with large databases (>100 GiB) may be candidates for a separate SQL Server on its own EC2 instance, depending on the performance requirements of the application or any special backup/recovery requirements.
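
To check an application's databases against the 100 GiB guideline, a quick size query can be run on the source instance. A minimal sketch (sizes include data and log files):

-- Approximate size (data + log) per database, in GiB; size is in 8 KB pages
SELECT DB_NAME(database_id) AS DatabaseName,
       CAST(SUM(CAST(size AS BIGINT)) * 8.0 / 1024 / 1024 AS DECIMAL(12,2)) AS SizeGiB
FROM sys.master_files
GROUP BY database_id
ORDER BY SizeGiB DESC;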

Data Sensitivity

Sensitive data (to be determined by the customer following the published classifications of data by InfoSec) will not be mixed on the same server with non-sensitive data.

Availability Requirements

If a customer has a demonstrated need for a highly available database, an HA instance will be made available. The bar for SQL Server HA in AWS will be very high given the low recovery time we have demonstrated in a single-instance recovery scenario (as low as 21 minutes to recover a server).

Compute Requirements

When an application's compute requirements will affect other databases' performance on a shared instance, it will either need to be migrated to a separate instance or the existing server will be upgraded. The current AWS cost model generally makes a new instance more cost effective.

Client/Third Party Needs

Certain third-party applications are written so that they require elevated privileges to run, employ sub-standard security, or are otherwise "badly behaved". These applications will be candidates for migration to a dedicated server or to a designated less secure server.

 

Database Deployment

Using Amazon’s Elastic Compute Cloud Service

Deploying SQL Server on an Elastic Compute Cloud (EC2) instance offers the greatest flexibility. Because EC2 gives full control of the instance, installation of all SQL Server features is possible.

 

Advantages and Limitations of EC2

Using EC2 leverages the infrastructure and services built by OIT in AWS. Server-level backups, Active Directory, and other shared services are available. Full access to the underlying operating system allows for custom processing such as data imports, and full access to the SQL Server management features allows for better performance tuning. We are able to use some of the more advanced SQL Server features, such as sending email, FILESTREAM, and linked servers. The downside of SQL Server on EC2 is that we have to manage both the host and SQL Server ourselves: upgrading and patching the server, taking full backups of volumes, and so on.

 

Using Amazon’s Relational Database Service

Some of the limitations of Amazon’s SQL Server RDS offering make it unsuitable as a general purpose platform for hosting MS SQL Servers in our environment. The most notable limitation is the lack of Active Directory integration. Since we will need to implement for ourselves many of the database management processes offered by RDS, there is no advantage to incurring the additional cost for an AWS managed system. For OIT-managed SQL Server databases we will provide Notre Dame standard management of the instance.

Advantages and Limitations of RDS

For non-OIT departments that want to implement an RDS SQL Server on their own we can offer the following analysis of the service and link to Amazon’s published documentation.

 “SQL Server is a relational database management system developed by Microsoft. Amazon RDS for SQL Server makes it easy to set up, operate, and scale SQL Server deployments in the cloud. With Amazon RDS, you can deploy multiple editions of SQL Server (2014, 2016, 2017 and 2019), including Express, Web, Standard, and Enterprise, in minutes with cost-efficient and re-sizable compute capacity. Amazon RDS frees you up to focus on application development by managing time-consuming database administration tasks, including provisioning, backups, software patching, monitoring, and hardware scaling.”

https://aws.amazon.com/rds/sqlserver/ (10/2022)

There are some limitations that should be considered before committing to the RDS platform. It is a very good basic database platform, but it disables most advanced management features, reserving them for itself.

  1. RDS is not aware of our Active Directory; you cannot use NetIDs or EDS groups for authentication. More recently, RDS instances can work with AWS Directory Service for Microsoft Active Directory (also called AWS Managed Microsoft AD) to enable Windows Authentication, but Notre Dame would first need to purchase and implement AWS Directory Service before we could explore those features.
  2. We cannot control the RDS backup retention policy.
  3. Terminating an RDS instance will, by default, delete all the backups. You also need to explicitly request a final backup when removing your instance to avoid data loss.
  4. You have no access to the underlying operating system; if you have jobs that rely on external data files (imports/exports), PowerShell, or batch scripts, they cannot be used on an RDS instance. This also restricts access to many instance administrative functions.
  5. Restoring, refreshing, or cloning a specific database on an RDS instance is complicated and time-consuming.
  6. RDS does not support some Microsoft SQL Server features, such as FILESTREAM, Maintenance Plans, Replication, Linked Servers (Oracle), etc.

Items 1 and 4 may eliminate RDS from consideration for many applications that rely on AD or external resources.

Item 3 means that you need to be very careful when restoring from backup so that you do not lose backup history. The recovery method creates a NEW instance with its own 35-day backup history; removing the original instance removes all of its associated backups, which means you need to keep the original until the new instance catches up.

Taking into account the advantages and limitations of Amazon’s RDS will help you determine if it is a viable solution for your non-OIT supported database needs.

SQL Server deployment on EC2

Server Commissioning

R6i instances are memory optimized; they are designed to deliver fast performance for workloads that process large data sets in memory. In general, we choose R6i-type EC2 instances to host SQL Server.

 

An Amazon Elastic Block Store (EBS) volume is a durable, block-level storage device that can be attached to an EC2 instance. General Purpose SSD (gp3) volumes offer a leading balance of price and performance for SQL Server workloads when configured appropriately. In general, we choose EBS gp3 volumes for our SQL Server hosts.

 

In ServiceNow, create a request to commission an EC2 R6i.xx instance as a SQL Server host, filling in all fields on the form. Our platform group has a CloudFormation script specifically for SQL Server machines. By default, the IAM role assume-database-ec2-roles should be attached to the newly built EC2 instance, and ADND/OE DBAS is added to the Administrators group on the host.

For each EC2 instance, we request five EBS gp3 volumes attached as local drives. The names and usages of the local drives are listed below:

C: -- store binaries

D: -- store the MSSQLxx.MSSQLSERVER directory (generated during SQL Server installation), user databases' data files, and our scripts

E: -- store user databases’ transaction log files

F: -- store database backup files

Z: -- store temp database data files and transaction log files
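
Because the Z: volume is recreated at boot, TempDB must be pointed at it by configuration. A minimal sketch using the Z: layout above, assuming the default logical file names tempdev and templog (additional tempdev files would be moved the same way):

-- Relocate tempdb files to the Z: volume; takes effect at the next service restart
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'Z:\MSSQL\data\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'Z:\MSSQL\tlog\templog.ldf');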

 

 

SQL Server setup is partially done by PowerShell and T-SQL scripts. Here is the list of steps:

  1. Ensure there are 5 drives: C:, D:, E:, F:, and Z:.
  2. Run SQLSetupPreStep.ps1. This script creates the folders C:\sqlserver_system_database_file_backup, D:\Scripts, D:\MSSQL\data, E:\MSSQL\tlog, F:\MSSQL\backup, F:\MSSQL\sqlAgentJob, Z:\MSSQL\data, Z:\MSSQL\tlog, etc. It also copies home-grown database maintenance scripts from the S3 bucket to the local D:\Scripts.
  3. Install SQL Server and patches.
  4. Bounce the host, which will set some environment variables, including the path to SQLPS.
  5. Create an empty DBATools database by restoring it from D:\Scripts\DBATools.bak. Open MaintenanceSolution_ND.sql in SSMS and search for "ND Env"; if needed, change the code between the "ND Env beginning" and "ND Env ending" markers, then execute it to create procedures and tables in the DBATools database.
  6. Modify D:\Scripts\SQLServerMoveBackup.ps1 and SQLServerArchiveSystemDBFiles.ps1: add the new SQL Server name to the hash table variable "$hash_env", and modify $pathToSysDBFiles in SQLServerArchiveSystemDBFiles.ps1.
  7. Log into the new SQL Server and execute "SQL Server Alert and Email Setup.sql" (sets up database alerts and email) and "CreateMain8Jobs.sql" (sets up five database backup jobs, one job to copy and move backups, one job to do index defragmentation, and one job to check data integrity). Adjust the eight jobs' schedules.
  8. Test the database server connection from a desktop; if the new SQL Server is not accessible from a desktop, verify that local inbound SQL_TCP and SQL_UDP firewall rules exist on the new EC2 instance.
  9. Set up a weekly job to clean job history (a sketch of the purge step follows this list).
  10. Stop SQL Server; run D:\Scripts\SQLServerArchiveSystemDBFiles.ps1 to save the system database files on the C: drive and in S3.
  11. Add the SQL Server to Idera SQL Diagnostic Manager (if it is a prod server) on SQLIDERA1-PROD.
  12. Register it for Microsoft SQL Server Central Management and Master/Target SQL Agent jobs on SQLMON5-PROD.
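
The weekly cleanup job in step 9 can be as simple as a single T-SQL step. A minimal sketch, assuming a 35-day retention window (the window is an assumption, not a documented standard):

-- Purge SQL Agent job history older than 35 days (retention window is an assumed value)
DECLARE @oldest DATETIME = DATEADD(DAY, -35, GETDATE());
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @oldest;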

SQL Server deployment on RDS

To deploy a SQL Server RDS instance, follow the steps outlined in the migration section below. Also make sure that the Core Services team is aware of the new instance so that it is included in the master server list and a Reserved Instance (RI) is purchased, if needed.

High Availability

SQL Server High Availability is accomplished by using Availability Groups. SQL Server AGs require two servers, preferably in different AWS Availability Zones. Request that a Windows Server Failover Cluster (WSFC) be commissioned; there is a CloudFormation script that will build it. When the WSFC is available, SQL Server Enterprise Edition needs to be installed on each node. Once the database is built, the AG wizard will guide you through creating and populating the AG; for reference, the wizard's result corresponds roughly to the T-SQL sketch below.
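
A minimal sketch of the equivalent T-SQL, with hypothetical server, database, and endpoint names; it assumes the WSFC exists, the HADR feature is enabled on both nodes, and database mirroring endpoints are already configured:

-- Create a two-replica Availability Group (all names are hypothetical)
CREATE AVAILABILITY GROUP [AppAG]
FOR DATABASE [AppDB]
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.example.edu:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.example.edu:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC);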

SQL Server Availability Groups are a high-availability option, but the relatively low RTO achievable in AWS without HA should preclude their use in most cases.

 

Migration Methodology

How to migrate a SQL Server database to EC2

The migration of SQL Server databases to AWS is relatively straightforward. The database(s) for the application will be backed up, the backups transferred to AWS, and the databases recovered to the appropriate server as determined by the above criteria. Any server-level objects will have to be identified, scripted, and recreated on the target server. These may include items like logins, SQL Server Agent jobs, linked servers, or SSIS packages.

  1. Determine the target SQL Server instance, or commission a new one per the criteria outlined in the Database Tenancy and Database Deployment sections.
  2. Back up the source database(s) using a native SQL Server backup; compression is advised to reduce transfer time to the cloud (see the sketch after this list).
  3. Script all needed non-database objects such as logins and agent jobs. In most cases this can be accomplished by right-clicking the database object and selecting "script" from the tasks menu. For security objects, use the Idera Permissions Extractor tool.
  4. Restore the source database backup(s) to the target server.
  5. Run the scripts created in step 3 to create the non-database objects.
  6. Review and test.
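
Steps 2 and 4 use standard native backup and restore. A minimal sketch with hypothetical database, logical file, and path names, relocating files onto the standard drive layout described above:

-- Step 2: compressed, copy-only backup on the source server (names/paths are hypothetical)
BACKUP DATABASE [AppDB]
TO DISK = N'F:\MSSQL\backup\AppDB_migration.bak'
WITH COPY_ONLY, COMPRESSION, CHECKSUM;

-- Step 4: restore on the target EC2 instance, moving files onto the D: and E: volumes
RESTORE DATABASE [AppDB]
FROM DISK = N'F:\MSSQL\backup\AppDB_migration.bak'
WITH MOVE N'AppDB' TO N'D:\MSSQL\data\AppDB.mdf',
     MOVE N'AppDB_log' TO N'E:\MSSQL\tlog\AppDB_log.ldf',
     RECOVERY;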

How to migrate a SQL Server database to RDS

  1. Build an RDS instance, or decide which one to target.
  2. Back up the source database using a SQL Server native backup; compression is advised to reduce transfer time to the cloud.
  3. Script all needed non-database objects such as logins. Windows domain authentication works only with AWS Directory Service for Microsoft Active Directory (also called AWS Managed Microsoft AD). Currently, Notre Dame has not implemented AWS Managed Microsoft AD; therefore, RDS cannot use Windows Authentication, and all Windows logins need to be changed to SQL Server logins. Use the Idera Permissions Extractor tool; some small alteration of the created script may be needed.
  4. Connect to the RDS instance through SSMS and restore the source database backup (see the sketch after this list).
  5. Run the scripts created in step 3 to migrate login accounts.
  6. Review and test.
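
On RDS there is no file-system access, so the restore in step 4 runs from S3 via Amazon's stored procedures. A minimal sketch with hypothetical bucket and database names; it assumes the instance has the SQLSERVER_BACKUP_RESTORE option group attached:

-- Start a native restore from S3 (names are hypothetical)
EXEC msdb.dbo.rds_restore_database
     @restore_db_name = N'AppDB',
     @s3_arn_to_restore_from = N'arn:aws:s3:::my-backup-bucket/AppDB_migration.bak';

-- Poll the task until its lifecycle reaches SUCCESS
EXEC msdb.dbo.rds_task_status @db_name = N'AppDB';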

 

Database Monitoring

Idera (Diagnostic Manager/performance monitor, Prod only)

The Idera Diagnostic Manager will be used to monitor and alert on all production SQL Server instances.

UCP (performance history, non-production)

A Utility Control Point (an MSSQL built-in feature) will be used to monitor non-production SQL Server instances and production instances not being monitored by Idera.

Database Backup and Recovery

Database backups

Native SQL Server backup tools will be used. All backup files are generated and stored on an attached EBS volume, then copied to an S3 bucket in a different AWS account within 2 hours.

 

Retention on the local machine is controlled by the PowerShell-driven backup script; retention in S3 is driven by S3 lifecycle policies.
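
To confirm that backups are landing within the local retention window, the backup history recorded in msdb can be checked. A minimal sketch (the 8-day window matches the local retention described later in this document):

-- Backups recorded in the last 8 days; type: D = full, I = differential, L = transaction log
SELECT database_name, type, backup_finish_date, compressed_backup_size
FROM msdb.dbo.backupset
WHERE backup_finish_date > DATEADD(DAY, -8, GETDATE())
ORDER BY backup_finish_date DESC;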

 

Database Backup Encryption

Before SQL Server 2014, Transparent Data Encryption (TDE) was commonly used to encrypt data at rest; however, it significantly reduces query performance and increases CPU consumption. Fortunately, since SQL Server 2014 the Backup Encryption option allows us to encrypt just the backup.

 

  1. Prepare a SQL Server so we can take a database backup with encryption

To encrypt a SQL Server backup as it is created, we need a Service Master Key (SMK), a Database Master Key (DMK), and a certificate, and we must back them up first.



      1. Check the existence of a DMK

-- A row named ##MS_DatabaseMasterKey## indicates the DMK already exists
SELECT * FROM master.sys.symmetric_keys



      2. Create a DMK

-- Create the Database Master Key (assuming the database master key does not exist)
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password'
GO



      3. Create a certificate

-- Create the backup certificate
-- Note: without the EXPIRY_DATE parameter, SQL Server creates a certificate with an
-- expiration date one year in the future by default
USE master
GO
CREATE CERTIFICATE Certificate
WITH SUBJECT = 'SQL Backup Certificate';
GO



      4. Back up the key and certificate

-- Back up the Service Master Key
USE master
GO
BACKUP SERVICE MASTER KEY
TO FILE = 'filepath\SMKfilename.key'
ENCRYPTION BY PASSWORD = 'SMKpassword';
GO

-- Back up the Database Master Key
BACKUP MASTER KEY
TO FILE = 'filepath\DMKfilename.key'
ENCRYPTION BY PASSWORD = 'DMKpassword';
GO

-- Back up the Certificate
BACKUP CERTIFICATE Certificate
TO FILE = 'filepath\CertificateFilename.cer'
WITH PRIVATE KEY(
    FILE = 'filepath\CertificateKeyFilename.key',
    ENCRYPTION BY PASSWORD = 'CertificatePassword');
GO

 

 

  2. Back up a database with encryption

BACKUP DATABASE [DBATools]
TO DISK = N'S:\MSSQL\backup\DBATools_encrypted.bak'
WITH
    COPY_ONLY,
    COMPRESSION,
    ENCRYPTION(ALGORITHM = AES_256, SERVER CERTIFICATE = [BackupCert_ndpddb1p])
GO

 

Note: in the backup job, the backup step is defined as

EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @BackupType = 'FULL',
    @Encrypt = 'Y',
    @ServerCertificate = 'BackupCert_ndpddb1p',
    @EncryptionAlgorithm = 'AES_256',
    @Verify = 'Y',
    @Compress = 'Y',
    @CheckSum = 'Y'

 

  3. Check whether a SQL Server database backup is encrypted

RESTORE HEADERONLY
FROM DISK = N'filepath\Filename.bak';
GO

 

Run the above statement in SSMS and check the values in the EncryptorThumbprint and EncryptorType columns. If the backup is not encrypted, they will be NULL.
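
The same information is also recorded in the backup history, which avoids reading the backup file itself. A minimal sketch (these backupset columns exist in SQL Server 2014 and later):

-- Encryption columns in backup history are NULL for unencrypted backups
SELECT database_name, backup_finish_date, key_algorithm, encryptor_type, encryptor_thumbprint
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;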

 

 

  4. Restore a database from an encrypted backup on the same SQL Server where the backup was taken

This works just like restoring a database from an unencrypted backup.

 

  5. Restore a database from an encrypted backup (taken on server A) on a different SQL Server (server B)


      1. On server B, create a master key

CREATE MASTER KEY
ENCRYPTION BY PASSWORD = 'MasterKey_Password';



      2. Copy the certificate file CertificateFilename.cer and its private key file CertificateKeyFilename.key from their backup location on server A to a location local to server B. Connect to server B in SSMS and create the certificate from the backups:

CREATE CERTIFICATE BackupEncryptTestCert
FROM FILE = 'filepath\CertificateFilename.cer'
WITH PRIVATE KEY
    (
      FILE = 'filepath\CertificateKeyFilename.key',
      DECRYPTION BY PASSWORD = 'CertificatePassword'
    );

 



      3. Restore the database on server B.
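
With the certificate restored, the RESTORE statement itself is unchanged. A minimal sketch with hypothetical database, logical file, and path names:

-- Restore proceeds exactly as with an unencrypted backup once the certificate exists
RESTORE DATABASE [AppDB]
FROM DISK = N'F:\MSSQL\backup\AppDB_encrypted.bak'
WITH MOVE N'AppDB' TO N'D:\MSSQL\data\AppDB.mdf',
     MOVE N'AppDB_log' TO N'E:\MSSQL\tlog\AppDB_log.ldf',
     RECOVERY;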

 

 

  6. Certificate expired

When the certificate expires, the backup job will fail with the error:

“The Certificate specified for backup encryption has expired. [SQLSTATE 42000].

BACKUP DATABASE is terminating abnormally. [SQLSTATE 42000]”

 

How to fix:

Step 1: create a new certificate with a new expiration date

-- Create a backup certificate which will expire on Jan 31, 2033
USE master

CREATE CERTIFICATE BackupCert_ndpddb1p_2023
WITH SUBJECT = 'ndpd-db1-prod Backup Encryption Certificate',
EXPIRY_DATE = '20330131';
GO

-- Back up certificate BackupCert_ndpddb1p_2023 and its private key
BACKUP CERTIFICATE BackupCert_ndpddb1p_2023
TO FILE = 'C:\sqlserver_system_database_file_backup\BackupCert_ndpddb1p_2023.cer'
WITH PRIVATE KEY
(
    FILE = 'C:\sqlserver_system_database_file_backup\BACKUP_CERTIFICATE_PRIVATE_KEY_ndpddb1p_2023.key'
   ,ENCRYPTION BY PASSWORD = 'xxx'
);

Step 2: modify the backup jobs to use the new certificate.
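
Expiration can also be caught before jobs fail by periodically checking the certificates on each server. A minimal sketch:

-- List certificates in master and their expiration dates, soonest first
USE master
GO
SELECT name, subject, start_date, expiry_date
FROM sys.certificates
ORDER BY expiry_date;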

 

 

Database Backups Archive (Long Term Storage) Layout on S3

SQL Server backup copies are stored on local disk for only 8 days; as soon as a backup is produced, it is copied from local disk to S3 for long-term storage. On S3, we create lifecycle policies to implement data retention. Since an S3 lifecycle policy can be applied to a folder and affects all its subfolders, we structure the S3 storage space as below:

<bucketName>/<environment>/<frequency>/<sqlServerName>/<databaseName>/<backupFileName>

 

bucketName is “sqlserverbackups.drdatand”.

environment takes two values: “np” stands for non-production; “prod” stands for production.

frequency takes three values: “daily”, “weekly” and “monthly”. S3 lifecycle policies are applied at this level of folders.

 

Policy Name              Rule Target     Comment
prod/daily 15days        prod/daily/     Production daily differential backups and transaction log backups will be kept for 15 days
prod/weekly 90 days      prod/weekly/    Production weekly full backups will be kept for 90 days
prod/monthly 18 months   prod/monthly/   Production monthly copy-only full backups will be kept for 18 months
np/daily 8 days          np/daily/       Non-production daily differential backups and transaction log backups will be kept for 8 days
np/weekly 90 days        np/weekly/      Non-production weekly full backups will be kept for 90 days
np/monthly 6 months      np/monthly/     Non-production monthly copy-only full backups will be kept for 6 months

 

sqlServerName takes names such as "sqlmon5-prod", "sqlstd1-prod", etc.

databaseName takes the names of all databases on the individual SQL Server.

backupFileName takes names with the pattern <databaseName>_<dateAndTime>.<fileExtension>, such as "dbautility_20200328070011.bak".

 

Server Backups

Currently, Cloud Platform Services uses Cloud Protection Manager to snapshot EC2 instances on a daily basis. Through configuration, the snapshot can include all mounted volumes (disks) or only the root volume (i.e., the C: drive). The snapshot backups will be kept for 30 days for all environments (dev, test, and prod) within the local region; the snapshot backups for production instances will also be stored cross-region in a separate AWS account (the backup account).

 

Recovery Scenarios

There are five EBS volumes mounted on an EC2 instance for SQL Server.

  1. Root volume (C: drive): where SQL Server binaries are stored.
  2. Data volume (D: drive): where database data files and home-grown scripts are stored.
  3. Tlog volume (E: drive): where database transaction log files are stored.
  4. Backup volume (F: drive): where database backups are stored. This volume is usually referred to as local backup storage.
  5. TempDB volume (Z: drive): where TempDB data files and transaction log files are stored. This volume is recreated each time the instance is booted.

 

Seven likely failure and recovery scenarios are outlined below:

 

  1. An individual database crashes.

The database will be recovered from its backup stored on the F: drive. The recovery time is close to what it takes on an on-premises SQL Server; depending on the size of the database, it can be a few minutes to an hour.

 

  2. The data volume and/or tlog volume crashes.

First, a new EBS volume will be created and mounted to the EC2 instance; this will take less than 10 minutes. An additional 10+ minutes is needed to create the folder layout and bring up SQL Server. Then each database will be recovered from its backup (similar to scenario 1). The total downtime can be 25 minutes or more, depending on the number of databases on the SQL Server and the size of each database.

 

  3. The backup volume crashes.

A new EBS volume will be created and mounted to the EC2 instance. Then re-generate the folder layout and perform a full database backup of all databases. Copy backup files from S3 to the backup volume if needed. The total DBA working time will be less than 20 minutes, and SQL Server downtime is zero.

 

  4. The EC2 instance crashes but the root volume (C: drive) is intact.

In the event of a hardware crash where no EBS volumes are affected, the instance will need to be rebooted. Total downtime will be less than 10 minutes.

  5. Both the EC2 instance and the root volume (C: drive) crash.

Recovery option 1: The snapshots of the EC2 instance and root volume, which were taken by Cloud Platform Services, will be used to re-instantiate a new EC2 instance, and the EBS volumes will be re-mounted to the new instance. The total downtime is unknown at this point.

 

Recovery option 2: Provision a new EC2 instance from the CloudFormation script. Mount the EBS volumes (D:, E:, F:). Download the SQL Server installation files from S3 and install SQL Server along with upgrades/patches. Bring up SQL Server. The total downtime will be less than 30 minutes.

 

  6. A table was accidentally deleted.

Restore the database from backups taken prior to the time when the table was deleted, giving the restored database a different name. Import the table into its original database with the overwrite option (a sketch follows below). The total DBA working time will be about 10+ minutes, depending on the size of the database and the number of backups needed. The total SQL Server downtime is zero.

 

If the table was deleted more than 8 days ago, the specific backup files (a full backup and a few differential and transaction log backups) will first be downloaded from S3 to the EBS volume. Then we perform the steps stated above.
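
A minimal sketch of the restore-and-import sequence, with hypothetical database, table, file, and time values:

-- Restore under a new name to just before the table was deleted
RESTORE DATABASE [AppDB_recover]
FROM DISK = N'F:\MSSQL\backup\AppDB_full.bak'
WITH MOVE N'AppDB' TO N'D:\MSSQL\data\AppDB_recover.mdf',
     MOVE N'AppDB_log' TO N'E:\MSSQL\tlog\AppDB_recover_log.ldf',
     NORECOVERY;

RESTORE LOG [AppDB_recover]
FROM DISK = N'F:\MSSQL\backup\AppDB_tlog.trn'
WITH STOPAT = N'2023-03-22T09:30:00', RECOVERY;

-- Copy the lost table back into the original database
-- (SELECT INTO recreates the table, so it must not already exist there)
SELECT * INTO AppDB.dbo.LostTable
FROM AppDB_recover.dbo.LostTable;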

 

  7. Loss of the EC2 instance and all EBS volumes.

In a disaster event, we could lose the EC2 instance and all storage. We will recover the EC2 instance and root volume in the way described in scenario 5, then create the other three EBS volumes and copy the backups from S3 to the EBS volumes. Bring up SQL Server and restore each database from its backups.


We need to work with Storage to practice this scenario to get the timing. SWAG = an hour to copy the snapshot back from BACKUPND, prep the EBS volumes, and reboot.