/var/log/metasplo.it: February 2014

Monday 24 February 2014

DPM 2012 R2 only allows offline backups of VM with updated integration tools

Any DPM admin who backs up Hyper-V VMs has seen the dreaded offline backup state before. Admins of earlier versions of DPM might know it as "Saved State". Either way, it's annoying and disruptive to back up VMs in an offline/saved state.

Recently, when we migrated VMs from one Hyper-V cluster to another with SCVMM, we came across an issue where most of the VMs appeared as "Offline backup" in the DPM console.

This can be caused by a multitude of issues, including not having a SCSI adapter connected to the VM and running an old version of the integration tools.

However, after checking every condition known to force an offline backup (listed below), we were still only able to capture offline backups on some VMs.

The Backup (Volume Snapshot) Integration Service is disabled or not installed.
A virtual machine has one or more dynamic disks.
A virtual machine has one or more volumes that are based on non-NTFS file systems.
In a cluster configuration, the virtual machine Cluster Resource Group is offline.
A virtual machine is not in a running state.
A Shadow Storage assignment of a volume inside the virtual machine is explicitly set to a volume other than itself.
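
Several of the host-side conditions above can be checked quickly from PowerShell on the Hyper-V host. The following is only a rough sketch using the standard Hyper-V module cmdlets; the VM name 'VM01' is a placeholder, and the in-guest conditions (dynamic disks, non-NTFS volumes, shadow storage assignment) still have to be checked inside the VM itself, for example with "vssadmin list shadowstorage".

```powershell
# Run on the Hyper-V host in an elevated PowerShell session.
# 'VM01' is a placeholder; substitute the name of the problematic VM.
$vmName = 'VM01'

# Is the VM in a running state?
Get-VM -Name $vmName | Select-Object Name, State

# Are the integration services enabled and healthy?
# Look for the Backup (volume snapshot/checkpoint) service in the output.
Get-VMIntegrationService -VMName $vmName |
    Select-Object Name, Enabled, PrimaryStatusDescription

# Which disks are attached, and on which controller?
Get-VMHardDiskDrive -VMName $vmName |
    Select-Object ControllerType, ControllerNumber, ControllerLocation, Path
```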

During testing we removed a VHD from a VM that had multiple VHDs attached and then ran the backup again. This worked! Funnily enough, when we re-added the VHD, the VM could still be backed up online. This led us to a fix that worked on most of our problematic VMs.

The Resolution

1. Open up Hyper-V or SCVMM.

2. Go into the hard drive settings of the problematic VM and remove all VHDs (not delete, just remove for now - do I even need to say that?).

3. Save the VM.

4. Re-add the VHDs and save the VM again.

5. Start the VM

6. Attempt the backup with DPM again. You may need to clear the state cache in DPM.
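
If you have a lot of VMs to work through, the same steps can be scripted on the Hyper-V host. This is a minimal sketch rather than what we actually ran (we used the GUI). It assumes the VM can be shut down, that 'VM01' is replaced with the real VM name, and that the VM is not being modified by SCVMM at the same time. Try it on a non-critical VM first.

```powershell
# 'VM01' is a placeholder VM name.
$vmName = 'VM01'

# Record where every disk is attached before changing anything.
$disks = Get-VMHardDiskDrive -VMName $vmName |
    Select-Object ControllerType, ControllerNumber, ControllerLocation, Path

# Shut the VM down before touching its disk configuration.
Stop-VM -Name $vmName

# Detach every disk from the VM. This removes the attachment only;
# the VHD files stay where they are on storage.
Get-VMHardDiskDrive -VMName $vmName | Remove-VMHardDiskDrive

# Re-attach each disk to the controller slot it occupied before.
foreach ($disk in $disks) {
    Add-VMHardDiskDrive -VMName $vmName `
        -ControllerType $disk.ControllerType `
        -ControllerNumber $disk.ControllerNumber `
        -ControllerLocation $disk.ControllerLocation `
        -Path $disk.Path
}

Start-VM -Name $vmName
```

On a Generation 2 VM it may be worth checking the boot order afterwards, since the boot disk is detached and re-attached along with the rest.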

Essentially we are doing nothing more than removing and re-adding a piece of configuration. However, this seems to resolve the issue on most VMs. Your mileage may vary, as these VMs had all been migrated between Hyper-V servers with VMM.

Wednesday 5 February 2014

Lync 2013 CMS error: Instance points to a CMS store that is being moved to a different pool

We were doing some routine checks of a newly provisioned Lync 2013 front end fail-over pool when we discovered a CMS error on the backup pool in the event logs. Our deployment is configured with two standard edition front end pools with resiliency.

The error is as follows:

Event ID 4082
LS Backup Service

Microsoft Lync Server 2013, Backup Service central management backup module failed to complete import operation.
Configurations:
Backup Module Identity:CentralMgmt.CMSMaster
Working Directory path:\\lync002.contoso.info\lyncstore\1-BackupService-20\BackupStore\Temp
Local File Store Unc path:\\lync002.contoso.info\lyncstore\1-BackupService-20\BackupStore
Remote File Store Unc path:\\lync002.contoso.info\lyncstore\1-BackupService-21\BackupStore
Additional Message:
Exception: Microsoft.Rtc.BackupService.ModuleUnavailableException: Backup module is temporarily unavailable at this point. Reason: Instance (local)\rtc points to a CMS store that is being moved to a different pool.
at Microsoft.Rtc.BackupService.BackupModules.CentralMgmtBackupModule.CheckModuleAvailability(Nullable`1 primaryPool)
at Microsoft.Rtc.BackupService.BackupModules.CentralMgmtBackupModule.GetBackupCookie()
at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.GetBackupCookie(Boolean& isModuleInitialized)
at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.InternalExecute()
at Microsoft.Rtc.Common.TaskManager`1.ExecuteTask(Object state)
Cause: Either network or permission issues. Please look through the exception details for more information.
Resolution:

This error was logged every 15 minutes as the backup jobs ran, with separate errors for the import and export processes.

During our testing phase we did some fail-over testing, which included moving the CMS database to the fail-over node. We believe this may have caused the error. While it was not causing any known issues in the environment, it could come to a head should we ever need to fail over to our resiliency node.

The Resolution

The resolution was fairly simple but did require a few attempts before we got the process order correct.

1. Open the Lync Topology Builder and remove the associated backup pool from the resiliency tab of the primary front end pool.

2. Publish the topology

3. Use the Get-CsManagementStoreReplicationStatus PowerShell cmdlet to verify the topology has replicated to the associated backup pool.

4. On the backup pool server, open a command prompt in "%ProgramFiles%\Microsoft Lync Server 2013\Deployment" and run bootstrapper.exe. This will remove the management and backup services from the backup pool.

5. Remove the CMS database from the backup pool using the following PowerShell command:
Uninstall-CsDatabase -CentralManagementDatabase -SqlServerFqdn lync002.contoso.info

6. Go back to the Topology Builder and re-add the associated backup pool to the primary front end pool.

7. Publish the topology

8. Wait for the topology to replicate

9. Re-run the bootstrapper on the backup pool node
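
For the replication checks in steps 3 and 8, and to confirm the result afterwards, something like the following can be run from the Lync Server Management Shell. Treat it as a sketch rather than the exact commands we typed. The pool FQDN is the one from the event log above and should be swapped for your own backup pool.

```powershell
# Steps 3 and 8: check that every replica has picked up the published topology.
# UpToDate should be True for the backup pool before moving on.
Get-CsManagementStoreReplicationStatus |
    Select-Object ReplicaFqdn, UpToDate, LastStatusReport

# Optional: ask the replicas to pull the latest configuration now
# instead of waiting for the next replication cycle.
Invoke-CsManagementStoreReplication

# Show which server currently hosts the active CMS master.
Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

# After step 9: check the state of the backup service on the backup pool.
Get-CsBackupServiceStatus -PoolFqdn lync002.contoso.info
```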

This five-minute process fixed the error messages for us and will hopefully save us in the unfortunate event of a fail-over.