Thursday 24 April 2014

DPM 2012 R2 failing to back up SQL 2012 SP1 Always On Cluster VM

After we created a new Server 2012 R2 cluster and migrated our VMs across, the next job was to get backups in place with DPM 2012 R2.

Most of the machines backed up with online backup without issue, and our SQL cluster was no exception: the online backup initiated without fault. However, after some time one of the Always On cluster nodes came back with the following error.

The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur. (0x800423F4))

Digging further on the SQL machine with the "vssadmin list writers" command, we found that the SQL VSS Writer service was in an error state.
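If you would rather check this from a script than by eye, the Python sketch below parses the text printed by "vssadmin list writers" and returns the writers that are not in a stable, error-free state. It assumes the usual output layout, where each writer block has "Writer name:", "State:" and "Last error:" lines; it is a convenience helper, not something DPM itself uses.

```python
import re
from typing import Dict, List


def unhealthy_writers(vssadmin_output: str) -> List[Dict[str, str]]:
    """Return writers whose state is not Stable or whose last error is not 'No error'.

    Assumes the usual `vssadmin list writers` layout: each writer block starts
    with "Writer name: '<name>'" and contains "State:" and "Last error:" lines.
    """
    writers: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        name_match = re.match(r"Writer name:\s*'(.+)'", line)
        if name_match:
            # A new writer block begins, so keep the one we just finished.
            if current:
                writers.append(current)
            current = {"name": name_match.group(1), "state": "", "last_error": ""}
        elif line.startswith("State:") and current:
            current["state"] = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and current:
            current["last_error"] = line.split(":", 1)[1].strip()
    if current:
        writers.append(current)

    # A healthy writer normally reports "[1] Stable" and "No error".
    return [
        w for w in writers
        if "Stable" not in w["state"] or w["last_error"] != "No error"
    ]
```

On the SQL node you could feed it the output of subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout, run from an elevated prompt.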



The Solution

The solution is very simple but introduces some other problems.

1. Jump on the erroneous SQL server.
2. Open a command prompt and type services.msc
3. Go to the SQL VSS Writer service and stop it.
4. Re-run the DPM job.

This successfully completes the VM snapshot backup, but it introduces an issue with SQL backup if you are also using DPM for SQL backup on the same Always On cluster.

In a SQL Always On cluster, DPM backups are normally taken from the secondary node. The primary node is left alone to serve SQL, while the inactive secondary takes the heavy backup load.

This is why we only had the problem on our secondary node, and it is also why the workaround introduces a SQL backup problem: when the SQL VSS Writer service is stopped, DPM can't perform SQL backups on that server.

Two possible options are:

1. Don't take VM snapshots of your backup node; it may be overkill anyway. In the event of a failure you can spin up a new VM and make it the new secondary.

2. Schedule a task that disables the SQL VSS Writer service some time before the VM snapshot and re-enables it after the snapshot.
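Option 2 can be set up with the built-in schtasks tool. The Python sketch below only builds the two command lines. It stops and starts the service, as in the manual fix, rather than changing its startup type. The task names, the 30-minute margin and the snapshot times are hypothetical, and it assumes the SQL Server VSS Writer's service name is SQLWriter (confirm with "sc query SQLWriter" on your node first).

```python
from datetime import datetime, timedelta
from typing import List

TIME_FORMAT = "%H:%M"


def shift(hhmm: str, minutes: int) -> str:
    """Shift an "HH:MM" time by `minutes`, wrapping around midnight."""
    moved = datetime.strptime(hhmm, TIME_FORMAT) + timedelta(minutes=minutes)
    return moved.strftime(TIME_FORMAT)


def writer_toggle_commands(snapshot_start: str, snapshot_end: str,
                           margin_minutes: int = 30) -> List[str]:
    """Build schtasks commands that stop the SQL VSS writer before the VM
    snapshot window and start it again afterwards.

    Task names, the margin and the service name SQLWriter are assumptions.
    """
    stop_at = shift(snapshot_start, -margin_minutes)
    start_at = shift(snapshot_end, margin_minutes)
    return [
        # Stop the writer shortly before the snapshot window opens.
        f'schtasks /create /tn "Stop SQL VSS Writer" /tr "sc stop SQLWriter" '
        f'/sc daily /st {stop_at} /ru SYSTEM',
        # Start it again once the snapshot window has closed.
        f'schtasks /create /tn "Start SQL VSS Writer" /tr "sc start SQLWriter" '
        f'/sc daily /st {start_at} /ru SYSTEM',
    ]
```

For a hypothetical 22:00 to 23:00 snapshot window this schedules the stop at 21:30 and the start at 23:30; paste the printed commands into an elevated prompt on the secondary node.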

We are using option 2 and so far it is working well. You need to ensure that your SQL backup isn't scheduled during the period when the SQL VSS Writer is disabled, but you're a good admin and wouldn't have your VM snapshot and SQL backup scheduled at the same time anyway!
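Once margins are added around the snapshot, it is easy to misjudge whether a SQL backup falls inside the writer-disabled period, especially near midnight. A minimal Python sketch, using hypothetical daily windows given as "HH:MM" strings, that reports whether two daily windows overlap, including windows that cross midnight:

```python
from typing import List, Tuple

MINUTES_PER_DAY = 24 * 60


def to_minutes(hhmm: str) -> int:
    """Convert "HH:MM" to minutes past midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def day_segments(start: str, end: str) -> List[Tuple[int, int]]:
    """Split a daily window into half-open [start, end) segments within one day."""
    s, e = to_minutes(start), to_minutes(end)
    if s < e:
        return [(s, e)]
    if s > e:
        # The window crosses midnight, e.g. 23:00 to 01:00.
        return [(s, MINUTES_PER_DAY), (0, e)]
    return []  # start == end is treated as an empty window


def windows_overlap(a_start: str, a_end: str, b_start: str, b_end: str) -> bool:
    """True if the two daily windows share at least one minute."""
    return any(
        s1 < e2 and s2 < e1
        for s1, e1 in day_segments(a_start, a_end)
        for s2, e2 in day_segments(b_start, b_end)
    )
```

For example, windows_overlap("21:30", "23:30", "22:00", "22:30") returns True, flagging a SQL backup scheduled while the writer is stopped.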

Monday 7 April 2014

Lync 2013 response group alternatives - bypassing the annoying beep

Lync 2013 is a great product, especially for education with Microsoft agreements in place. It can offer a low cost PABX replacement that integrates seamlessly with other Microsoft products.

Arguably the biggest weakness of Lync across multiple releases has been response groups and how they handle incoming calls. Features such as business hours and queues work well, but the annoying response group connection beep is enough to push any sane receptionist over the edge.

The other big weakness of response groups is survivability. When deploying a resiliency group, a manual failover of the CMS is required for response groups to work. Response groups are normally used on critical extensions such as reception, lines you can ill afford to have down for any length of time.

For anyone unfamiliar: when a user answers a call routed through a response group, an audible beep plays after they pick up the handset until the transfer from the response group to that user is complete. Depending on the endpoint (phone, Lync client, etc.) the beep can last longer and sound different, but one thing is consistent: it drives receptionists crazy.

There are a number of proposed workarounds to response groups, including call delegation, but all of them lose the automated "out of hours" component of the response group: the ability for the phone to go to voice-mail automatically outside set hours and ring during work hours.



The Solution


Part 1, Calling queues

We like the idea of simultaneous ring to delegates. One of the biggest benefits of response groups is having an alternate call group ring after a pre-determined amount of time. In response groups this is an alternate queue with a delay. This allows a receptionist on a call, or not available, to be covered by other staff, without bombarding these staff with every call.

To do this we create a standard Lync login with a domain account and use it as the "control" account.

We sign the reception phone in with the control account and then set up simultaneous ring to delegates with a delay. This is a simple and elegant solution for simple call groups. However, if you have a complex call group with multiple queues and delays, this won't work.
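As a mental model only (this is not how Lync implements it internally), simultaneous ring to delegates with a delay behaves like the small Python function below: for the first few seconds only the signed-in control account rings, and once the delay has passed the delegates ring as well. The endpoint names and the 15-second default are hypothetical.

```python
from typing import List


def ringing_endpoints(elapsed_seconds: int, delegates: List[str],
                      control: str = "reception", delay_seconds: int = 15) -> List[str]:
    """Return who is ringing `elapsed_seconds` after a call arrives.

    Conceptual model of simultaneous ring to delegates with a delay.
    """
    if elapsed_seconds < delay_seconds:
        # Only the control account (the reception phone) rings at first.
        return [control]
    # After the delay, delegates ring alongside reception.
    return [control] + list(delegates)
```

This is why a busy or absent receptionist is covered without every call landing on the other staff straight away.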

This is nothing new; lots of bloggers have suggested it as a genuine alternative to a response group, but it lacks the out-of-hours functionality that a response group has.


Part 2, Managing out of hours

This is where the whiteboard came into play, and we came up with a plausible idea.

Windows Task Scheduler, simple, right? Task Scheduler can run a batch file or command at any predefined time or when an event is triggered.

We are keeping it simple and scheduling the following tasks:

Monday-Friday 8AM: Simultaneously ring delegates with delay
Monday-Friday 4PM: Forward calls directly to voice-mail
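The two tasks can also be created from the command line with schtasks. The Python sketch below just builds the command lines; the script paths (C:\Scripts\reception-8am.cmd and C:\Scripts\reception-4pm.cmd) and task names are hypothetical, and you may need to add /ru (and /rp) so the task runs under an account that is allowed to run sefautil.

```python
from typing import List

WEEKDAYS = "MON,TUE,WED,THU,FRI"


def reception_task_commands(script_8am: str, script_4pm: str) -> List[str]:
    """Build schtasks commands for the Monday-Friday 8AM and 4PM tasks.

    Script paths and task names are hypothetical placeholders.
    """
    tasks = [
        ("Reception - ring delegates", script_8am, "08:00"),
        ("Reception - voice-mail", script_4pm, "16:00"),
    ]
    return [
        f'schtasks /create /tn "{name}" /tr "{script}" '
        f'/sc weekly /d {WEEKDAYS} /st {start}'
        for name, script, start in tasks
    ]
```

For example, reception_task_commands(r"C:\Scripts\reception-8am.cmd", r"C:\Scripts\reception-4pm.cmd") returns one weekly task at 08:00 and one at 16:00, each limited to Monday to Friday.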

This ensures that during business hours calls ring reception and then the delegates as required, and outside business hours calls go straight to voice-mail.
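Because nothing runs at the weekend, the Friday 4PM setting simply stays in place until Monday 8AM. The Python sketch below states that logic explicitly, which can be handy for sanity-checking a schedule before it goes live; the mode names "delegates" and "voice-mail" are our own labels, not sefautil settings.

```python
from datetime import datetime


def reception_mode(now: datetime) -> str:
    """Return which forwarding setup the two scheduled tasks leave in place at `now`.

    Monday to Friday from 08:00 until 16:00 the 8AM script's setting applies;
    at all other times (weeknights and the whole weekend) the 4PM setting applies.
    """
    is_weekday = now.weekday() < 5  # Monday == 0 ... Friday == 4
    in_hours = 8 <= now.hour < 16
    return "delegates" if is_weekday and in_hours else "voice-mail"
```

For example, Monday 7 April 2014 at 09:00 gives "delegates", while Saturday 12 April 2014 at 10:00 gives "voice-mail".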

To run these tasks you will need to configure sefautil as a Lync trusted application; this is outside the scope of this post but is covered on many other blogs.

In testing we found that the order in which the sefautil commands run is critical to success. Below are the tasks we are running; note that you need to set up your list of delegates before running these commands.


8AM script: disable voice-mail forwarding, then enable simultaneous ring to delegates with a 15-second delay
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /disablefwdimmediate
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /enablesimulring
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /simulringdelegates /delayringdelegates:15


4PM script: disable delegation, then enable immediate forwarding to voice-mail
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /disabledelegation
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /enablefwdimmediate
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /setfwddestination:receptionphone@contoso.info;opaque=app:voicemail

It was critical to run the disable tasks before running the enable tasks. When we did manual testing this didn't matter, but for the script to run successfully any existing forwarding needed to be disabled before enabling more.
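If you would rather drive both scripts from one place, the Python sketch below runs each sefautil step in order and stops at the first failure, so an enable step never runs on top of forwarding that was not cleared. It reuses the placeholder server name and SIP address from the scripts above and assumes sefautil returns a non-zero exit code on failure, which we have not verified. The run parameter exists only so the logic can be tried without Lync; by default it calls subprocess.run.

```python
import subprocess
from typing import Callable, List, Sequence

SEFAUTIL = "sefautil.exe"
SERVER = "/server:SERVERNAMEHERE"      # placeholder, as in the scripts above
USER = "receptionphone@contoso.info"   # placeholder SIP address

# Each inner list is one sefautil invocation; disable steps come first.
STEPS_8AM = [
    ["/disablefwdimmediate"],
    ["/enablesimulring"],
    ["/simulringdelegates", "/delayringdelegates:15"],
]
STEPS_4PM = [
    ["/disabledelegation"],
    ["/enablefwdimmediate"],
    [f"/setfwddestination:{USER};opaque=app:voicemail"],
]


def run_steps(steps: Sequence[List[str]],
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> int:
    """Run the steps in order; return how many succeeded before any failure."""
    for index, switches in enumerate(steps):
        result = run([SEFAUTIL, SERVER, USER, *switches])
        if result.returncode != 0:
            # Stop so later enable steps don't run on top of old forwarding.
            return index
    return len(steps)
```

Calling run_steps(STEPS_4PM) from the 4PM task would replace the three separate command lines, with a return value below 3 signalling that a step failed.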

These tasks have been running successfully for 6 weeks now without any issues.