Friday 24 October 2014

Server 2012 R2 member server or domain controller can't enroll computer certificate

Although the title mentions Server 2012 R2, this issue also applies to Server 2008 R2 and Server 2012.

This issue came to light early on a Monday morning, when we received a call reporting that some devices couldn't connect to our wireless network. We immediately suspected an NPS issue, as users reported that devices on other SSIDs were not affected.

Upon investigating, we found one of our NPS connection request policies complaining that there was no valid certificate available to authenticate EAP/PEAP requests. This was strange, as all our NPS servers are members of the AD domain and should receive certificates via automatic certificate enrollment. In fact, all of them initially received a certificate via automatic enrollment when they joined the domain.

Next we opened the Certificates (Local Computer) MMC snap-in and browsed to the Personal certificate store. The only certificate listed had expired the previous night. Problem confirmed.



The Problem

Upon attempting to manually request a new certificate we were greeted with "Certificate types are not available". It was almost as if there were no certificate services in the domain.

We jumped onto another domain-joined server and found we could successfully enroll a computer certificate. Inspection of other servers showed they all had valid, up-to-date certificates. Only this one server was affected.

The only difference between the problematic server and other servers tested was our faulting server was a domain controller. This led us to the Certificate Templates snap-in on the Certificate Authority (CA) server.



The Resolution
1. Go to your CA and open the Certificate Templates snap-in.
2. Right click Computer (or whatever certificate template you are trying to enroll) and click Properties.
3. Open the Security tab.
4. Add "Domain Controllers" with Read and Enroll permissions, then apply and close. If your member servers are experiencing this issue, you may need to add "Domain Computers" with Read and Enroll.



On our CA, "Domain Controllers" was missing permissions on the Computer certificate template. As soon as we added Domain Controllers with Read and Enroll permissions, the NPS could immediately enroll a new certificate.

Originally, when our domain controller was a member server, it was part of the Domain Computers group, which had the required Read and Enroll permissions, and it therefore received a certificate upon joining the domain. However, the member server was later promoted to a DC, which moved it out of Domain Computers and into Domain Controllers. When it attempted to request a new certificate it was denied, as it no longer had the appropriate permission.

A big problem caused by a simple and easy to fix error.

Tuesday 22 July 2014

Citrix XenDesktop 7.1/7.5 black screen on login

Nothing is worse than putting in all the effort to build a new XenDesktop environment, PVS farm and master image, only to find yourself faced with the dreaded black/blank screen on login.

There are a number of reasons this can occur, including the enhanced desktop experience feature. However, the most common cases share the following factors.
  • Windows 8, Windows 8.1, Server 2012 or Server 2012 R2 is used
  • XenDesktop 7.1 or 7.5 is used
  • 8 dot 3 name creation was disabled at the time of installing the VDI
  • PVS was used in the image creation process
This problem is normally associated with 8 dot 3 name creation being disabled. The "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs" registry value references mfaphook.dll (or mfaphook64.dll on 64-bit systems) via its 8dot3 name.

In basic terms

No 8dot3 name = mfaphook AppInit location pointing to nowhere = no mfaphook loaded on login = no desktop for the user
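As a rough illustration of the chain above, the following Python sketch checks whether a path (such as an entry copied from the AppInit_DLLs data) contains a component that looks like an 8dot3 short name. The function name and the pattern are our own assumptions for illustration, not anything Citrix or Microsoft ships.

```python
import re

# A path component that looks like an 8dot3 short name, e.g. "Progra~1":
# up to six characters, a tilde, one or more digits, and an optional short extension.
SHORT_COMPONENT = re.compile(r"[^~\\/]{1,6}~\d+(\.[^\\/]{0,3})?")


def uses_short_name(path):
    """Return True if any component of the Windows path looks like an 8dot3 name."""
    return any(SHORT_COMPONENT.fullmatch(part) for part in re.split(r"[\\/]", path))


if __name__ == "__main__":
    # The two forms of the same DLL path mentioned in this post.
    print(uses_short_name(r"C:\Progra~1\Citrix\System32\mfaphook64.dll"))
    print(uses_short_name(r"C:\Program Files\Citrix\System32\mfaphook64.dll"))
```

If the check reports a short name for the registry entry while the volume has no short names, the hook DLL cannot be located at login.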

Once 8 dot 3 name creation is disabled it can't be re-enabled without significant work. In most cases a re-install of the underlying operating system is going to be quicker and more reliable. This is an annoying fault, as many PVS optimization guides list disabling 8 dot 3 name creation as a performance-enhancing tweak.
However, there is a reliable workaround.

Before proceeding with the workaround, you can run the following test to determine whether 8 dot 3 name creation is disabled.

8 dot 3 name creation disabled - the below workaround may assist

C:\>dir program*. /x
Volume in drive C is DDC1
Directory of C:\
06/05/2012 10:41 AM <DIR> Program Files
06/05/2012 04:49 PM <DIR> Program Files (x86)


8 dot 3 name creation enabled - the below workaround may not assist
C:\>dir program*. /x
Volume in drive C is DDC1
Directory of C:\
06/05/2012 10:41 AM <DIR> PROGRA~1 Program Files
06/05/2012 04:49 PM <DIR> PROGRA~2 Program Files (x86)
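
If you need to run this check across many images, the comparison of the two outputs above can be sketched in Python. This is a minimal sketch that assumes the `dir /x` output looks like the samples shown here; the function name is hypothetical.

```python
def short_names_present(dir_output):
    """Return True if any <DIR> line in `dir /x` output shows a short-name column.

    Assumes the layout shown in this post: the short name, when present,
    is the first token after <DIR> and contains a tilde (e.g. PROGRA~1).
    """
    for line in dir_output.splitlines():
        if "<DIR>" not in line:
            continue
        tokens = line.split("<DIR>", 1)[1].split()
        if tokens and "~" in tokens[0]:
            return True
    return False


if __name__ == "__main__":
    sample = "06/05/2012 10:41 AM <DIR> PROGRA~1 Program Files"
    print(short_names_present(sample))
```

A result of False corresponds to the "disabled" output, which is the case the workaround targets.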



The workaround


We noted that the "HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs" registry key referenced "C:\Program Files\Citrix\System32\mfaphook64.dll" but via its 8dot3 name "C:\Progra~1\Citrix\System32\mfaphook64.dll".


Initially we tried simply adding "C:\Program Files\Citrix\System32\mfaphook64.dll" to the AppInit_DLLs string, but this didn't work.

To fix the problem, we first added "C:\Program Files\Citrix\System32\" to the system's PATH environment variable.

1. Open Control Panel and click System.
2. Click Advanced system settings.
3. Click Environment Variables.
4. From the System variables list, select "Path" and click Edit.
5. Leave the existing string intact and add the line below to the end of it. Yes, it does need the semicolon.
;C:\Program Files\Citrix\System32\
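
For anyone scripting the image build, the PATH edit in step 5 amounts to appending one semicolon-separated entry without disturbing the existing ones. Below is a minimal Python sketch of that string handling only. It does not write to the registry or the environment, and the function name is hypothetical.

```python
def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed.

    Entries are separated by semicolons, as in the Windows PATH variable.
    The comparison ignores case and a trailing backslash.
    """
    entries = [entry for entry in path_value.split(";") if entry]
    known = {entry.rstrip("\\").lower() for entry in entries}
    if new_dir.rstrip("\\").lower() not in known:
        entries.append(new_dir)
    return ";".join(entries)


if __name__ == "__main__":
    existing = "C:\\Windows\\system32;C:\\Windows"
    print(append_to_path(existing, "C:\\Program Files\\Citrix\\System32\\"))
```

Skipping the entry when it is already present keeps the sketch safe to run more than once against the same image.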



Next we add the reference to the registry.

1. Open regedit.
2. Go to HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows
3. Open the AppInit_DLLs value.
4. If the value is empty, simply add "mfaphook64.dll" for 64-bit systems or "mfaphook.dll" for 32-bit. Don't include the quotation marks.

If there is already data in the AppInit_DLLs value, add a ; before the dll. For example ";mfaphook64.dll", again without the quotation marks.




As we added the directory in the environmental PATH variable we don't need to specify a path in the registry key. When the system looks for mfaphook it will search all the directories in %PATH%.


That should be it. You don't even need to reboot. Your users should now be able to log in without a black screen. If they still can't, then I suggest you try re-installing the VDA and disabling enhanced desktop experience, as mfaphook loading likely isn't your issue.

Thursday 10 July 2014

Windows installer can't find HP DL 380 server hard disk

This is a very frustrating but common problem when building a new server. You un-box and boot up the sparkling new machine, connect your virtual media to a Windows ISO and begin the installation. However, when you get to the disk selection dialog, Windows can't find the disk.

You spend the next 30 minutes searching the HP website for SATA pre-installation drivers and none of them work, until you find the simplest solution to the problem.



The Solution

Luckily for us, HP has been including these drivers on-board for some time now.

1. Fire up the server and jump into BIOS.
2. Select Advanced Options.
3. Select Advanced System ROM Options.
4. Select Virtual Install Disk.
5. Select Enable.
6. Save and exit BIOS.

Next time you fire up that Windows installation, Windows will magically see that new RAID 10 array you just created.


Tuesday 17 June 2014

Update to Yubiradius modification - Use only OTP and allow temporary tokens

In 2012 we released a modification patch to Yubiradius 3.5.1 that supports using just the OTP for authentication. The default Yubiradius install requires the AD password + OTP to be combined as a single string.

In deployments where you want 2 password fields, 1 for AD and another for Yubiradius, a modification is needed to the Yubiradius ropverify.php.

After our initial release we made some further modifications to allow temporary tokens. Our initial release could only be used with the full OTP. If a temporary 8 character token was issued, it would not work.





The Patches

If you are already running our modified version of Yubiradius 3.5.1 ropverify.php then you can apply THIS patch to just add temporary token support.

If you are running a standard 3.5.1 and want to add OTP only auth with temporary token support please apply THIS patch.

Please follow the step by step instructions on our original blog post if you need support in applying the patch to /var/www/wsapi/ropverify.php

Also available HERE is a modification by "Dan" that works with Yubiradius 3.5.4, however this DOESN'T have temporary token support. You could apply this, then my temp token patch and it would work fine.

We have since ceased using Yubiradius in our environment so we can't verify if versions later than 3.5.4 are supported.

Monday 19 May 2014

Adobe Premiere Elements takes a long time to load in a firewalled environment

After going through the pain that is packaging Adobe Premiere Elements for a domain environment, we found ourselves faced with another issue.

On any machines that don't have direct access to the internet, the Premiere Elements initial load took a very long time.

However Adobe does provide some logging to diagnose start-up problems, which can be found under \Users\[username]\AppData\Local\Temp\amt3.log (Windows 7 and above).



The Solution

An analysis of the logs identified a request for "Activating License" and shortly after a "HTTP Request Status code 502" which indicates a bad gateway.

The line after the license activation begins gives us some clues to the solution. "License server is https://activate.adobe.com/servlets/inet_sl/sl_v1_7_rclient (protocol=slcore)"
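
If you have many machines to check, the same log analysis can be sketched in Python. This is a minimal sketch that assumes the amt3.log lines contain the two phrases quoted above; the function name and patterns are our own and are not part of any Adobe tooling.

```python
import re
from urllib.parse import urlparse

# Patterns based only on the two log phrases quoted in this post.
LICENSE_URL = re.compile(r"License server is (https?://\S+)")
HTTP_STATUS = re.compile(r"HTTP Request Status code (\d{3})")


def summarize_activation(lines):
    """Return (license server hostnames, HTTP status codes) found in the log lines."""
    hosts = set()
    statuses = []
    for line in lines:
        url_match = LICENSE_URL.search(line)
        if url_match:
            hosts.add(urlparse(url_match.group(1)).hostname)
        status_match = HTTP_STATUS.search(line)
        if status_match:
            statuses.append(int(status_match.group(1)))
    return hosts, statuses


if __name__ == "__main__":
    sample = [
        "License server is https://activate.adobe.com/servlets/inet_sl/sl_v1_7_rclient (protocol=slcore)",
        "HTTP Request Status code 502",
    ]
    print(summarize_activation(sample))
```

The hostnames returned are the ones to consider for a firewall or proxy exception.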



The easiest way to get rid of the Premiere Elements delay is to simply allow activate.adobe.com:443 to pass directly to the internet. Premiere is smart enough to use the OS proxy, so you may need to add a proxy exclusion to Internet Explorer.

If you have no way of allowing traffic directly out, then some form of proxy server may be your only option.




Thursday 24 April 2014

DPM 2012 R2 failing to backup SQL 2012 SP1 Always On Cluster VM

After we created a new Server 2012 R2 cluster and migrated our VMs across, the next job was to get backup in place with DPM 2012 R2.

Most of the machines backed up with online backup without issue, and at first our SQL cluster was no exception: the online backup initiated without fault. However, after some time one of the Always On cluster nodes came back with the following error.

The VSS application writer or the VSS provider is in a bad state. Either it was already in a bad state or it entered a bad state during the current operation. (ID 30111 Details: VssError:The writer experienced a non-transient error.  If the backup process is retried,
the error is likely to reoccur.
 (0x800423F4))

Digging further on the SQL machine with the "vssadmin list writers" command, we found that the SQL VSS Writer service was in an error state.
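
To check several servers at once, the output of "vssadmin list writers" can be filtered for writers that report an error. The Python sketch below assumes the usual layout of that output (a "Writer name:" line followed by indented "State:" and "Last error:" lines); the function name is hypothetical.

```python
def failed_writers(vssadmin_output):
    """Return (name, state, last_error) for each writer whose last error is not 'No error'."""
    failed = []
    name = None
    state = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Writer name:"):
            # Start of a new writer block; names are wrapped in single quotes.
            name = line.split(":", 1)[1].strip().strip("'")
            state = None
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Last error:") and name is not None:
            last_error = line.split(":", 1)[1].strip()
            if last_error != "No error":
                failed.append((name, state, last_error))
    return failed
```

Feeding it the captured command output returns an empty list when every writer is healthy.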



The Solution

The solution is very simple but introduces some other problems.

1. Jump on the affected SQL server.
2. Open a command prompt and type services.msc
3. Go to the SQL VSS Writer service and stop the service.
4. Re-run the DPM job.

This successfully completes the VM snapshot backup, but it introduces an issue with SQL backup if you are also using DPM for SQL backup on the same Always On cluster.

In a SQL Always On cluster, the DPM backups are normally taken from the secondary node. The primary node is left alone to do SQL, while the inactive secondary takes the heavy backup load.

This is why we only had the problem on our secondary node, and it is also why stopping the service introduces a SQL backup problem. When the SQL VSS Writer service is stopped, DPM can't perform SQL backups on the server.

Two possible options are:

1. Don't perform VM snapshot on your backup node, it may be overkill anyway. In the event of a failure you can spin up a new VM and make it the new secondary.

2. Run a schedule that disables the SQL VSS Writer service some time before the VM snapshot and re-enable it again after the snapshot.

We are using option 2 and so far it is working well. You need to ensure that your SQL backup isn't occurring during the period when the SQL VSS Writer is disabled, but you're a good admin and wouldn't have your VM snapshot and SQL backup scheduled at the same time anyway!
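
The scheduling constraint for option 2 is simply that two time windows must not overlap. A minimal Python sketch of that check follows, assuming both windows fall within the same day and do not cross midnight. The times in the example are made up for illustration.

```python
from datetime import time


def windows_overlap(start_a, end_a, start_b, end_b):
    """Return True if two same-day time windows overlap.

    Windows are treated as half-open: one ending at 03:00 does not
    overlap one starting at 03:00.
    """
    return start_a < end_b and start_b < end_a


if __name__ == "__main__":
    # Hypothetical schedule: writer disabled 01:00-03:00, SQL backup 02:30-04:00.
    print(windows_overlap(time(1, 0), time(3, 0), time(2, 30), time(4, 0)))
```

If the check returns True, move either the writer-disabled window or the SQL backup job.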

Monday 7 April 2014

Lync 2013 response group alternatives - bypassing the annoying beep

Lync 2013 is a great product, especially for education with Microsoft agreements in place. It can offer a low cost PABX replacement that integrates seamlessly with other Microsoft products.

Arguably the biggest downfall of Lync throughout multiple releases has been the response groups and how they handle incoming calls. Features such as business hours and the queues work well, but the annoying response group connection beep is enough to push any sane receptionist over the edge.

The other big downfall of response groups is survivability. When deploying a resiliency group, a manual fail-over of the CMS is required for response groups to work. Normally response groups are used on critical extensions, such as reception, lines that you can ill afford to have down for any length of time.

For anyone who is unfamiliar: when a user picks up a call that is routed through a response group, an audible beep is played after they pick up the handset until the transfer from the response group to that user is complete. Depending on the endpoint (phone, Lync client, etc) this beep can take longer and sound different, but one thing is consistent: it drives receptionists crazy.

There are a number of proposed workarounds to response groups, including using call delegation, but all of them remove the automated "out of hours" component of the response group: the ability for the phone to automatically transfer to voice-mail between specific hours and ring during work hours.



The Solution


Part 1, Calling queues

We like the idea of simultaneous ring to delegates. One of the biggest benefits of response groups is having an alternate call group ring after a pre-determined amount of time. In response groups this is an alternate queue with a delay. This allows a receptionist on a call, or not available, to be covered by other staff, without bombarding these staff with every call.

To do this we create a standard Lync login with a domain account, and use this as the "control" account.

We sign the reception phone in with the control account and then set up a simultaneous ring to delegates with a delay. This is a simple and elegant solution for simple call groups. However, if you have a complex call group with multiple queues and delays, this won't work.

This is nothing new, lots of bloggers have suggested this as a genuine alternative to a response group, but it is lacking the out of hours functionality that a response group has.


Part 2, Managing out of hours

This is where the white board came into play and we came up with a plausible idea.

Windows task scheduler, simple right? Task scheduler can run a batch file or command at any predefined time or on the triggering of an event.

We are keeping it simple and triggering the following events.

Monday-Friday 8AM: Simultaneously ring delegates with delay
Monday-Friday 4PM: Forward calls directly to voice-mail

This ensures that calls are forwarded as required during business hours and outside of business hours the call is forwarded straight to voice-mail.

To run these tasks you will need to configure sefautil as a Lync trusted application. This is outside the scope of this post, but is covered on many other blogs.

In testing we found that the order in which sefautil tasks are executed is critical to success. Below is an example of the tasks we are running. Note that before you run these commands you need to set up your list of delegates.


8AM script, disable voice-mail fwd and enable call delegates fwd with 15 second delay
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /disablefwdimmediate
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /enablesimulring
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /simulringdelegates /delayringdelegates:15


4PM script, disable call forward, enable voice-mail fwd
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /disabledelegation
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /enablefwdimmediate
sefautil.exe /server:SERVERNAMEHERE receptionphone@contoso.info /setfwddestination:receptionphone@contoso.info;opaque=app:voicemail

It was critical to run the disable tasks before running the enable tasks. When we did manual testing this didn't matter, but for the script to run successfully any existing forwarding needed to be disabled before enabling more.
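
One way to keep the disable-before-enable ordering from drifting is to generate both scripts from a single definition. The Python sketch below only builds the command lines and uses only the sefautil switches shown in this post. The function and mode names are hypothetical, and the 15 second delay is the value from our 8AM script.

```python
def build_commands(server, user, mode):
    """Return the ordered sefautil command lines for the given mode.

    The first command in each list is always the 'disable' step, so any
    existing forwarding is cleared before new forwarding is enabled.
    """
    base = ["sefautil.exe", f"/server:{server}", user]
    if mode == "business_hours":
        steps = [
            ["/disablefwdimmediate"],
            ["/enablesimulring"],
            ["/simulringdelegates", "/delayringdelegates:15"],
        ]
    elif mode == "after_hours":
        steps = [
            ["/disabledelegation"],
            ["/enablefwdimmediate"],
            [f"/setfwddestination:{user};opaque=app:voicemail"],
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [base + step for step in steps]


if __name__ == "__main__":
    for command in build_commands("SERVERNAMEHERE", "receptionphone@contoso.info", "after_hours"):
        print(" ".join(command))
```

The printed lines can be pasted into the batch files that Task Scheduler runs.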

These tasks have been running successfully for 6 weeks now without any issues.

Thursday 6 March 2014

Internet Explorer 11 appears blank on Windows 8.1 and Server 2012 R2

We have intermittently had clients complaining that their Internet Explorer was blank. When they open the desktop version of Internet Explorer 11 they see a blank browser window. If they type a URL and press Enter, IE does open the site, but menu options such as Settings and About are greyed out.

We have had success deleting the user's profile, which resolved the issue. However, it is not the best experience for the end user.

After taking some time to troubleshoot the issue we were able to pinpoint the registry key "HKCU\Software\Microsoft\Internet Explorer" as the problematic area.




The Resolution

We don't have a permanent fix. The problem appears to affect both Server 2012 R2 and Windows 8.1, so a future Microsoft fix may resolve the issue.

However you can fix this on a case by case basis with the following.

1. Log on as the problematic user

2. Open regedit

3. Navigate to HKCU\Software\Microsoft

4. Delete the "Internet Explorer" key

You could also script it with reg delete, or add it as a Citrix UPM registry exclusion in a VDI environment if you don't have a requirement to save Internet Explorer settings changes.
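
As a rough sketch of the scripted option, the following Python wraps the reg delete call for the key named above. It must run as the affected user, since HKCU is per-user. The function names are hypothetical, and /f is the reg.exe switch that suppresses the confirmation prompt.

```python
import subprocess
import sys

IE_KEY = r"HKCU\Software\Microsoft\Internet Explorer"


def build_reg_delete(key):
    """Return the reg.exe argument list that deletes a key without prompting."""
    return ["reg", "delete", key, "/f"]


def reset_ie_settings():
    """Delete the current user's Internet Explorer key (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("reg.exe is only available on Windows")
    subprocess.run(build_reg_delete(IE_KEY), check=True)
```

Calling reset_ie_settings() from a logon script would clear the key for that user each time, so only do that if losing saved IE settings is acceptable.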

Monday 24 February 2014

DPM 2012 R2 only allows offline backups of VM with updated integration tools

All DPM admins that back up Hyper-V VMs have seen the dreaded offline backup state before. Admins of earlier versions of DPM might know this as "Saved State". Either way, it's annoying and disruptive to back up VMs in an offline/saved state.

Recently when we did a migration of VMs from one Hyper-V cluster to another with SCVMM, we came across an issue where most of the VMs appeared as "Offline backup" in the DPM console.

This can be caused by a multitude of issues, including not having a SCSI adapter connected to the VM and running an old version of the integration tools.

However after checking all the requirements (listed below), we were still only able to capture offline backups on some VMs.
  • The Backup (Volume Snapshot) Integration Service is disabled or not installed.
  • A virtual machine has one or more dynamic disks.
  • A virtual machine has one or more volumes that are based on non-NTFS file systems.
  • In a cluster configuration, the virtual machine Cluster Resource Group is offline.
  • A virtual machine is not in a running state.
  • A Shadow Storage assignment of a volume inside the virtual machine is explicitly set to a different volume other than itself.

In testing we removed a VHD from a VM that had multiple VHD connected and then tested it, this worked! Funnily enough, when we re-added the VHD, the VM still could be backed up online. This led us to a fix that worked on most of our problematic VMs.



The Resolution

1. Open up Hyper-V or SCVMM.
2. Go into the hard drive settings of the problematic VM and remove all VHDs (not delete, just remove for now - do I even need to say that?).
3. Save the VM.
4. Re-add the VHDs and save the VM again.
5. Start the VM.
6. Attempt to back up with DPM again. You may need to clear the state cache in DPM.

Essentially we are doing nothing beyond removing and re-adding a configuration. However, this seems to resolve the issue on most VMs. Your results may differ, as these VMs had all been migrated between Hyper-V servers with VMM.

Wednesday 5 February 2014

Lync 2013 CMS error: Instance points to a CMS store that is being moved to a different pool

We were doing some routine checks of a newly provisioned Lync 2013 front end fail-over pool when we discovered a CMS error on the backup pool in the event logs. Our deployment is configured with two standard edition front end pools with resiliency.

The error is as follows:

Event ID 4082
LS Backup Service

Microsoft Lync Server 2013, Backup Service central management backup module failed to complete import operation.
Configurations:
Backup Module Identity:CentralMgmt.CMSMaster
Working Directory path:\\lync002.contoso.info\lyncstore\1-BackupService-20\BackupStore\Temp
Local File Store Unc path:\\lync002.contoso.info\lyncstore\1-BackupService-20\BackupStore
Remote File Store Unc path:\\lync002.contoso.info\lyncstore\1-BackupService-21\BackupStore
 Additional Message:
 Exception: Microsoft.Rtc.BackupService.ModuleUnavailableException: Backup module is temporarily unavailable at this point. Reason: Instance (local)\rtc points to a CMS store that is being moved to a different pool.
   at Microsoft.Rtc.BackupService.BackupModules.CentralMgmtBackupModule.CheckModuleAvailability(Nullable`1 primaryPool)
   at Microsoft.Rtc.BackupService.BackupModules.CentralMgmtBackupModule.GetBackupCookie()
   at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.GetBackupCookie(Boolean& isModuleInitialized)
   at Microsoft.Rtc.BackupService.BackupModuleHandler.ReceiveBackupDataTask.InternalExecute()
   at Microsoft.Rtc.Common.TaskManager`1.ExecuteTask(Object state)
Cause: Either network or permission issues. Please look through the exception details for more information.
Resolution:
Resolution



This error message was repeated every 15 minutes as the backup jobs were run. A separate error was created for the import and export process.

During our testing phase we did some fail-over testing which included moving the CMS database to the fail-over node. We believe this may have caused this error to occur. While it was not causing any known issues in the environment, it may come to a head should we ever need to fail over to our resiliency node.



The Resolution

The resolution was fairly simple but did require a few attempts before we got the process order correct.


1. Open the Lync Topology Builder and remove the associated backup pool from the resiliency tab of the primary front end pool.

2. Publish the topology

3. Use the Get-CsManagementStoreReplicationStatus PowerShell cmdlet to verify the topology has replicated to the associated backup pool.

4. Open a command prompt to "%ProgramFiles%\Microsoft Lync Server 2013\Deployment" and run the bootstrapper.exe on the backup pool. This will remove the management and backup service from the backup pool.

5. Remove the CMS database from the backup pool using the following PowerShell command.
Uninstall-CsDatabase -CentralManagementDatabase -SqlServerFqdn lync002.contoso.info


6. Go back to the Topology Builder and re-add the associated backup pool to the primary front end pool.

7. Publish the topology

8. Wait for the topology to replicate

9. Re-run the bootstrapper on the backup pool node


This 5 minute process fixed the error messages for us and hopefully saves us in the unfortunate event of a fail-over.

Friday 3 January 2014

SCOM 2012 OleDB Module 0x80004005 errors after SQL database move


After migrating our SCOM 2012 R2 DB and Data Warehouse DB to a new SQL server we were receiving SCOM alerts that there was a problem with the OleDB module.


The Problem

The initial alert indicated that there was a login problem. This prompted us to check, re-check and triple-check all the SQL logins and permissions between the old and new SQL servers.

Alert description: OleDb Module encountered a failure 0x80004005 during execution and will post it as output data item. Unspecified error: Cannot open database "OperationsManager2012" requested by the login. The login failed.
Workflow name: Microsoft.SystemCenter.SqlBrokerAvailabilityMonitorForPool

After not having much luck we eventually decommissioned the old SQL server. Once the old SQL server was turned off, the alert changed from a login failure to "SQL Server does not exist". This error got us thinking: maybe it's not a permission problem, and some parts of SCOM may still have been pointing at the old SQL server.
Alert description: OleDb Module encountered a failure 0x80004005 during execution and will post it as output data item. Unspecified error: [DBNETLIB][ConnectionOpen (Connect()).]SQL Server does not exist or access denied.
Workflow name: Microsoft.SystemCenter.SqlBrokerAvailabilityMonitorForPool


The Solution 

A search of the registry found a few keys that were undocumented in the SQL migration document I was reading. These keys are:

HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup\DatabaseServerName

HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup\DataWarehouseDBServerName
HKLM\Software\Microsoft\System Center\SetupBackup\Blue\Database\DatabaseServerName

After changing these keys and restarting the SCOM server all was well!
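
If you want to confirm nothing still points at the old server after a move like this, the check can be sketched in Python as a comparison over the three registry locations listed above. This sketch takes the current values as a plain dictionary (for example, exported by hand) rather than reading the registry itself. The function name and the server names in the example are hypothetical.

```python
# The three registry locations listed in this post.
SCOM_SERVER_VALUES = [
    r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup\DatabaseServerName",
    r"HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Setup\DataWarehouseDBServerName",
    r"HKLM\Software\Microsoft\System Center\SetupBackup\Blue\Database\DatabaseServerName",
]


def stale_values(current, old_server):
    """Return the listed registry locations whose data still mentions old_server.

    `current` maps each registry location to the string stored there.
    Locations missing from the mapping are treated as empty.
    """
    needle = old_server.lower()
    return [path for path in SCOM_SERVER_VALUES if needle in current.get(path, "").lower()]


if __name__ == "__main__":
    # Hypothetical server names for illustration only.
    exported = {
        SCOM_SERVER_VALUES[0]: "NEWSQL01",
        SCOM_SERVER_VALUES[1]: "NEWSQL01",
        SCOM_SERVER_VALUES[2]: "OLDSQL01",
    }
    print(stale_values(exported, "oldsql01"))
```

An empty result means none of the three locations still reference the old server name.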