Sunday, 7 July 2013

SCCM - SSRS report for Summary Update List Non-Compliance

Within the organization I currently contract to, SCCM is used to deploy monthly Microsoft patches and to report on compliance through monthly and quarterly 'Update Lists'. To help regional server and desktop teams track the compliance of the assets they manage against these update lists, I have created various SSRS reports. This is the first one I will be blogging about.

My goal was to enable support teams to specify the CollectionID that groups the assets they are responsible for, select one or more update lists, and easily identify which machines have not reached a compliant state. I have tried to include other key pieces of information that are directly related to this kind of analysis, such as the Windows Update Agent version, heartbeat date, scan date and a link to another report showing all available maintenance windows for the machine.

In addition to grouping all key information for reporting compliance on a per-machine basis, I have tried to assist teams with their analysis by automatically highlighting when a heartbeat date or scan date is more than three days old. I also highlight when the Windows Update Agent version is below a minimum known good version for SCCM 2007. These indicators should be investigated as a matter of priority; for example, an aged heartbeat means the local SCCM agent service is either stopped or offline, or the agent is generally unhealthy.
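The highlighting rules described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the expression used in the RDL: the function and parameter names are hypothetical, and the minimum agent version is passed in as a parameter because the actual known good version is not stated here.

```python
from datetime import datetime, timedelta

# Threshold taken from the text: dates more than three days old are highlighted.
STALE_AFTER = timedelta(days=3)


def parse_version(text):
    """Turn a dotted version string such as '7.4.7600.226' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def flag_machine(last_heartbeat, last_scan, wua_version, min_wua_version, now):
    """Return the list of indicators that should be highlighted for one machine.

    min_wua_version is a placeholder supplied by the caller; substitute the
    minimum version that is considered good in your own environment.
    """
    flags = []
    if now - last_heartbeat > STALE_AFTER:
        flags.append("heartbeat")
    if now - last_scan > STALE_AFTER:
        flags.append("scan")
    if parse_version(wua_version) < parse_version(min_wua_version):
        flags.append("wua_version")
    return flags


if __name__ == "__main__":
    now = datetime(2013, 7, 7, 12, 0)
    print(flag_machine(
        last_heartbeat=datetime(2013, 7, 1, 9, 0),   # six days old
        last_scan=datetime(2013, 7, 6, 9, 0),        # one day old
        wua_version="7.4.7600.226",                  # sample value only
        min_wua_version="7.6.7600.256",              # placeholder threshold
        now=now,
    ))
```

Comparing versions as tuples of integers rather than as strings avoids ordering mistakes such as "7.10" sorting before "7.6".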

It is important that teams performing troubleshooting clearly understand the role of the scan date and scan data. I have often seen server teams find a non-compliant server and assume that manually installing an update will immediately make it compliant. Support teams must understand that the scan process is used not only to determine update applicability locally on the machine; every parent SCCM site server in the hierarchy also independently calculates the machine's compliance state from the submitted scan state data. A manually installed update is therefore only reflected once the client has rescanned and submitted its new scan state.

Example Report Results. 

Report Features

  • Simple drop down selection criteria for SCCM deployments and deployment collections.
  • More complete overview of key information for each stage of the deployment of updates.
  • Near real time; changes depend on inter-site replication and the scan state submitted by client agents.
  • Ideal for drill through links to specific custom and default SCCM reports to display more detailed information on devices in a specific state.
All that should be required to get this report working in other environments that already have SSRS available is to upload the RDL and use Report Builder to modify the server details to point to their SCCM SQL database server name and database name. 

RDL Downloads

Another report that displays the same data, but in an Excel-friendly table format.

Please feel free to post a comment if you are unable to get the report working successfully.

Tuesday, 30 April 2013

Tuning SCCM 2007 & 2012 Site-to-Site Replication with Thread Settings.

For the longest time I listened to people referring to SMS\SCCM as 'slow management server', and for a while I agreed. That is, until I discovered the settings that I believe can dramatically improve the time it takes to replicate configuration changes throughout a multiple-site hierarchy. I have looked around on the web and spoken with a Microsoft PFE, and there does not appear to be any publicly available guidance from Microsoft on best practices for tuning site and software distribution thread settings. The approach I outline below was discovered more through trial and observation than anything else, so your mileage may vary if you apply it to your environment.

To further illustrate my point I have added the below graphs and related notes:

Fig 1.  Multiple concurrent active package distributions and metadata replication backlogging indicating possible thread misconfiguration.
          Top -\outboxes\lan  (Site to Site metadata)
          Bottom – Distribution\Incoming (Package Distribution)

Notice in Fig 1 the relationship between the number of concurrent package distributions (bottom) and the minor backlogging in the \outboxes\lan inbox (top). Once the active package distributions completed, the backlog would clear almost immediately. This was because software package distributions were consuming all available threads for a child site, causing normal site-to-site metadata replication to backlog. Ideally, with all thread settings tuned correctly, both package distributions and site-to-site metadata changes will flow concurrently, with neither adversely impacting the other.

Fig 2. The results of tuning thread settings, note the sharp drop in the top graph immediately post change.
          Top -\outboxes\lan  (Site to Site metadata)
          Bottom – Distribution\Incoming (Package Distribution)

Notice in Fig 2 that as soon as the four configurable thread settings were tuned, we saw a noticeable reduction in metadata backlogging, which has improved the efficiency of site-to-site communication and the reliability of package distribution. The graphs above represent a second-tier primary site that has over a dozen child primary and secondary sites supporting 25k clients under normal load conditions. Historically the expectation was that configuration changes could take hours or days to replicate to the lowest site tier (5 levels); after tuning these settings, the average is now 15 minutes or less for a change made at the central site to fully replicate to all sites globally. This has also had a positive impact on the upstream replication of client 'state' and site 'status' messages.

Standard Sender Properties:

  • All Sites: Number of directly connected child sites multiplied by 10 threads.
  • Per Site: 10 threads
Note: The above assumes three directly connected child sites.

Software Distribution Properties:

  • 'Max number of packages' multiplied by 'Max threads per package' = 'Per Site' - 2

For example, a maximum of 4 concurrent package distributions multiplied by 2 threads per package limits software distribution (packages) to a maximum of 8 threads, thus always leaving 2 spare threads for site-to-site replication.

The above settings will result in 2 spare threads per site, which always allows site-to-site configuration\metadata to flow without being blocked by any active package distributions.
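The arithmetic behind these settings can be written out as a small Python sketch. The function and key names are hypothetical and exist only for illustration; the default values are the ones used in the example above, and they are not a recommendation for every environment.

```python
def thread_budget(child_sites, per_site=10, max_packages=4, threads_per_package=2):
    """Work out the sender and package distribution thread figures.

    child_sites is the number of directly connected child sites.
    The defaults mirror the worked example in this post.
    """
    all_sites = child_sites * per_site                    # sender 'All Sites' value
    package_threads = max_packages * threads_per_package  # most threads packages can use
    spare_per_site = per_site - package_threads           # headroom left for metadata
    return {
        "all_sites": all_sites,
        "per_site": per_site,
        "package_threads": package_threads,
        "spare_per_site": spare_per_site,
    }


if __name__ == "__main__":
    # Three directly connected child sites, as assumed in the note above.
    print(thread_budget(child_sites=3))
```

If the spare figure drops below 2, package distributions can again consume the threads that site-to-site metadata replication depends on.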

Key Points:

  • Rate limiting on addresses should be avoided, as its use results in only 1 thread being available for both site-to-site metadata replication and package content transfers. Wherever possible, rely on other networking technologies such as QoS or Riverbeds to manage WAN link utilization.
  • Sender ‘Maximum Concurrent Sendings’ thread settings should be set based on the number of directly connected child sites and reviewed periodically.
  • Sender thread settings per site should exceed by at least 2 threads what is configured for package distribution to allow headroom for site to site configuration metadata replication to always occur.
  • As a result of the above tuning, issues or abnormal inbox traffic trends are much easier to identify.

Disclaimer: The above has been tested in a 60k production SCCM 2007 environment. ConfigMgr 2012 has the same configurable settings, so I assume the same principles can be applied.

Monday, 11 March 2013

PowerShell - Find SCCM agents that are in an 'unknown' state in DCM reporting.

Last week I was able to resolve an ongoing and extremely annoying aspect of SCCM and DCM. When reviewing the evaluation results of a DCM baseline in either the classic or SRS reports, you will sometimes find that the machines in a compliant or non-compliant state do NOT add up to the total number of machines targeted for a rule. The annoying part is that there is no easy way to determine which machines have not returned any state information. We even engaged Microsoft PSS in a long-running support case to attempt to write a custom report that would identify these machines. Using a module I wrote recently, I was able to remotely check all assigned DCM baselines against all our site servers and determine which one was having issues evaluating the rules and not returning a state value. As you can see below, the server has received policy to assign the additional baselines but is having issues completing the rule evaluation.