z/OS Workload Manager (WLM): Balancing Cost & Performance

A sophisticated mechanism is required to orchestrate the allocation of System z resources (E.g. CPU, Memory, I/O) to multiple z/OS workloads, requiring differing business processing priorities. Put very simply, a mechanism is required to translate business processing requirements (I.E. SLA) into an automated and equitable z/OS performance manager. Such a mechanism will safeguard the highest possible throughput, while delivering the best possible system responsiveness. Ideally, such a mechanism will assist in delivering this optimal performance, for the lowest cost; for z/OS, primarily Workload License Charges (WLC) related. Of course, the Workload Management (WLM) z/OS Operating System component delivers this functionality.

A rhetorical question for all z/OS Performance Managers and z/OS MLC Cost Managers would be “how much importance does your organization place on WLM and how proactively do you manage this seemingly pivotal z/OS component”? In essence, this seems like a ridiculous question, yet there is evidence that suggests many organizations, both customer and ISV alike, don’t necessarily consider WLM to be a fundamental or high priority performance management discipline. Let’s consider several reasons why WLM is a fundamental component in balancing cost and performance for each and every z/OS environment:

  • CPU (MSU) Resource Capping: Whatever the capping method (I.E. Absolute, Hard, Soft), WLM is a controlling mechanism, typically in conjunction with PR/SM, determining when capping is initiated, how it is managed and when it is terminated. Therefore from a dispassionate viewpoint, any 3rd party ISV product that performs MSU optimization via soft capping mechanisms should ideally consider the same CPU (E.g. SMF Type 70, 72, 99) instrumentation data as WLM. Some solutions don’t offer this granularity (E.g. AutoSoftCapping, iCap).
  • MLC R4HA Cost Management: WLM is the fundamental mechanism for controlling this #1 System z software TCO component; namely WLM collects 48 consecutive metric CPU MSU resource usage every 5 Minutes, commonly known as the Rolling 4 Hour Average (R4HA). In an ideal world, an optimally managed workload that generates a “valid monthly peak”, will fully utilize this “already paid for” available CPU MSU resource for the remainder of the MLC eligible month (I.E. Start of the 2nd day in a calendar month, to the end of the 1st day in the next calendar month). More recently, Country Multiplex Pricing (CMP) allows an organization to move workloads between System z server (I.E. CPC) structures, without cost consideration for cumulative R4HA peaks. Similarly, Mobile Workload Pricing (MWP) reporting will be simplified with WLM service definitions in z/OS 2.2. Therefore it seems prudent that real-time WLM management, both in terms of real-time reporting and pro-active decision making makes sense.
  • System z Server CPU Management: As System z server CPU chips evolve (E.g. CPU Chip Cache Hierarchy and Relative Nest Intensity), there are complementary changes to the z/OS Operating System management components. For example, HiperDispatch Mode delivers CPU resource usage benefit, considering CPU chip cache resources, intelligently allocating workload to as few logical processors as possible. It therefore follows that prioritization of workloads via WLM policy definitions becomes increasingly important. In this instance one might consider that CPU MF (SMF Type 113) and WLM Topology (SMF Type 99) are complementary reporting techniques for System z server design and management.

Since its announcement in September 1994 (I.E. MVS/ESA Version 5), WLM has evolved to become a fully-rounded and highly capable z/OS System Resources Manager (SRM), simply translating business prioritization policies into dynamic function, optimizing System z CPU, Memory and I/O resources. More recently, WLM continues to simplify the management of CPU chip cache hierarchy resources, while reporting abilities gain in strength, with topology reporting and the promise of simplified MWP reporting. Moreover, WLM resource management becomes more granular and seemingly the realm of possibility exists to “micro manage” System z performance, as and if required. Conversely, WLM provides the opportunity to simplify System z performance management, with intelligent workload differentiation (I.E. Subsystem Enclave, Batch, JES, USS, et al).

Quite simply, IBM are providing the instrumentation and tools for the 21st Century System z Performance and Software Cost Subject Matter Expert (SME) to deliver optimal performance for minimal cost. However, it is incumbent for each and every System z user to optimize software TCO, proactively implementing new processes and leveraging from System z functions accordingly.

Returning to that earlier rhetorical question about the importance of WLM; seemingly its importance is without doubt, primarily because of its instrumentation and management abilities of increasingly cache rich System z CPU chips and its fundamental role in controlling CPU MSU resource, vis-à-vis the R4HA.

Although IBM will provide the System z user with function to optimize system performance and cost, for obvious commercial reasons IBM will not reduce the base cost of System z MLC software. However, recent MLC pricing announcements, namely Country Multiplex Pricing (CMP), Mobile Workload Pricing (MWP) and Collocated Application Pricing (zCAP) provide tangible options to reduce System z MLC TCO. Therefore the System z user might need to consider how they can access real-time WLM performance metrics, intelligently combining this instrumentation data with function to intelligently optimize CPU MSU resource, managing the R4HA accordingly.

Workload X-Ray (WLXR) from zIT Consulting simplifies WLM performance reporting, enabling users to drill down into the root cause of performance variances in a very fast and easy way. WLXR assists in root cause problem determination by zooming in, starting from a high level overview, going right down to detailed Service Class performance information, such as the Performance Index (PI), showing potential bottleneck situations during peak time. Any system overhead considerations are limited, as WLXR delivers meaningful real time information on a “need to know” basis.

A fundamental design objective for WLXR is data reduction, only delivering the important information required for timely and professional workload management. Straight to the point information instead of data overload, sometimes from a plethora of data sources (E.g. SMF, System Monitors, et al). WLXR incorporates the following easy-to-use functions:

  • Simplified Data Collection & Storage: Minimal system overhead TCP/IP based agents periodically (E.g. 5, 15, 60 Minutes) collect CPU (Type 70) and WLM (Type 72) data. Performance data is stored centrally in near real-time, building a historical repository with intelligent analytics for meaningful information presentation.
  • Intelligent GUI Based Information Presentation: Meaningful decision based reports and graphs detailing CPU (E.g. MSU, R4HA, Weight) and WLM (E.g. Service Class, Performance Index, Response Time, Transaction Workflow) resource usage. A drill-down design provides a granularity of data presentation, for Management Summary to 3rd Level Technical Diagnostics use.
  • Corporate Identity Branding: A modular template design, allowing for easy corporate identity branding, with flexibility to easily add additional reports, as and if required.

Without doubt, WLM is a significant z/OS System Resources Management function, simplifying the translation of business workload requirements (I.E. Service Level Agreement) into timely and proactive allocation of major System z hardware resources (I.E. CPU, Memory, I/O). This management of System z resources has been forever thus for 20+ years, while WLM has always offered “software cost control” functionality, working with the various and evolving CPU capping techniques. What might not be so obvious, is that there is a WLM orientated price versus performance correlation, which has become more evident in the last 5 years or so. Whether Absolute Capping, HiperDispatch, Mobile Workload Pricing, Country Multiplex Pricing or evolving Soft Capping techniques, the need for System z users to integrate z/OS MLC pricing considerations alongside WLM performance based management is evident.

Historically there was not a clear and identified need for a z/OS Performance/Capacity Manager to consider MLC costs in their System z server designs. However, there is a clear and present danger that this historic modus operandi continues and there will only be one financial winner, namely IBM, with unnecessarily high MLC charges. Each and every System z user, whether large or small, can safeguard the longevity of their IBM Mainframe platform by recognizing and deploying proactive and current System z MLC cost management processes.

All too often it seems that capping can be envisaged as punitive, degrading system performance to reduce System z MLC costs. Such a notion needs to be consigned to history, with a focussed perspective on MSU optimization, where the valuable and granular MSU resource is allocated to the workload that requires such CPU resource, with near real-time performance profiling. If we perceive MSU optimization to be R4HA based and that IBM are increasing WLM function to assist this objective, CPU capping can be a benefit that does not adversely impact performance. As previously stated, once a valid R4HA peak has occurred, that high MSU watermark is available for the remainder of the MLC billing period. Similarly at a more granular level, once a workload has peaked and its MSU usage declines, the available MSU can be redirected to other workloads. With the introduction of Country Multiplex Pricing, System z users no longer need to concern themselves about creating a higher R4HA peak, when moving workloads between System z servers.

Quite simply, from the two most important perspectives, performance and cost optimization, WLM provides the majority of functionality to assist System z users get the best performance for the lowest cost. Analytics based products like Workload X-Ray (WLXR) assist this endeavour, analysing WLM data in near real-time from a performance and MLC cost perspective. It therefore follows that if this important information is also available for sophisticated MSU optimization solutions, which consider WLM performance (E.g. zDynaCap, zPrice Manager), then proactive performance and cost management follows. It’s hard to envisage how a fully-rounded MSU optimization decision can be implemented in near real-time, from an MSU optimization solution that does not consider WLM performance metrics…