IBM z14: Pervasive Encryption & Container Pricing

On 17 July 2017 IBM announced the z14 server as “the next generation of the world’s most powerful transaction system, capable of running more than 12 billion encrypted transactions per day.  The new system also introduces a breakthrough encryption engine that, for the first time, makes it possible to pervasively encrypt data associated with any application, cloud service or database all the time”.

At first glance, a cursory review of the z14 announcement might suggest just another server upgrade release, but dismissing it as such could be a costly mistake for the reader.  There are always subtle nuances in any technology announcement, and finding them and applying them to your own business can sometimes be a challenge.  In this particular instance, perhaps one might consider “Persuasive Encryption & Contained Pricing”…

When IBM releases a new generation of z Systems server, many of us look to the “feeds and speeds” data and ponder how that might influence our performance and capacity profiles.  IBM state that average z14 speed increases by ~10% compared with the z13, for 6-way servers and larger.  As per usual, there are software Technology Transition Offering (TTO) discounts ranging from 6% to 21% for z14 only sites.  However, in these times where workload profiles are rapidly changing and evolving, it’s sometimes easy to overlook that IBM have to consider the holistic position of the IBM Z world.  Quite simply, IBM have many divisions, Hardware, Software, Services, et al.  Therefore there has to be interaction between the hardware and software divisions and in this instance, IBM have delivered a z14 server that is security focussed, with their Pervasive Encryption functionality.

Pervasive Encryption provides a simple and transparent approach for z Systems security, enabling the highest levels of data encryption for all data usage scenarios, for example:

  • Processing: When retrieved from files and processed by applications
  • In Flight: When being transmitted over internal and external networks
  • At Rest: When stored in database structures or files
  • In Store: When stored in magnetic storage media

Pervasive Encryption simplifies and reduces the costs associated with protecting data, whether by policy (I.E. Subset) or En Masse (I.E. All Of The Data, All Of The Time), helping achieve compliance mandates.  When considering the EU GDPR (European Union General Data Protection Regulation) compliance mandate, companies must notify relevant parties within 72 hours of first having become aware of a personal data breach.  Additionally, organizations can be fined up to 4% of annual global turnover or €20 Million (whichever is greater) for any GDPR breach, unless they can demonstrate that data was encrypted and keys were protected.
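
For illustration only and not as legal guidance, the fine ceiling described above is simply the greater of two values; the function name and turnover figure below are hypothetical:

  def gdpr_maximum_fine(annual_global_turnover_eur: float) -> float:
      """Illustrative only: GDPR upper fine limit is 4% of annual global
      turnover or EUR 20 million, whichever is greater."""
      return max(0.04 * annual_global_turnover_eur, 20_000_000.0)

  # Hypothetical example: EUR 2 billion annual global turnover
  print(gdpr_maximum_fine(2_000_000_000.0))  # 80000000.0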

To facilitate this new approach for encryption, the IBM z14 infrastructure incorporates several new capabilities integrated throughout the technology stack, including Hardware, Operating System and Middleware.  Integrated CPU chip cryptographic acceleration is enhanced, delivering ~600% increased performance when compared with its z13 predecessor and ~20 times the performance of competitive server platforms.  File and data set encryption is integrated within the Operating System (I.E. z/OS), safeguarding transparent and optimized encryption, without impacting application functionality or performance.  Middleware software subsystems including DB2 and IMS leverage these Pervasive Encryption techniques, safeguarding that High Availability databases can be transitioned to full encryption without stopping the database, application or subsystem.

Arguably IBM had to deliver this type of security functionality for its top tier z Systems customers, as inevitably they would be impacted by compliance mandates such as GDPR.  Conversely, the opportunity to address the majority of external hacking scenarios with one common approach is an attractive proposition.  However, as always, the devil is in the detail, and given an impending deadline date of May 2018 for GDPR compliance, I wonder how many z Systems customers could implement the requisite z14 hardware and related Operating System (I.E. z/OS) and Subsystem (I.E. CICS, DB2, IMS, MQ, et al) upgrades before this date?  From a bigger picture viewpoint, Pervasive Encryption does offer the requisite functionality to apply a generic end-to-end process for securing all data, especially Mission Critical data…

Previously we have considered the complexity of IBM z Systems pricing mechanisms and in theory, the z14 announcement tried to simplify some of these challenges by building upon and formalizing Container Pricing.  Container Pricing is intended to greatly simplify software pricing for qualified collocated workloads, whether collocated with other existing workloads on the same LPAR, deployed in a separate LPAR or across multiple LPARs.  Container pricing allows the specified workload to be separately priced based on a variety of metrics.  New approved z/OS workloads can be deployed collocated with other sub-capacity products (I.E. CICS, DB2, IMS, MQ, z/OS) without impacting cost profiles of existing workloads.

As per most new IBM z Systems pricing mechanisms of late, there is a commercial collaboration and exchange required between IBM and their customer.  Once a Container Pricing solution is agreed between IBM and their customer, for an agreed price, an IBM Sales order is initiated, triggering the creation of an Approved Solution ID.  The IBM provided solution ID is a 64-character string representing an approved workload with an entitled MSU capacity, representing a Full Capacity Pricing Container used for billing purposes.

Previously we considered the importance of WLM for managing z/OS workloads and its interaction with soft-capping, and this is reinforced with this latest IBM Container Pricing mechanism.  The z/OS Workload Manager (WLM) enables Container Pricing using a resource classified as the Tenant Resource Group (TRG), defining the workload in terms of address spaces and independent enclaves.  The TRG, combined with a unique Approved Solution ID, represents the IBM approved solution.  As per standard SCRT processing, workload instrumentation data is collected, safeguarding that this workload profile does not directly impact the traditional peak LPAR Rolling Four-Hour Average (R4HA).  The TRG also allows the workload to be metered and optionally capped, independent of other workloads that are running collocated in the LPAR.

MSU utilization of the defined workload is recorded by WLM and RMF, subsequently processed by SCRT to subtract the solution MSU capacity from the LPAR R4HA.  The solution can then be priced independently, based on MSU resource consumed by the workload, or based upon other non-MSU values, specifically a Business Value Metric (E.g. Number of Payments).  Therefore Container Pricing is much simpler and much more flexible than the previous IBM collocated workload mechanisms, namely IWP and zCAP.
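
As a minimal sketch of the adjustment just described (illustrative only, not the actual SCRT algorithm; the function name and MSU figures are hypothetical), the approved container’s MSU contribution is removed from the LPAR R4HA before traditional sub-capacity billing, with the container priced separately:

  def billable_lpar_r4ha(lpar_r4ha_msu: float, container_msu: float) -> float:
      """Illustrative only: subtract the approved Container Pricing solution's
      MSU contribution from the LPAR R4HA used for traditional MLC billing."""
      return max(lpar_r4ha_msu - container_msu, 0.0)

  # Hypothetical example: a 1000 MSU LPAR peak, 150 MSU of which is the approved container
  print(billable_lpar_r4ha(1000.0, 150.0))  # 850.0 MSU remains for standard sub-capacity pricing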

Container Pricing eliminates the requirement to commission specific new environments to optimize MLC pricing.  By deploying a standard IBM process framework, new workloads can be commissioned without impacting the R4HA of collocated workloads, being deployed as per business requirements, whether on the same LPAR, a separate LPAR, or dispersed across multiple LPAR structures.  Quite simply, the standard IBM process framework is the Approved Solution ID, associating the client based z/OS system environment with the IBM sales contract.

In this first iteration release associated with the z14 announcement, Container Pricing can be deployed in the following three solution based scenarios:

  • Application Development and Test Solution: Add up to 3 times more capacity to existing Development and Test environments without any additional monthly licensing costs; or create new LPAR environments with competitive pricing.
  • New Application Solution: Add new z/OS microservices or applications, priced individually without impacting the cost of other workloads on the same system.
  • Payments Pricing Solution: A single agreed value-based price for software plus hardware, or software only, via a number-of-payments-processed metric, based on IBM Financial Transaction Manager (FTM) software.

IBM state that the z14 supports a maximum of 2 million Docker containers in an associated maximum 32 TB memory configuration.  In conjunction with other I/O enhancements, IBM state a z14 performance increase of ~300% when compared with its z13 predecessor.  Historically the IBM Z platform was never envisaged as being the ideal container platform.  However, its ability to seamlessly support z/OS and Linux, while the majority of mission critical Systems Of Record (SOR) data resides on IBM Z platforms, might just be a compelling case for microservices to be processed on the IBM Z platform, minimizing any data transfer latency.

Container Pricing for z/OS is somewhat analogous to the IBM Cloud Managed Services on z Systems pricing model (I.E. CPU consumption based).  Therefore, if monthly R4HA peak processing is driven by an OLTP application, or any other workload for that matter, any additional unused capacity in that specific SCRT reporting month can be allocated at no cost to other workloads.  As a result, z/OS customers will be able to take advantage of this approach, processing collocated microservices or applications for a zero or nominal cost.

Country Multiplex Pricing (CMP) Observation: The z14 is the first new generation of IBM Z hardware since the introduction of the CMP pricing mechanism.  When a client first implements a Multiplex, IBM Z server eligibility cannot be older than two generations (I.E. N-2) prior to the most recently available server (I.E. N).  Therefore the General Availability (GA) of z14 classifies the z114 and z196 servers as previously eligible CMP machines.  IBM will provide a 3 Month grace period for CMP transition activities for these N-3 servers, namely z114 and z196.  Quite simply, the first client CMP invoice must be submitted within 90 days of the z14 GA date, namely 13 September 2017, no later than 1 January 2018.

In conclusion, Pervasive Encryption is an omnipresent z14 function integrated into every data lifecycle stage, which could easily be classified as Persuasive Encryption, simplifying the sometimes arduous process of classifying and managing mission-critical data.  As cybersecurity becomes a clear and present danger, associated with impending and increasingly punitive compliance mandates such as GDPR, the realm of possibility exists to resolve this high profile corporate challenge once and for all.

Likewise, Container Pricing provides a much needed simple-to-use framework to drive MSU cost optimization for new workloads and could easily be classified as Contained Pricing.  The committed IBM Mainframe customer will upgrade their z13 server environment to z14, as part of their periodic technology refresh approach.  Arguably, those Mainframe customers who have been somewhat hesitant in upgrading from older technology Mainframe servers, might just have a compelling reason to upgrade their environments to z14, addressing cybersecurity challenges and evolving processes to contain z/OS MLC costs.

Optimize Your System z ROI with z Operational Insights (zOI)

Hopefully all System z users are aware of the Monthly Licence Charge (MLC) pricing mechanisms, where a recurring charge applies each month.  This charge includes product usage rights and IBM product support.  If only it was that simple!  We then encounter the “Alphabet Soup” of acronyms, related to the various and arguably too numerous MLC pricing mechanism options.  Some might say that 13 is an unlucky number and in this case, a System z pricing specialist would need to know and understand each of the 13 pricing mechanisms in depth, safeguarding the lowest software pricing for their organization!  Perhaps we could apply the unlucky word to such a resource.  In alphabetical order, the 13 MLC pricing options are AWLC, AEWLC, CMLC, EWLC, MWLC, MzNALC, PSLC, SALC, S/390 Usage Pricing, ULC, WLC, zELC and zNALC!  These mechanisms are commercial considerations, but what about the technical perspective?

Of course, System z Mainframe CPU resource usage is measured in MSU metrics, where the usage of Sub-Capacity allows System z Mainframe users to submit SCRT reports, incorporating Monthly License Charges (MLC) and IPLA software maintenance, namely Subscription and Support (S&S).  We then must consider the Rolling 4-Hour Average (R4HA) and how best to optimize MSU accordingly.  At this juncture, we then need to consider how we measure the R4HA itself, in terms of performance tuning, so we can minimize R4HA MSU usage and optimize cost, without impacting Production or indeed overall system performance.

Finally, we then have to consider that WLC has a ~17-year longevity, having been announced in October 2000, and in that time IBM have also introduced hardware features to assist in MSU optimization.  These hardware features include zIIP, zAAP and IFL, while there are other influencing factors, such as HiperDispatch, WLM and Relative Nest Intensity (RNI), to name but a few!  The Alphabet Soup continues…

In summary, since the introduction of WLC in Q4 2000, the challenge for the System z user is significant.  They must collect the requisite instrumentation data, perform predictive modelling and fully comprehend the impact of the current 13 MLC pricing mechanisms and their interaction with the ever-evolving System z CPU chip!  In the absence of a simple-to-use reporting capability from IBM, there are a plethora of 3rd party ISV solutions, which generally are overly complex and require numerous products, more often than not from several ISVs.  These software solutions process the instrumentation data, generating the requisite metrics that allow an informed decision making process.

Bottom Line: This is way too complex; are there any Green Shoots of an alternative option?  Are there any easy-to-use data analytics based options for reducing MSU usage and optimizing CPU resources, which can then be incorporated into any WLC/MLC pricing considerations?

In February 2016 IBM launched their z Operational Insights (zOI) offering, as a new open beta cloud-based service that analyses your System z monitoring data.  The zOI objective is to simplify the identification of System z inefficiencies, while identifying savings options with associated implementation recommendations. At this juncture, zOI still has a free edition available, but as of September 2016, it also has a full paid version with additional functionality.

Currently zOI is limited to the CICS subsystem, incorporating the following functions:

  • CICS Abend Analysis Report: Highlights the top 10 abend types and the top 10 most frequently abending transactions for your CICS workload. The resulting output classifies which CICS transactions might abend and, as a consequence, waste processor time.  Of course, the System z Mainframe user will have to fix the underlying reason for the CICS abend!
  • CICS Java Offload Report: Highlights any transaction processing workload eligible for IBM z Systems Integrated Information Processor (zIIP) offload. The resulting output delivers three categories for consideration: #1, the % of existing workload that is eligible for offload but ran on a General Purpose CP; #2, the % of workload being offloaded to zIIP; #3, the % of workload that cannot be transferred to a zIIP.
  • CICS Threadsafe Report: Highlights threadsafe eligible CICS transactions, calculating the switch count from the CICS Quasi Reentrant Task Control Block (QR TCB) per transaction and associated CPU cost. The resulting output identifies potential CPU savings by making programs threadsafe, with the associated CICS subsystem changes.
  • CICS Region CPU Constraint: Highlights CPU constrained regions. CPU constrained CICS regions have reduced performance, lower throughput and slower transaction response, impacting business performance (I.E. SLA, KPI).  From a high-level viewpoint, the resulting output classifies CICS Region performance to identify whether they’re LPAR or QR constrained, while suggesting possible remedial actions.

Clearly the potential of zOI is encouraging, being an easy-to-use solution that analyses instrumentation data, classifies the best options from a quick win basis, while providing recommendations for implementation.  Having been a recent user of this new technology myself, I would encourage each and every System z Mainframe user to try this no risk IBM z Operational Insights (zOI) software offering.

The evolution for all System z performance analysis software solutions is to build on the comprehensive analysis solutions that have evolved in the last ~20+ years, while incorporating intelligent analytics, to classify data in terms of “Biggest Impact”, identifying “Potential Savings”, evolving MIPS measurement, to BIPS (Biggest Impact Potential Savings) improvements!

IBM have also introduced a framework of IT Operations Analytics Solutions for z Systems.  This suite of interconnected products includes zOI, IBM Operations Analytics for z Systems, IBM Common Data Provider for z/OS and IBM Advanced Workload Analysis Reporter (IBM zAware).  Of course, if we lived in a perfect world, without a ~20 year MLC and WLC longevity, this might be the foundation for all of our System z CPU resource usage analysis.  Clearly this is not the case for the majority of System z Mainframe customers, but zOI does offer something different, with zero impact, both from a system impact and existing software interoperability viewpoint.

Bottom Line: Optimize Your System z ROI via zOI, Evolving From MIPS Measurement to BIPS Improvements!

z/OS Workload Manager (WLM): Balancing Cost & Performance

A sophisticated mechanism is required to orchestrate the allocation of System z resources (E.g. CPU, Memory, I/O) to multiple z/OS workloads, each with differing business processing priorities. Put very simply, a mechanism is required to translate business processing requirements (I.E. SLA) into an automated and equitable z/OS performance manager. Such a mechanism will safeguard the highest possible throughput, while delivering the best possible system responsiveness. Ideally, such a mechanism will assist in delivering this optimal performance, for the lowest cost; for z/OS, primarily Workload License Charges (WLC) related. Of course, the Workload Manager (WLM) z/OS Operating System component delivers this functionality.

A rhetorical question for all z/OS Performance Managers and z/OS MLC Cost Managers would be “how much importance does your organization place on WLM and how proactively do you manage this seemingly pivotal z/OS component”? In essence, this seems like a ridiculous question, yet there is evidence that suggests many organizations, both customer and ISV alike, don’t necessarily consider WLM to be a fundamental or high priority performance management discipline. Let’s consider several reasons why WLM is a fundamental component in balancing cost and performance for each and every z/OS environment:

  • CPU (MSU) Resource Capping: Whatever the capping method (I.E. Absolute, Hard, Soft), WLM is a controlling mechanism, typically in conjunction with PR/SM, determining when capping is initiated, how it is managed and when it is terminated. Therefore from a dispassionate viewpoint, any 3rd party ISV product that performs MSU optimization via soft capping mechanisms should ideally consider the same CPU (E.g. SMF Type 70, 72, 99) instrumentation data as WLM. Some solutions don’t offer this granularity (E.g. AutoSoftCapping, iCap).
  • MLC R4HA Cost Management: WLM is the fundamental mechanism for controlling this #1 System z software TCO component; namely, WLM collects 48 consecutive CPU MSU resource usage metrics, one every 5 Minutes, commonly known as the Rolling 4 Hour Average (R4HA). In an ideal world, an optimally managed workload that generates a “valid monthly peak” will fully utilize this “already paid for” available CPU MSU resource for the remainder of the MLC eligible month (I.E. Start of the 2nd day in a calendar month, to the end of the 1st day in the next calendar month). More recently, Country Multiplex Pricing (CMP) allows an organization to move workloads between System z server (I.E. CPC) structures, without cost consideration for cumulative R4HA peaks. Similarly, Mobile Workload Pricing (MWP) reporting will be simplified with WLM service definitions in z/OS 2.2. Therefore real-time WLM management, both in terms of real-time reporting and pro-active decision making, seems prudent.
  • System z Server CPU Management: As System z server CPU chips evolve (E.g. CPU Chip Cache Hierarchy and Relative Nest Intensity), there are complementary changes to the z/OS Operating System management components. For example, HiperDispatch Mode delivers CPU resource usage benefit, considering CPU chip cache resources, intelligently allocating workload to as few logical processors as possible. It therefore follows that prioritization of workloads via WLM policy definitions becomes increasingly important. In this instance one might consider that CPU MF (SMF Type 113) and WLM Topology (SMF Type 99) are complementary reporting techniques for System z server design and management.

Since its announcement in September 1994 (I.E. MVS/ESA Version 5), WLM has evolved to become a fully-rounded and highly capable z/OS System Resources Manager (SRM), simply translating business prioritization policies into dynamic function, optimizing System z CPU, Memory and I/O resources. More recently, WLM continues to simplify the management of CPU chip cache hierarchy resources, while reporting abilities gain in strength, with topology reporting and the promise of simplified MWP reporting. Moreover, WLM resource management becomes more granular and seemingly the realm of possibility exists to “micro manage” System z performance, as and if required. Conversely, WLM provides the opportunity to simplify System z performance management, with intelligent workload differentiation (I.E. Subsystem Enclave, Batch, JES, USS, et al).

Quite simply, IBM are providing the instrumentation and tools for the 21st Century System z Performance and Software Cost Subject Matter Expert (SME) to deliver optimal performance for minimal cost. However, it is incumbent upon each and every System z user to optimize software TCO, proactively implementing new processes and leveraging System z functions accordingly.

Returning to that earlier rhetorical question about the importance of WLM; seemingly its importance is without doubt, primarily because of its instrumentation and management abilities of increasingly cache rich System z CPU chips and its fundamental role in controlling CPU MSU resource, vis-à-vis the R4HA.

Although IBM will provide the System z user with function to optimize system performance and cost, for obvious commercial reasons IBM will not reduce the base cost of System z MLC software. However, recent MLC pricing announcements, namely Country Multiplex Pricing (CMP), Mobile Workload Pricing (MWP) and Collocated Application Pricing (zCAP) provide tangible options to reduce System z MLC TCO. Therefore the System z user might need to consider how they can access real-time WLM performance metrics, intelligently combining this instrumentation data with function to intelligently optimize CPU MSU resource, managing the R4HA accordingly.

Workload X-Ray (WLXR) from zIT Consulting simplifies WLM performance reporting, enabling users to drill down into the root cause of performance variances in a very fast and easy way. WLXR assists in root cause problem determination by zooming in, starting from a high level overview, going right down to detailed Service Class performance information, such as the Performance Index (PI), showing potential bottleneck situations during peak time. Any system overhead considerations are limited, as WLXR delivers meaningful real time information on a “need to know” basis.

A fundamental design objective for WLXR is data reduction, only delivering the important information required for timely and professional workload management. Straight to the point information instead of data overload, sometimes from a plethora of data sources (E.g. SMF, System Monitors, et al). WLXR incorporates the following easy-to-use functions:

  • Simplified Data Collection & Storage: Minimal system overhead TCP/IP based agents periodically (E.g. 5, 15, 60 Minutes) collect CPU (Type 70) and WLM (Type 72) data. Performance data is stored centrally in near real-time, building a historical repository with intelligent analytics for meaningful information presentation.
  • Intelligent GUI Based Information Presentation: Meaningful decision based reports and graphs detailing CPU (E.g. MSU, R4HA, Weight) and WLM (E.g. Service Class, Performance Index, Response Time, Transaction Workflow) resource usage. A drill-down design provides a granularity of data presentation, for Management Summary to 3rd Level Technical Diagnostics use.
  • Corporate Identity Branding: A modular template design, allowing for easy corporate identity branding, with flexibility to easily add additional reports, as and if required.

Without doubt, WLM is a significant z/OS System Resources Management function, simplifying the translation of business workload requirements (I.E. Service Level Agreement) into timely and proactive allocation of major System z hardware resources (I.E. CPU, Memory, I/O). This management of System z resources has been forever thus for 20+ years, while WLM has always offered “software cost control” functionality, working with the various and evolving CPU capping techniques. What might not be so obvious, is that there is a WLM orientated price versus performance correlation, which has become more evident in the last 5 years or so. Whether Absolute Capping, HiperDispatch, Mobile Workload Pricing, Country Multiplex Pricing or evolving Soft Capping techniques, the need for System z users to integrate z/OS MLC pricing considerations alongside WLM performance based management is evident.

Historically there was not a clear and identified need for a z/OS Performance/Capacity Manager to consider MLC costs in their System z server designs. However, there is a clear and present danger that this historic modus operandi continues and there will only be one financial winner, namely IBM, with unnecessarily high MLC charges. Each and every System z user, whether large or small, can safeguard the longevity of their IBM Mainframe platform by recognizing and deploying proactive and current System z MLC cost management processes.

All too often it seems that capping can be envisaged as punitive, degrading system performance to reduce System z MLC costs. Such a notion needs to be consigned to history, with a focussed perspective on MSU optimization, where the valuable and granular MSU resource is allocated to the workload that requires such CPU resource, with near real-time performance profiling. If we perceive MSU optimization to be R4HA based and that IBM are increasing WLM function to assist this objective, CPU capping can be a benefit that does not adversely impact performance. As previously stated, once a valid R4HA peak has occurred, that high MSU watermark is available for the remainder of the MLC billing period. Similarly at a more granular level, once a workload has peaked and its MSU usage declines, the available MSU can be redirected to other workloads. With the introduction of Country Multiplex Pricing, System z users no longer need to concern themselves about creating a higher R4HA peak, when moving workloads between System z servers.

Quite simply, from the two most important perspectives, performance and cost optimization, WLM provides the majority of functionality to assist System z users get the best performance for the lowest cost. Analytics based products like Workload X-Ray (WLXR) assist this endeavour, analysing WLM data in near real-time from a performance and MLC cost perspective. It therefore follows that if this important information is also available for sophisticated MSU optimization solutions, which consider WLM performance (E.g. zDynaCap, zPrice Manager), then proactive performance and cost management follows. It’s hard to envisage how a fully-rounded MSU optimization decision can be implemented in near real-time, from an MSU optimization solution that does not consider WLM performance metrics…

System z: Optimizing DASD I/O Subsystem Performance

Historically there was a very simple synergy between the IBM S/370 Mainframe and its supporting disk I/O (DASD) subsystem, allowing for Mainframe host to physical and logical disk device (I.E. 3390) connectivity. The analysis and tuning of this I/O subsystem has always been and continues to be supported by the SMF Type 7n records via IBM RMF and the BMC CMF alternative. However, over the years, major advances in DASD subsystems and the System z Mainframe server have delivered many layers of technology resources (E.g. Cache, Memory, FICON Channels, RAID Storage, Proprietary Microcode, et al) and this has introduced complexities into highlighting DASD I/O subsystem performance problems.

The focus on technology based metrics (E.g. I/O Rate Response Time, I/O MB/S Bandwidth, et al) has also been complemented with more meaningful business focussed Service Level Agreements (SLA). Therefore today’s System z I/O Performance Analyst must gather and act upon proactive meaningful information from the ever-increasing amounts of performance data available. Put another way, too much data can deliver not enough information! As previously stated, it was forever thus; RMF and CMF have always collected the requisite performance data and arguably no other data source is required (E.g. OMEGAMON/TMON/SYSVIEW Performance Monitor, SAS/MXG/MICS/WPS Performance Database). RMF/CMF is the ideal data source for thorough and timely System z I/O performance management, where intelligent analytics and expert knowledge are required to present this “Golden Record”.

However, today’s System z Support Teams need simple and timely presentation of the data, highlighting potential challenges, graphically presented for their Management, allowing for simple tracking of SLA agreements and technology changes (I.E. Software/Hardware Upgrades).

Additionally, Workload Manager (WLM) can control non-paging queued DASD I/O requests, based upon device busy conditional processing. Therefore the z/OS system can manage I/O priorities in a Sysplex, based on WLM service class goals. WLM dynamically adjusts the I/O priority based on service class goal performance and whether a DASD device can influence the overall performance objectives. For obvious reasons, this WLM function does not micro-manage I/O priorities, only changing a service class period’s I/O priority infrequently. WLM is deployed by many System z users to assist in the automated management of system resources (E.g. CPU, Memory, I/O, et al), based upon Service Level goals.

From a DASD subsystem technology viewpoint, there is no longer an obvious one-to-one direct connection between the Mainframe host and DASD device. An increasing number of technological advances, both microcode and hardware (E.g. Memory, Fibre Channel, Function Assist Processing, et al) have diminished the requirement for data access directly from the physical device. Put another way, in today’s world of System z servers with multiple cache level CPU chips (I.E. Relative Nest Intensity), massive and multiple processor memory resources (I.E. z13 @ 10 TB Memory), high bandwidth Fibre Channel (I.E. FICON, zHPF) subsystems and a hierarchy of DASD memory (I.E. SSD/Flash, Cache), it’s not uncommon to consider an I/O that requires physical device access as a problem! Finally and most importantly, from a DASD subsystem viewpoint, each of the recognized System z DASD providers, EMC (Symmetrix VMAX), HDS (VSP G1000) and IBM (DS8870), has a highly proprietary DASD subsystem that provides z/OS plug compatibility, but delivers overall I/O performance using its own unique architecture and internal algorithms.

Of course, an over configured hardware environment will deliver a poor TCO, while an under configured environment will manifest in SLA issues and bad user experiences, where the middle-ground always delivers the optimal environment. Resource optimization always demands proactive day-to-day management, from an internal and indeed external communication viewpoint. With the highly proprietary design features of the IHV DASD subsystems, whether EMC, HDS or IBM, having the right information and identifying the precise problem, simplifies the communication process with the IHV. Such communication might highlight a resource under provision (E.g. Memory Capacity), a subsystem setting tweak requirement, either host or subsystem based, or indeed a hardware failure. In today’s world, these issues need to be fixed in minutes or hours, not days or weeks.

Therefore, where does today’s System z I/O Performance Analyst start to collect the required information to safeguard that their DASD subsystem is optimized, both from a capacity and performance viewpoint?

A simplistic viewpoint of an I/O health-check should consider the following:

  • Service Level Agreements (SLA): Are overall objectives being delivered or missed?
  • User Experience: Are users (customers) complaining of poor service or response times?
  • I/O Metric Performance: Are there obvious signs of abnormal performance statistics?

Several decades ago, an overall I/O health check might have been a periodic (E.g. Weekly or longer) activity, whereas today it’s undoubtedly a Business As Usual (BAU) and 24*7 activity. Therefore a fully automated solution is required, built upon the tried and tested System z performance fundamentals, namely RMF or CMF. The ideal solution will perform analytics based data reduction, presenting the right information, at the right time, allowing for intelligent business based communication, both internally, to customers and end users from an SLA viewpoint, and externally, with IHV DASD suppliers, safeguarding optimal performance and TCO.

EADM (Easy Analyze DASD Mainframe) is a solution from Technical Storage that performs automated performance analysis of the z/OS I/O subsystem, delivering predictive analytics for better storage capacity planning and performance measurement. The Technical Storage EADM architects have in excess of 40 years IBM Mainframe experience, specializing in the I/O subsystem, and so it’s no surprise that EADM delivers expert and timely knowledge via an easy-to-use solution.

EADM is an easy-to-install and easy-to-use plug-and-play solution that has no proprietary considerations, requiring no additional System z resources (E.g. CPU, Memory, DASD, et al).  Installed on Microsoft server platforms, EADM is easily virtualized via VMware, Hyper-V, et al, requiring no target database for performance data storage.  EADM performs a daily health check of the entire System z disk subsystem, working around the clock and delivering customized and automatic user friendly GUI type reports.  For today’s System z technician, the open and IP architecture base of EADM allows for secure remote access via Mobile, Tablet or Laptop devices, as and when required.

Operations and performance teams are alerted as soon as performance variances occur, typically in minutes, assisting in the identification of underlying root problems causing changes in system behaviour.  Incorporating intelligent and meaningful I/O performance indicators, with drill-down and zoom-in ability, storage technicians can determine if the problem is temporary, permanent, local or global.  By simplifying the data reduction process (E.g. RMF/CMF data from numerous LPAR/Sysplex environments), EADM safeguards that the internal technical team can efficiently manage their increasingly complex and large DASD environment, for intelligent and timely communications with internal business teams and external suppliers alike.

EADM simplifies the System z I/O subsystem capacity and performance management process, delivering expert reports and timely historical analysis, for example:

  • Automatic daily (24 Hour) analysis of Sysplex wide workload (On-Line TP & Batch) I/O response times
  • Systematic intelligent alerts of early performance variances with exact occurrence time indicators
  • Identification of I/O performance hot-spots with DASD volume and data set level granularity
  • Performance trending at DFSMS Storage Group, Subsystem LCU and DASD volume level
  • DR (E.g. PPRC) simulations to prevent data loss and forecast Data Centre failover scenarios
  • I/O subsystem WLM indicators to determine exactly what impacts performance objectives
  • Full FICON channels and zHPF analysis, incorporating typical I/O throughput indicators
  • HyperPAV and associated LCU indicators to easily balance volumes, optimizing PAV alias allocation
  • Performance monitoring and balancing via intelligent LCU, SSID and I/O analytics
  • DASD capacity usage via DCOLLECT data, comparing assigned vs. allocated vs. actual disk utilization
  • EADM supports both entry-level (several LPAR) and complex (multiple CPC/LPAR) System z configurations

A well provisioned and performing System z I/O subsystem is of vital importance for safeguarding today’s ever increasing storage requirements of mission critical business applications. A poorly performing I/O subsystem will generate unnecessary and extra CPU overhead, with potential and tangible TCO impact, in conjunction with potential business impact. Although the advances of the System z server and underlying DASD I/O subsystem can compensate for many application code or data placement issues, the fundamental concepts of analysing and tuning the I/O subsystem remain.

Therefore the savvy and proactive System z customer will safeguard that they find a solution to deliver optimal DASD I/O performance. Without doubt, such an analysis could be performed by a highly-skilled individual, but today’s 21st Century world demands a hybrid of technical and commercial skills. Therefore a solution that incorporates the diagnostic knowledge of the most highly trained technician, performs intelligent analytics on a plethora of Sysplex wide performance data sources and presents the information required, is one that will deliver benefit each and every day. EADM is an example of such a solution, delivering demonstrable System z TCO optimization benefits, while safeguarding a short-term ROI, with simple deployment and resource utilization attributes.

z/OS Soft Capping: Balancing Cost & Performance

Historically, each and every LPAR was assigned a Relative Weight value, where a more meaningful description would be the initial processing weight. This relative weight value is used to determine which LPAR gains access to resources when multiple LPARs are competing for the same resource. Being unit-less is one minor challenge of the relative weight value, meaning that it has no explicit CPU capacity or resource value. Typically installations would use a simple multiple of ten metric, most likely 1000, and allocate weights accordingly (E.g. 600=60%, 300=30%, 100=10%, et al). Therefore during periods of resource contention, PR/SM would allocate resources to the requisite LPAR, based upon its relative weight.
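
A minimal sketch of the share calculation implied above (illustrative only, not PR/SM internals; the LPAR names and weights are hypothetical):

  def lpar_shares(relative_weights: dict) -> dict:
      """Illustrative only: during contention, each LPAR's share of CPU
      resource is its relative weight divided by the sum of all weights."""
      total = sum(relative_weights.values())
      return {lpar: weight / total for lpar, weight in relative_weights.items()}

  # Hypothetical example: weights allocated out of 1000
  print(lpar_shares({"PROD": 600, "TEST": 300, "DEV": 100}))
  # {'PROD': 0.6, 'TEST': 0.3, 'DEV': 0.1}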

Using relative weight to classify all LPARs as equal, at least from a generic class viewpoint, does have some considerations; primarily differentiating between Production and Non-Production workloads. Restricting a workload to its relative weight share of resources is known as Hard Capping. This setting is typically used to restrict Non-Production (E.g. Test) environments to their allocated resource and is also useful for cost control (E.g. Outsourcers), knowing that the LPAR will never consume more than its allocated relative weight allowance.

Hard Capping behaviour changes dependent on the use of the HiperDispatch setting. When HiperDispatch is not chosen, capping is performed at the Logical CP level, where the goal is for each logical CP to receive its relative CP share, based on the relative weight setting. When HiperDispatch is active, vertical as opposed to horizontal CPU management applies. So, a High categorization dictates capping at 100% of the logical CP, whereas a Medium or Low setting allows for resource sharing based on a relative weight per CP basis.

The Intelligent Resource Director (IRD) function provides more advanced relative weight management, automating management of CPU resources and a subset of I/O resources. Workload Manager (WLM) manages physical CPU resource across z/OS images within an LPAR cluster based on service class goals. IRD is implemented as a collaboration between the WLM function and the PR/SM Logical Partitioning (LPAR) hypervisor:

  • Logical CP Management: dynamically allocating logical processors (E.g. Vary On-Line/Off-Line)
  • Relative Weight Management: dynamically redistributing CPU resource as per LPAR weights
  • CHPID Management: dynamically assigning logical channel paths between eligible LPARs

IRD optimizes resource usage, enabling WLM to deliver workload goals.

The use of relative weight in association with Hard Capping and/or IRD/WLM granularity has become somewhat limited for most Mainframe installations with the advent of Sub-Capacity pricing (I.E. MLC via SCRT/R4HA). Primarily because there is no direct correlation to manage CPU resource at a meaningful level, namely the MSU (vis-à-vis CPU MIPS) metric.

Defined Capacity (DC) provides Sub-Capacity CEC pricing by allowing definition of LPAR capacity with a granularity of 1 MSU. In conjunction with the WLM function, the Defined Capacity of an LPAR dictates whether Soft Capping is invoked or not. At this juncture, we should consider how and when WLM measures CPU resource usage and if and when Soft Capping is activated and deactivated:

WLM is responsible for taking MSU utilization samples for each LPAR in 10-second intervals. Every 5 minutes, WLM documents the highest observed MSU sample value from the 10-second interval samples. This process always keeps track of the past 48 updates taken for each LPAR. When the 49th reading is taken, the 1st reading is deleted, and so on. These 48 values continually represent a total of 5 minutes * 48 readings = 240 minutes or the past 4 hours (I.E. R4HA). WLM stores the average of these 48 values in the WLM control block RCT.RCTLACS. Each time RMF (or BMC CMF equivalent) creates a Type 70 record, the SMF70LAC field represents the average of all 48 MSU values for the respective LPAR a particular Type 70 record represents. Hence, we have the “Rolling 4 Hour Average”. RMF gets the value populated in SMF70LAC from RCT.RCTLACS at the time the record is created.
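
As a minimal sketch of the mechanism just described (illustrative only, not the actual WLM/RMF implementation), the R4HA is simply the mean of a rolling window of 48 five-minute MSU values:

  from collections import deque

  class RollingFourHourAverage:
      """Illustrative only: keep the latest 48 five-minute MSU values
      (48 * 5 minutes = 240 minutes) and expose their average (R4HA)."""
      def __init__(self):
          self.samples = deque(maxlen=48)  # the 49th value pushes out the 1st

      def record_five_minute_value(self, msu: float) -> None:
          self.samples.append(msu)

      @property
      def r4ha(self) -> float:
          return sum(self.samples) / len(self.samples) if self.samples else 0.0

  # Hypothetical example: four hours of 400 MSU, then one busier interval
  tracker = RollingFourHourAverage()
  for _ in range(48):
      tracker.record_five_minute_value(400)
  tracker.record_five_minute_value(880)
  print(round(tracker.r4ha))  # 410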

SCRT also uses the Type 70 field SMF70WLA to ensure that the values recorded in SMF70LAC do not exceed the maximum available MSU capacity assigned to an LPAR. If this ever happens (due to Soft Capping or otherwise) SCRT uses the value in SMF70WLA instead of SMF70LAC. Values in SMF70WLA represent the total capacity available to the LPAR.

We should also consider the two possibilities for MLC software payment (I.E. SCRT) based upon MSU resource usage. Quite simply, the MSU value passed for SCRT invoice consideration is the R4HA or the Defined Capacity, whichever is the lower. Put another way, if the R4HA exceeds the Defined Capacity, Soft Capping applies to the LPAR.
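
Expressed as a one-line rule (illustrative only; the MSU figures are hypothetical):

  def scrt_msu(r4ha_msu: float, defined_capacity_msu: float) -> float:
      """Illustrative only: the MSU value considered for SCRT invoicing is the
      lower of the R4HA and the Defined Capacity; if the R4HA exceeds the
      Defined Capacity, the LPAR is soft capped."""
      return min(r4ha_msu, defined_capacity_msu)

  # Hypothetical example: R4HA of 410 MSU against a Defined Capacity of 380 MSU
  print(scrt_msu(410.0, 380.0))  # 380.0 (Soft Capping applies)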

The primary disadvantage of Soft Capping is that the Defined Capacity setting is somewhat static; it is manually defined once, maybe several times a day for workloads with distinct characteristics (E.g. On-Line, Batch, et al), but dynamic DC management based upon inter-related LPAR behaviour is at best, evolving. The primary considerations for Soft Capping are:

  • An LPAR can only be managed via Soft Capping or Hard Capping; not both
  • DC rules only apply to General Purpose CP’s (Hard Capping for Specialty Engines is allowed)
  • An LPAR must be defined with shared CP’s (dedicated CP’s not allowed)
  • All LPAR Sub-Capacity eligible products have the same MSU capacity (I.E. DC)

Soft Capping is relatively simple to implement and typically generates MLC software costs savings, with minimal impact.

Group Capacity Limit (GCL) provides an extension to the Defined Capacity (DC) Soft Capping function. GCL allows an MSU limit for total usage of all group LPARs, with a granularity of 1 MSU. The primary considerations for GCL are:

  • Works with DC LPAR capacity settings
  • Target share does not exceed DC
  • Works with IRD
  • Multiple CEC groups allowed; but an LPAR may only be defined to one group
  • An LPAR must be defined with shared CP’s, with the WAIT COMPLETION = NO specification

It is possible to combine IRD weight management with the GCL function. Based on installation policy, IRD can modify the relative weight setting to redistribute capacity resource within an LPAR cluster.

However, IRD weight management is suspended when GCL is in effect, because LPAR resource entitlement within a capacity group can be (I.E. Pre zxC12) derived from the current weight. Hence the LPAR might get allocated an unacceptable low weight setting, generating a low GCL entitlement.

GCL also allows for MSU to be shared between LPARs in a group, where one LPAR would be a donor and another would be a receiver. Therefore the customer classifies their LPARs accordingly and when a high-priority LPAR requires additional MSU resource, it will be allocated from a lower priority LPAR, if available. This provides a modicum of flexibility, but by definition, peak workloads are not predictable and typically require a significantly higher amount of MSU for a short time period. Typically this requirement will not be satisfied with the GCL function.
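
A heavily simplified sketch of the donor/receiver idea described above (illustrative only; actual group capping behaviour is considerably more sophisticated, and the LPAR names and values are hypothetical):

  def msu_available_to_receiver(group_limit_msu: float, group_usage_msu: dict) -> float:
      """Illustrative only: within a Group Capacity Limit, the MSU a higher
      priority (receiver) LPAR could use is whatever the group is not consuming."""
      return max(group_limit_msu - sum(group_usage_msu.values()), 0.0)

  # Hypothetical example: a 500 MSU group where the donor LPARs leave headroom
  print(msu_available_to_receiver(500.0, {"PRODA": 300.0, "TESTA": 120.0}))  # 80.0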

Soft Capping techniques, either at the individual (DC) or group (GCL) level deliver cost saving benefit, but a fine granularity of management is required to balance cost saving versus associated performance considerations. The primary challenges associated with Soft Capping are its interactions with workload characteristics and an inability to dynamically manage MSU allocation, in-line with the R4HA. Put another way, the R4HA is derived from 48*5 Minute samples, whereas DC and GCL settings are typically defined on an infrequent (E.g. Monthly or longer) basis.

As z/OS evolves, further in-built function is available to manage MSU capacity. zSeries Capacity Provisioning Manager (CPM) is designed to simplify the management of temporary capacity, defined capacity and group capacity. The scope of z/OS Capacity Provisioning is to address capacity requirements for relatively short term workload fluctuations for which On/Off Capacity on Demand or Soft Capping changes are applicable. CPM is not a replacement for the customer derived Capacity Management process. Capacity Provisioning should not be used for providing additional capacity to systems that have Hard Capping (initial capping or absolute capping) defined.

With the introduction of z/OS 2.1, CPM functionality incorporates Soft Capping support via the DC and GCL functions. CPM functions from a set of installation defined policies and parameters, where the CPM server receives three types of input:

  • Domain Configuration: defines the CPCs and z/OS systems to be managed
  • Policy: contains the information as to which work is eligible, for which conditions and during which timeframes and capacity increases for constrained workloads
  • Parameter: contains environment descriptors (E.g. UNIX Environment, Installation Options, et al)

From a customer viewpoint, policy definition allows them to define the provision of CPU resource:

  • Date & Time: When capacity provisioning is allowed
  • Workload: Which service class qualifies for provisioning?
  • CPU Resource: How much additional MSU capacity can be allocated?

CPM provides more function when compared with the Defined Capacity and Group Capacity Limit Soft Capping techniques, allowing time schedules to be defined, workloads to be categorized and MSU resource to be allocated in a dynamic and granular manner.

A modicum of complexity exists when considering the arguably most important factor for CPM policy definition, namely the Performance Index (PI):

  • Activation: PI of service class periods must exceed the activation threshold for a specified duration, before the work is considered as eligible.
  • Deactivation: PI of service class periods must fall below the deactivation threshold for a specified duration, before the work is considered as ineligible.
  • Null: If no workload condition is specified a scheduled activation/deactivation is performed; with full capacity as specified in the rule scope, unconditionally at the start and end times of the time condition.

For workload based provisioning it is a necessary condition that the current system Performance Index exceeds the specified customer policy PI metric. One must draw one’s own conclusions regarding PI criteria settings, but to date, they’re largely based on arguably complex mathematical formulae, which perhaps is not practicable, especially from a simple management viewpoint.
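
A minimal sketch of that activation/deactivation logic (illustrative only; the real CPM policy model is richer and the thresholds below are hypothetical):

  def provisioning_decision(pi_history: list, activation_pi: float,
                            deactivation_pi: float, duration_samples: int) -> str:
      """Illustrative only: activate extra capacity when the service class
      Performance Index stays above the activation threshold for the required
      duration; deactivate when it stays below the deactivation threshold."""
      recent = pi_history[-duration_samples:]
      if len(recent) == duration_samples and all(pi > activation_pi for pi in recent):
          return "activate"
      if len(recent) == duration_samples and all(pi < deactivation_pi for pi in recent):
          return "deactivate"
      return "no change"

  # Hypothetical example: PI has exceeded 1.8 for the last three samples
  print(provisioning_decision([1.2, 1.9, 2.1, 2.4], activation_pi=1.8,
                              deactivation_pi=1.2, duration_samples=3))  # activate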

With the requisite hardware (I.E. zxC12+) and Operating System levels (I.E. z/OS 1.13+), CPM provides extra functionality for the customer to implement granular Soft Capping techniques to balance cost and performance. When compared with Defined Capacity and Group Capacity Limit techniques, CPM delivers increased granularity for managing capacity dynamically, based on customer derived policies, recognizing time slots, workloads and MSU resource increases accordingly.

From a big picture viewpoint, without doubt, we must recognize the fundamental role that WLM plays in Soft Capping. Quite simply, the 48*5 Minute MSU resource samples dictate whether a workload will be eligible for Soft Capping or not and from a cumulative viewpoint, these MSU samples dictate the R4HA metric. Based on this observation, efficient and functional Soft Capping must be workload based (I.E. WLM Service Class), be dynamic and operational on a 24*7 basis, because workload peaks are never predictable, while balancing MSU resource accordingly. Of course, simplicity of implementation and management, supplemented by meaningful reporting is mandatory.

Once again, observing the 48*5 Minute MSU resource samples from an R4HA viewpoint, if a workload were to increase MSU usage by an average of 50% for 1 Hour (I.E. 12 Samples) and decrease MSU usage by an average of 20% for 2.5 Hours (I.E. 30 Samples), from an average viewpoint the R4HA would remain static.  Therefore an optimum Soft Capping technique needs to recognize WLM service class requirements, reacting in a timely manner, increasing and decreasing MSU usage, to safeguard workload performance for Time Critical workloads, while optimizing SCRT MLC cost.
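
A quick worked check of that example, with the baseline workload normalized to 100 MSU (illustrative only):

  baseline_msu = 100.0
  # 12 samples at +50%, 30 samples at -20%, 6 samples unchanged = 48 five-minute samples
  samples = [baseline_msu * 1.5] * 12 + [baseline_msu * 0.8] * 30 + [baseline_msu] * 6
  print(sum(samples) / len(samples))  # 100.0 - the R4HA is unchanged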

zDynaCap delivers automated capacity balancing within CPCs, Capacity Groups or Groups of LPARs. Central to zDynaCap are the predefined balancing policies. Within these balancing policies, users define their MSU ranges of Groups and LPARs and also the priorities of the associated LPAR Workload. zDynaCap continually monitors overall usage and compares this to the available capacity and the user defined MSU balancing policies. For example, should a high priority workload on one LPAR not get enough capacity, while a low priority workload on another within the group gets too much capacity, available MSU capacity is distributed according to customer derived balancing policies. Only if there is no leftover capacity to be rescheduled within the defined Group, and if the high or medium priority workload will be slowed down, will zDynaCap add MSU.

With zDynaCap Capacity Balancing, available MSU capacity is balanced within LPAR groups, safeguarding that during peak time the mission critical workload is processed as per business expectations (E.g. SLA/KPI) for the lowest possible MLC cost.

In conclusion, given the significance of IBM MLC software (E.g. z/OS, CICS, DB2, IMS, WebSphere MQ, et al) costs, arguably every Mainframe environment should deploy a capping technique for cost optimization.  Hard Capping might work for some, but in all likelihood, Soft Capping is the primary choice for most Mainframe environments.  For sure, IBM have delivered several Soft Capping techniques, with varying levels of function and granularity, namely Defined Capacity, Group Capacity Limit (GCL) and the zSeries Capacity Provisioning Manager (CPM).  It was forever thus; the ISV community exists because they specialize, architecting and delivering focussed solutions, and zDynaCap is such a solution, recognizing the fundamental rules of IBM Mainframe Soft Capping, namely the underlying WLM and R4HA foundation.

Cloudy With A Chance Of Mainframe?

With the advent of Computer Generated Imagery (CGI) there is seemingly no end to the number of books, especially “children’s” books that can be encapsulated and delivered in animated movie format.  I’m always surprised and arguably never surprised by the messaging in these stories; supposedly written for the younger person, but invariably delivering a message of good morals, ethics and human qualities, typically finding creative solutions to a myriad of problems.  Of course, we’re all human, and typically as human beings, we’re responsible for the majority of our problems, either knowingly, or not.

Cloudy with a Chance of Meatballs is a book based on a town named Chewandswallow characterized by its strange daily meteorological pattern, providing townsfolk with all of their required daily meals by raining food.  Although the residents of the town enjoy a lifestyle devoid of any grocery shopping or cookery, the weather unexpectedly and inexplicably takes a turn for the worse, devastating the local community with destructive and uncontrollable storms of either unpleasant or dangerously oversized foods, resulting in unstoppable catastrophes for the townspeople.  Their lives endangered by the threats of the storms, they relocate to a different community of average meteorological patterns, safe from the hazards that once were presented by raining meals.  However, they are forced to learn how to obtain food the normal way.

So what?  Continuing with the creativity thought, the ethos of this story might be somewhat analogous to the sometimes polarized opinion between Distributed Systems and Mainframe computing.  So depending on your philosophical bent, or which side of the fence you sit on, there is only one choice, even if this seemingly perfect and de facto world is generating significant challenges…

Recently, z/OS 2.1 became Generally Available (GA) and most notably from my viewpoint was its continued and demonstrable ability to participate in cloud computing environments.  So is the IBM Mainframe ready for the cloud?  Wasn’t it always!

The fundamental ethos of the Mainframe environment is virtualization and was forever thus.  The Mainframe has always shared the basic IT architecture components, including CPU, Memory, Storage, Networking and other peripherals, originally in a physical single-image structure, but since the late 1990’s in a shared (SYSPLEX) complex of interconnected physical servers (CPCs).  So the Mainframe is and always has been ready for “Prime Time Cloud”!

z/OS V2.1 is a platform designed to dynamically respond and scale to workload change with enhancements to scalability and performance that cover operations, I/O, virtual storage constraint relief, memory management, and more.  These enhancements are suitable for organizations that would like to catalyse a journey to highly scalable virtualized solutions like cloud.

IBM delivers improved scalability and performance for outstanding throughput and service within existing Mainframe environments.  Smarter scalability can better prepare the user for growth and spikes in workloads while maintaining the qualities of service and balanced design that customers have come to expect of the IBM mainframe.

As customers consider all the components of downtime, the true costs can be surprising, which is why superior availability continues to remain a key factor in platform selection. With z/OS V2.1, IBM introduces new capabilities designed to improve upon the already legendary z/OS system availability.  The industry-leading resiliency and high availability of System z remain key reasons why organizations keep their most critical processing on System z.  With its attention to outage reduction, the availability of System z and z/OS is well recognized in the industry.  In z/OS V2.1, IBM continues enhancements that improve critical IT systems availability, helping achieve an even higher level of service for customers.

Some of the “cloud friendly” z/OS 2.1 benefits include:

  • Support for Shared Memory Communications-RDMA (SMC-R), for low latency, application transparent communications to help you move data quickly between z/OS images on the same CPC or between CPCs.
  • Flash Express support for certain coupling facility list structures, such as those used by IBM WebSphere MQ for z/OS, V7 (5655-R36), in order to strengthen resiliency for enterprise messaging workload spikes.
  • For zEC12 or zBC12 systems, shared engine coupling facilities can be used in many production environments, for improved economics by offering a high level of performance without requiring the use of dedicated CF engines.
  • EXCP support for System z High-Performance FICON (zHPF) is designed to help improve I/O start rates and improve bandwidth for more workloads on existing hardware and fabric.
  • Usability and performance improvements for z/OS FICON Discovery and Auto Configuration (zDAC), including discovery of directly attached devices.
  • Serial Coupling Facility structure rebuild processing, designed to help improve performance and availability by rebuilding coupling facility structures more quickly and in priority order.
  • 100-way symmetric multiprocessing (SMP) support in a single LPAR on IBM zEC12 or zBC12 systems.  Support for an architectural limit of 4 TB of real memory per LPAR.
  • Support for 2 GB pages is provided on zEC12 and zBC12 systems.  This feature is designed to reduce memory management overhead and improve overall system performance by enabling middleware to use 2 GB pages.  These improvements are expected due to improved effective translation lookaside buffer (TLB) coverage and a reduction in the number of steps the system must perform to translate a 2 GB page virtual address.
  • Capacity Provisioning is designed to provide support for manual and policy-based management of Defined Capacity and Group Capacity.  This function broadens the range of automatic, policy-based responses available to help manage capacity shortage conditions when WLM cannot meet your workload policy goals.

There are many other new and enhanced functions delivered with z/OS 2.1, too numerous to mention here, but broadly categorised as Quality Of Service, Availability, Networking, Security, Data Usability, Integrity, Systems Management, Application Development, Simplification & Usability, International Standards Compliance, et al.

So let’s not forget, this foundation for an IT infrastructure and its supporting (software) ecosystem is delivered in one scalable, secure and “zero” downtime environment!

So maybe we, the open-minded and enlightened generation of parents (oops, I forgot, Grandparents for us Dinosaur Mainframe folk!) who can now “access” children’s stories, even if in the form of a CGI animated movie, can be dispassionate enough to consider all platforms, Distributed and Mainframe, for our evolving business and associated IT requirements.

So you decide, can it be Cloudy With A Chance Of Mainframe?  To overlook such an option might be an oversight, just as overlooking the abundance of human stories, classified as children’s books or not…

IBM Mainframe: Workload License Charges (WLC) Pros & Cons

It is estimated that less than half of eligible IBM Mainframe customers deploy the VWLC pricing mechanism, which, in theory, is the lowest cost IBM software pricing metric.  Why?  In the first instance, let’s review the terminology…

Workload License Charges (WLC) is a monthly software license pricing metric applicable to IBM System z servers running z/OS or z/TPF in z/Architecture (64-bit) mode.  The fundamental ethos of WLC is a “pay for what you use” mechanism, allowing a lower cost of incremental growth and the potential to manage software cost by managing associated workload utilization.

WLC charges are either VWLC (Variable) or FWLC (Flat).  Not all IBM Mainframe software products are classified as VWLC eligible, but the major software is, including z/OS, CICS, DB2, IMS and WebSphere MQ, and these products are typically the most expensive per MSU.  What IBM consider to be legacy products are classified as FWLC.  More recently, a modification to the VWLC mechanism was announced, namely AWLC (Advanced), strictly aligned with the latest generations of zSeries servers, namely the zEC12, z196 and z114.  For the smaller user, the EWLC (Entry) mechanism applies, where AEWLC applies for the z114 server.  There is a granular cost structure based on MSU (CPU) capacity that applies to VWLC and associated pricing mechanisms:

Band     MSU Range
Base     0-3 MSU
Level 0  4-45 MSU
Level 1  46-175 MSU
Level 2  176-315 MSU
Level 3  316-575 MSU
Level 4  576-875 MSU
Level 5  876-1315 MSU
Level 6  1316-1975 MSU
Level 7  1976+ MSU

Put simply, as the MSU band increases, the related cost per MSU decreases.
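
To make the banded structure concrete, here is a minimal sketch in Python of an incremental (per-band) charge calculation.  The band boundaries follow the table above, but the prices per MSU are entirely hypothetical and the incremental charging approach is an illustrative assumption, not IBM’s published pricing algorithm.

  # Minimal sketch (illustrative only): a banded, incremental charge per MSU.
  # Band boundaries follow the table above; the prices per MSU are entirely
  # hypothetical and are NOT IBM list prices.

  BANDS = [
      # (band name, band ceiling in MSU, hypothetical price per MSU)
      ("Base",      3, 100.0),
      ("Level 0",  45,  80.0),
      ("Level 1", 175,  60.0),
      ("Level 2", 315,  45.0),
      ("Level 3", 575,  35.0),
      ("Level 4", 875,  28.0),
      ("Level 5", 1315, 22.0),
      ("Level 6", 1975, 18.0),
      ("Level 7", None, 15.0),   # no upper limit
  ]

  def monthly_charge(msu: float) -> float:
      """Accumulate the charge band by band for a given billable MSU figure."""
      charge, floor = 0.0, 0.0
      for _name, ceiling, price in BANDS:
          top = msu if ceiling is None else min(msu, ceiling)
          if top > floor:
              charge += (top - floor) * price
          if ceiling is None or msu <= ceiling:
              break
          floor = ceiling
      return charge

  for msu in (45, 315, 1975):
      cost = monthly_charge(msu)
      print(f"{msu:>5} MSU -> charge {cost:>8.0f}, average {cost / msu:5.1f} per MSU")

Running the sketch for 45, 315 and 1975 MSU shows the average cost per MSU falling as the billable figure climbs through the bands, which is the point of the table above.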

IBM Mainframe users can further implement cost control by specifying how much MSU resource they use, deploying Sub-Capacity and Soft Capping techniques.  Defined Capacity (DC) allows the sizing of an LPAR in MSU, so that LPAR will not exceed this MSU amount.  Group Capacity Limit (GCL) extends the Defined Capacity principle from a single LPAR to a group of LPARs, allowing MSU resource to be shared accordingly.  A potential downside of GCL is that one LPAR of the group can consume all available MSU, for example due to a rogue transaction (E.g. loop).

Sub-Capacity software charges are based upon the hardware utilization of the LPARs where the product runs, measured in hourly intervals.  To smooth out isolated usage peaks, a Rolling 4-Hour Average (R4HA) is calculated, and software charges are based on the monthly R4HA peak of the appropriate LPAR combination (I.E. where the software product runs), not on individual product measurement.
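
As a simple illustration of the R4HA principle, the following Python sketch (not the actual SCRT process, which works from SMF data) averages hourly MSU figures over a rolling 4-hour window; the workload figures are hypothetical, chosen to show how an isolated one-hour spike is smoothed.

  # Minimal sketch of the R4HA principle (illustrative only): average hourly
  # MSU figures over a rolling 4-hour window and take the peak.

  from statistics import mean

  def r4ha(hourly_msu):
      """Rolling 4-hour average; the first hours average over what is available."""
      return [mean(hourly_msu[max(0, i - 3):i + 1]) for i in range(len(hourly_msu))]

  # A hypothetical day: steady 200 MSU with a single one-hour spike to 600 MSU.
  day = [200.0] * 10 + [600.0] + [200.0] * 13

  print(f"Raw hourly peak: {max(day):.0f} MSU")        # 600 MSU
  print(f"Peak R4HA:       {max(r4ha(day)):.0f} MSU")  # 300 MSU - the spike is smoothed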

Once a Defined Capacity LPAR is deployed, this informs WLM (Workload Manager) to monitor the R4HA utilization of that LPAR.  If the LPAR R4HA utilization is less than the Defined Capacity, nothing happens.  If the LPAR R4HA utilization exceeds the Defined Capacity, then WLM signals to PR/SM and requests that Soft Capping be initiated, constraining the LPAR workload to the Defined Capacity level.
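
The decision logic described above might be sketched as follows; the Defined Capacity value and R4HA samples are hypothetical, and of course the real mechanism is internal to WLM and PR/SM rather than anything a user would code.

  # Minimal sketch of the soft capping decision described above.  The Defined
  # Capacity value and R4HA samples are hypothetical.

  DEFINED_CAPACITY_MSU = 400.0   # hypothetical Defined Capacity for one LPAR

  def soft_cap(r4ha_msu, currently_capped):
      """Return True if the LPAR should be capped at its Defined Capacity."""
      if r4ha_msu > DEFINED_CAPACITY_MSU:
          return True     # R4HA exceeds DC: WLM asks PR/SM to initiate capping
      if r4ha_msu < DEFINED_CAPACITY_MSU:
          return False    # R4HA back below DC: the cap can be removed
      return currently_capped

  capped = False
  for sample in (380.0, 395.0, 410.0, 405.0, 390.0):
      capped = soft_cap(sample, capped)
      print(f"R4HA {sample:5.1f} MSU -> capped = {capped}")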

If a user chooses a Sub-Capacity WLC pricing mechanism, they will be required by IBM to submit a monthly Sub-Capacity Reporting Tool (SCRT) report.  Monthly WLC invoices are based upon hourly metrics of the hardware utilization of the LPARs where the software product executes.  The cumulative R4HA and bottom-line WLC billing metric is calculated for each product across its associated LPAR group, not from individual product measurement.
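
The following sketch illustrates that billing principle: for each product, the hourly R4HA values of the LPARs where it runs are combined, and the billable figure is the peak of that combined value.  The LPAR names, product-to-LPAR mappings and MSU figures are entirely hypothetical.

  # Minimal sketch of the billing principle described above: per product, sum
  # the hourly R4HA of the LPARs where it runs, then take the peak of that sum.

  r4ha_by_lpar = {                 # hourly R4HA values per LPAR, in MSU
      "PRODA": [300.0, 320.0, 280.0, 250.0],
      "PRODB": [150.0, 120.0, 180.0, 200.0],
      "TEST1": [ 40.0,  45.0,  50.0,  35.0],
  }

  product_runs_on = {              # which LPARs each product executes in
      "z/OS": ["PRODA", "PRODB", "TEST1"],
      "CICS": ["PRODA", "PRODB"],
      "DB2":  ["PRODA"],
  }

  hours = len(r4ha_by_lpar["PRODA"])
  for product, lpars in product_runs_on.items():
      combined = [sum(r4ha_by_lpar[lpar][h] for lpar in lpars) for h in range(hours)]
      print(f"{product:<4} billable MSU (peak of combined R4HA): {max(combined):.0f}")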

Bottom Line: From a Soft Capping viewpoint, the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is lower.  So whether a customer uses Soft Capping or not, in all likelihood there will be occasions when their workload R4HA is lower than their zSeries server MSU capacity.

So, at first glance, VWLC seems to provide a compelling pricing metric, based upon Sub-Capacity and a pay for what you use ethos, and so why wouldn’t an IBM Mainframe user deploy this pricing metric?

The IBM Planning for Sub-Capacity Pricing (SA22-7999-0n) manual states “For IBM System z10 BC and System z9 BC environments, and z890 servers, EWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 114 environments, AEWLC pricing is the default for z/OS systems, and Sub-Capacity pricing is always the best option.  For IBM zEnterprise 196, System z10 EC and System z9 EC environments, and other zSeries servers, Sub-Capacity pricing is cost-effective for many, but not all, customers.  You might even find that Sub-Capacity pricing is cost effective for some of your CPCs, but not others (although if you want pricing aggregation, you must always use the same pricing for all the CPCs in the same sysplex)”.

Conclusion: For all small Mainframe users qualifying for the EWLC (AEWLC) pricing metric, arguably this pricing mechanism is mandatory.  For the majority of larger Mainframe users the same applies, although a granularity of adoption might be required.  IBM also have a disclaimer: “Once you decide to use Sub-Capacity pricing for a specific operating system family, you cannot return to the alternative pricing methods for that operating system family on that CPC.  For example, once you select WLC you may not switch back to PSLC without prior IBM approval”.  However, the requisite contractual exit clause option does exist; with IBM approval, the customer can switch back to the PSLC pricing metric.

Some IBM Mainframe users might object to the notion of Soft Capping, relying upon their tried and tested methodology of LPAR management via the number of CPs allocated and the associated PR/SM Weight.  This is seemingly a valid notion and requirement, prioritizing performance ahead of cost optimization.

Conclusion: As previously indicated, with VWLC, SCRT invoices are generated upon the premise that the customer only pays for WLC software based upon the Defined Capacity (DC) or Rolling 4-Hour Average (R4HA), whichever is lower.  So the VWLC pricing mechanism should deliver cost savings in any event, typically greater in a Soft Capping environment.

Some IBM Mainframe users might just believe that nothing can match their Parallel Sysplex License Charges (PSLC) mechanism, first available in the late 1990s, a belief which might be attributable to 3rd party ISVs who cannot or will not allow their software to be priced on a Sub-Capacity basis.  In reality, adopting the VWLC pricing mechanism delivers ~5% cost savings when compared with PSLC, as indicated by the IBM Planning for Sub-Capacity Pricing manual (SA22-7999-0n) and the related Sub-Capacity Planning Tool (SCPT).

Conclusion: Adopting Sub-Capacity based pricing metrics can only be a good thing.  If your 3rd party ISV supplier doesn’t recognise Sub-Capacity pricing, whether MIPS or MSU based, perhaps you should consider your relationship with them.  Regardless, the z10 server was the last IBM Mainframe to deliver the “Technology Dividend” solely through faster CPU chips; for the z196, z114 and zEC12 servers, the lower cost WLC benefit is only available via the AWLC and related (E.g. AEWLC) pricing metrics.

Some customers might state that there is a lack of function or granularity of policy definition in the IBM supplied Soft Capping (E.g. DC, GCL) or Workload Management (WLM) techniques.  To some extent this is a valid argument, but wasn’t it forever thus with IBM function?  Sub-Capacity implementation is possible with IBM supplied function alone, as is Workload Management (WLM), Soft Capping or not, but should the customer require extra functionality, 3rd party software solutions are available.

The zDynaCap software solution from zIT Consulting delivers a “Capacity Balancing” mechanism, integrating with R4HA and WLM methodologies, constantly monitoring MSU usage to determine whether CPU resource can be reallocated to Mission & Time Critical workloads, based upon granular customer policies.  In a multiple LPAR environment, Soft Capping or not, the only guarantee that a Mission & Time Critical LPAR receives all available MSU resource is to deactivate all other LPARs!  Clearly this is not an acceptable policy for any installation, and so a best endeavours approach applies for PR/SM DC, GCL and Weight settings.

Conclusion: z/OS workloads change constantly, whether the time of day (E.g. On-Line, Batch) or period of the year (E.g. Weekly, Monthly, Quarterly, Yearly) or just by customer demand (E.g. 24 Hour Transaction Application).  Therefore a dynamic MSU management solution such as zDynaCap is arguably mandatory, implementing the optimum MSU management policy, whether for purely performance reasons, safeguarding the Mission & Time Critical workload isn’t impacted by lower priority workloads, or for cost reasons, optimizing MSU usage for the best possible monthly WLC cost.

In conclusion, not considering, and arguably not implementing, z/OS VWLC related pricing mechanisms makes little sense, because:

  • The VWLC and AWLC related pricing metrics deliver the lowest cost per MSU for eligible z/OS software
  • When compared with PSLC, VWLC related pricing mechanisms deliver a conservative ~5% cost saving
  • Charges are based on a pay-for-what-you-use (Sub-Capacity) basis, not on installed MSU capacity
  • If extra MSU policy management granularity is required, consider 3rd party software such as zDynaCap

Software cost savings are not just for the privileged; they’re for everyone!