zHyperLink: Just Another System z DASD I/O Function Enhancement?

Over the last several decades or so the IBM Mainframe platform has delivered several new technologies that have dramatically improved the performance of disk (DASD) I/O performance.  Specifically the deployment of ESCON as the introduction to Fibre Optical channels, followed by EMIF for channel sharing and reduced I/O protocol, superseded by FICON and most recently zHPF.  All of these technologies have allowed for ever larger amounts of data to be processed by the System z server and the adoption of Geographically Dispersed Parallel Sysplex (GDPS) implementations for business continuity reasons.  Ultimately mission critical data and decisions are facilitated by applications and sub-second response times for these transactions is expected.  Some might say that we’re always running to stand still from a performance perspective when implementing the latest System z technologies?

In reality, today’s 21st Century mission-critical application is not just capturing and storing customer data, it’s doing so much more, attempting to make informed business decisions for a richer customer experience!  Historically a customer transaction would be on a one-to-one basis (E.g. ask for a balance query), whereas today, said transaction might generate more data for the customer, potentially offering them a new or enhanced product.  In theory, this informed and intelligent transaction processing delivers a richer experience for the customer and potentially new revenue opportunities for the business.

For several years IBM have integrated the Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative into their product offerings, recognising that a business transaction can originate from the cloud or a mobile device, potentially via a Social Media platform, require rich processing via real-time analytics, while requiring the highest levels of security.  Of course, one must draw one’s own conclusions, but maintaining sub-second or ultra-fast transaction response times, with this level of CAMSS complexity requires significant performance enhancements.  To deliver such ultra-fast response times requires the DASD I/O subsystem to maintain the highest levels of performance, aligned with the latest System z server platform…

In January 2017 IBM issued a Statement of Direction (SoD) and associated FAQ for their zHyperLink technology.  zHyperLink is a new short distance mainframe link technology designed for up to 10 times lower latency than zHPF.  zHyperLink is intended to accelerate DB2 for z/OS transaction processing and improve active log throughput.  IBM intends to deliver field upgradable support for zHyperLink on the existing IBM DS8880 storage subsystem.  zHyperLink technology is a new mainframe attach link.  It is the result of collaboration between DB2 for z/OS, the z/OS operating System, IBM System z servers and the DS8880 storage subsystem to deliver extreme low latency I/O access for DB2 for z/OS applications.  zHyperLink technology is intended to complement FICON technology, accelerating those I/O requests that are typically used for transaction processing.  These links are point-to-point connections between the System z CEC and the storage system and are limited to 150 meter distances.  These links do not impact the z Architecture 8 channel path limit.

From a DB2 I/O service performance perspective viewpoint, at short distances, a native FICON or zHPF originated I/O typically requires 300 Microseconds (μs) for a simple I/O operation.  The coupling facility for z Systems typically can read or write 4K of data in in under 8 Microseconds.  zHyperLink technology will provide a new short distance link from the mainframe to storage to read and write data up to 10 times faster than FICON or zHPF; reducing DB2 I/O service times to an anticipated 20-30 Microseconds.

In conclusion, with a promise of 10 times faster processing, as per its fibre optic channel technology predecessors, particularly EMIF and zHPF, zHyperLink is a revolutionary DASD I/O function and not just another DASD I/O subsystem function enhancement.  At this stage, the deployment of zHyperLink functionality is restricted to DB2 and the IBM DS8880 storage subsystem, while we eagerly await compatibility support from EMC and HDS accordingly.  Moreover, as per the evolution of zHPF, we hope for the inclusion of other I/O workloads to benefit from this paradigm changing I/O response time technology.

Finally, as always, the realm of possibility always exists for each and every System z DASD I/O subsystem to be monitored and tuned on a proactive and 24*7*365 basis.  Although all of this DASD I/O performance data has always been and still is captured by RMF (CMF) data, intelligent processing of this data requires an ever evolving Performance Management process and arguably an intelligent software solution (E.g. IntelliMagic Direction Disk Magic or Technical Storage Easy Analyze Disk Mainframe) to provide meaningful information and business decisions from ever increasing amounts of RMF (CMF) data.  In November 2016 ago I delivered the DASD I/O Performance Management Is Easy? session at the UK GSE Annual 2016 meeting accordingly…

All Flash & Substance – Is The System z Microsecond The New Millisecond?

Is 2016 the year of the All Flash disk array?  Seemingly from a System z perspective, 2016 has seen improvement in the All Flash disk array offerings from the major disk suppliers, namely EMC, HDS and IBM.  From a usability perspective, managing latency might be the overall challenge, where these ultra-fast SSD systems are delivering I/O performance response times measured in the ~250-500 Microsecond (μs) range, potentially consigning the traditional Millisecond (ms) measurement to history!

Whatever the speeds and feeds might be, as of 2016, the benchmark for a System z All Flash Disk Array is seemingly measured @ <500 Microsecond μs response time, supporting ~n PB of capacity and delivering ~nnn GB/S throughput for mixed read/write workloads.  Of course, strong encryption, typically full disk Data @ Rest Encryption (D@RE) based and full seamless data replication interoperability are also mandatory.

Historically we evolved from Data Processing to Information Technology, not only automating the capture and processing of data, but gradually evolving our processes, using this data for business advantage.  In recent years, the information explosion has dictated that each and every business must be a cognitive business, using intelligent analytics to gain insight and faster decision-making from the business data collected.

Currently the Internet of Things (IoT) supplements the medium-term Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative, being the processes and associated solutions required by cognitive businesses to make timely and informed decisions, capturing deeper customer insight, ultimately delivering competitive advantage.  Therefore the 21st century business generates a significant requirement for storage capacity and performance to fully realize the benefits of this truly business aligned cognitive approach.

The largest global organizations from all verticals leverage from the power and true 24*7*365 availability and reliability of the System z Mainframe to power enormous relational databases, processing millions of customer transactions on an hourly basis.  These always-on, mission critical business environments demand the performance, reliability, TCO and System z platform integration delivered by the associated DASD (3390) subsystem.

Each and every System z user will have their IHV of choice for delivering disk storage, in alphabetical order, EMC (E.g. VMAX AFA/All Flash Array), HDS (HAF/Hitachi Accelerated Flash) or IBM (E.g. DS8888).  The choice of disk storage was forever thus, reviewing the market place and choosing the best option for your business.  What might require reflection is how the DASD I/O subsystem is managed and the associated interaction with said IHV supplier.  Systems Management solutions such as Easy Disk Analyze Mainframe (EADM) and IntelliMagic Vision (Disk Magic) will certainly simplify the analysis and presentation of DASD subsystem performance data.

However the emphasis of the actual System z DASD subsystem for an All Flash array might move from being an internal consideration, to a direct and timely communication with the IHV supplier.  Put very simply, in an environment where Mission Critical systems rely upon ultra fast processing of massive amounts of data, any flash memory issues, whether capacity or defect related will need IHV interaction ASAP, arguably “Before The Event”.  As with the System z Server itself, where we’re used to On/Off Capacity on Demand (OOCoD) processes, maybe we need to consider a similar approach with our All Flash System z DASD arrays.  For the avoidance of doubt, as opposed to waiting for an issue to impact our business, maybe we need to work smarter with our IHV, to safeguard that sufficient flash memory is available, to proactively resolve capacity or defect issues…

Aligning this with our traditional 3390 DASD I/O subsystem analysis, which might have been on a daily basis, from the rich RMF/CMF data resource, we must fully automate this process to minimize or eliminate the Mean Time To Resolution (MTTR)!  The ultimate benefit will be the delivery of meaningful messages that incorporate our 3rd party IHV supplier, who potentially with Remote Support Facility (RSF) type processes, deploys the “Golden Screwdriver” to seamlessly safeguard the performance profiles of our Mission Critical business applications, leveraging from the latest All Flash disk array.

In conclusion, as always, technology can deliver business benefits, with substance, and this includes All Flash disk arrays.  As always, what might need to evolve are the associated Systems Management processes.  Therefore asking yet another potential rhetorical question, what is more important, the System z Server or timely data access?  The diplomatic answer is that they’re equally important and if so, let’s safeguard the availability of All Flash memory for our DASD subsystems, with the requisite levels of meaningful proactive reporting and IHV supplier interaction.

System z: Optimizing DASD I/O Subsystem Performance

Historically there was a very simple synergy between the IBM S/370 Mainframe and its supporting disk I/O (DASD) subsystem, allowing for Mainframe host to physical and logical disk device (I.E. 3390) connectivity. The analysis and tuning of this I/O subsystem has always been and continues to be supported by the SMF Type 7n records via IBM RMF and the BMC CMF alternative. However, over the years, major advances in DASD subsystems and the System z Mainframe server have delivered many layers of technology resources (E.g. Cache, Memory, FICON Channels, RAID Storage, Proprietary Microcode, et al) and this has introduced complexities into highlighting DASD I/O subsystem performance problems.

The focus of technology based metrics (E.g. I/O Rate Response Time, I/O MB/S Bandwidth, et al) have also been complemented with more meaningful business focussed Service Level Agreements (SLA). Therefore today’s System z I/O Performance Analyst must gather and act upon proactive meaningful information from the ever-increasing amounts of performance data available. Put another way, too much data can deliver not enough information! As previously stated, it was forever thus, RMF and CMF have always collected the requisite performance data available and arguably no other data source is required (E.g. OMEGAMON/TMON/SYSVIEW Performance Monitor, SAS/MXG/MICS/WPS Performance Database). RMF/CMF is the ideal data source for thorough and timely System z I/O performance management, where intelligent analytics and expert knowledge are required to present this “Golden Record”.

However, today’s System z Support Teams need simple and timely presentation of the data, highlighting potential challenges, graphically presented for their Management, allowing for simple tracking of SLA agreements and technology changes (I.E. Software/Hardware Upgrades).

Additionally, Workload Manager (WLM) can control non-paging queued DASD I/O requests, based upon device busy conditional processing. Therefore the z/OS system can manage I/O priorities in a Sysplex, based on WLM service class goals. WLM dynamically adjusts the I/O priority based on service class goal performance and whether a DASD device can influence the overall performance objectives. For obvious reasons, this WLM function does not micro-manage I/O priorities, only changing a service class period’s I/O priority infrequently. WLM is deployed by many System z users to assist in the automated management of system resources (E.g. CPU, Memory, I/O, et al), based upon Service Level goals.

From a DASD subsystem technology viewpoint, there is no longer an obvious one-one direct connection between the Mainframe host and DASD device. An increasing number of technological advances, both microcode and hardware (E.g. Memory, Fibre Channel, Function Assist Processing, et al) have diminished the requirement for data access directly from the physical device. Put another way, in today’s world of System z servers with multiple cache level CPU chips (I.E. Relative Nest Intensity), massive and multiple processor memory resources (I.E. z13 @ 10 TB Memory), high bandwidth Fibre Channel (I.E. FICON, zHPF) subsystem and a hierarchy of DASD memory (I.E. SSD/Flash, Cache), it’s not uncommon to consider an I/O that requires physical device access as a problem! Finally and most importantly, from a DASD subsystem viewpoint, each of the recognized System z DASD providers, EMC (Symmetrix VMAX), HDS (VSP G1000) and IBM (DS8870) have highly proprietary DASD subsystems that provide z/OS plug compatibility, but deliver overall I/O performance using their own unique architecture and internal algorithms.

Of course, an over configured hardware environment will deliver a poor TCO, while an under configured environment will manifest in SLA issues and bad user experiences, where the middle-ground always delivers the optimal environment. Resource optimization always demands proactive day-to-day management, from an internal and indeed external communication viewpoint. With the highly proprietary design features of the IHV DASD subsystems, whether EMC, HDS or IBM, having the right information and identifying the precise problem, simplifies the communication process with the IHV. Such communication might highlight a resource under provision (E.g. Memory Capacity), a subsystem setting tweak requirement, either host or subsystem based, or indeed a hardware failure. In today’s world, these issues need to be fixed in minutes or hours, not days or weeks.

Therefore, where does today’s System z I/O Performance Analyst start to collect the required information to safeguard that their DASD subsystem is optimized, both from a capacity and performance viewpoint?

A simplistic viewpoint of an I/O health-check should consider the following:

  • Service Level Agreements (SLA): Are overall objectives being delivered or missed?
  • User Experience: Are users (customers) complaining of poor service or response times?
  • I/O Metric Performance: Are there obvious signs of abnormal performance statistics?

Several decades ago, an overall I/O health check might have been a periodic (E.g. Weekly or longer) activity, whereas today it’s undoubtedly a Business As Usual (BAU) and 24*7 activity. Therefore a fully automated solution is required, built upon the tried and tested System z performance fundamentals, namely RMF or CMF. The ideal solution will perform analytics based data reduction, presenting the right information, at the right time, allowing for intelligent business based communication, both internally, to customers and end users from an SLA viewpoint, and externally, with IHV DASD suppliers, safeguarding optimal performance and TCO.

EADM (Easy Analyze DASD Mainframe) is a solution from Technical Storage that performs automated performance analysis of the z/OS I/O subsystem, delivering predictive analytics for better storage capacity planning and performance measurement. The Technical Storage EADM architects have in excess of 40 years IBM Mainframe experience, specializing in the I/O subsystem, and so it’s no surprise that EADM delivers expert and timely knowledge via an easy-to-use solution.

EADM is an easy-to-install and easy-to-use plug-and-play solution that has no proprietary considerations, requiring no additional System z resource (E.g. CPU, Memory, DASD, et al) requirements. Installed on Microsoft server platforms, EADM is easily virtualized via VMware, Hyper-V, et al, requiring no target database for performance data storage. EADM performs a daily health check of the entire System z disk subsystem. EADM works around the clock, delivering customized and automatic user friendly GUI type reports. For today’s System z technician, the open and IP architecture base of EADM allows for secure remote access via Mobile, Tablet or Laptop devices, as and when required.

Operations and performance teams are alerted as soon as performance variances occur, typically in minutes, assisting in the identification of underlying root problems, causing changes in system behaviour. Incorporating intelligent and meaningful I/O performance indicators, with drill-down and zoom-in ability, storage technicians can determine if the problem is temporary, permanent, local or global. By simplifying the data reduction process (E.g. RMF/CMF data from numerous LPAR/Sysplex environments), EADM safeguards that the internal technical team can efficiently manage their ever increasingly complex and large DASD environment, for intelligent and timely communications with internal business teams and external suppliers alike.

EADM simplifies the System z I/O subsystem capacity and performance management process, delivering expert reports and timely historical analysis, for example:

  • Automatic daily (24 Hour) analysis of Sysplex wide workload (On-Line TP & Batch) I/O response times
  • Systematic intelligent alerts of early performance variances with exact occurrence time indicators
  • Identification of I/O performance hot-spots with DASD volume and data set level granularity
  • Performance trending at DFSMS Storage Group, Subsystem LCU and DASD volume level
  • DR (E.g. PPRC) simulations to prevent data loss and forecast Data Centre failover scenarios
  • I/O subsystem WLM indicators to determine exactly what impacts performance objectives
  • Full FICON channels and zHPF analysis, incorporating typical I/O throughput indicators
  • HyperPAV and associated LCU indicators to easily balance volumes, optimizing PAV alias allocation
  • Performance monitoring and balancing via intelligent LCU, SSID and I/O analytics
  • DASD capacity usage via DCOLLECT data, comparing assigned vs. allocated vs. actual disk utilization
  • EADM supports entry-level several LPAR and complex multiple CPC/LPAR System z configurations

A well provisioned and performing System z I/O subsystem is of vital importance for safeguarding today’s ever increasing storage requirements of mission critical business applications. A poorly performing I/O subsystem will generate unnecessary and extra CPU overhead, with potential and tangible TCO impact, in conjunction with potential business impact. Although the advances of the System z server and underlying DASD I/O subsystem can compensate for many application code or data placement issues, the fundamental concepts of analysing and tuning the I/O subsystem remain.

Therefore the savvy and proactive System z customer will safeguard that they find a solution to deliver optimal DASD I/O performance. Without doubt, such an analysis could be performed by a highly-skilled individual, but today’s 21st Century world demands a hybrid of technical and commercial skills. Therefore a solution that incorporates the diagnostic knowledge of the most highly trained technician, performs intelligent analytics on a plethora of Sysplex wide performance data sources and presents the information required, is one that will deliver benefit each and every day. EADM is an example of such a solution, delivering demonstrable System z TCO optimization benefits, while safeguarding a short-term ROI, with simple deployment and resource utilization attributes.

Mainframe Server Planning: Vendor Interaction

In the last few weeks I have encountered a couple of scenarios regarding Mainframe Server upgrades that have surprised me somewhat.  The first was at the annual UK GSE conference during November 2013, where one of the largest UK Mainframe customers stated “we had problems regarding the capacity sizing of the IBM Mainframe server installed and our vendor was not very helpful in resolving this challenge with us”.  The second was a European customer with 2 aging servers deployed, z9 BC, and they had asked their IBM Mainframe server vendor to provide an upgrade quotation.  The server vendor duly replied, providing a like-for-like upgrade quotation, 2 new zBC12 servers, which at first glance seemed to be a valid configuration.

The one thing in common for these 2 vastly different Mainframe customers, the first very large, the second quite small, is that inadvertently they didn’t necessarily engage their respective vendors with the best set of questions or indeed terms of reference; while the vendors might say “ask me no questions and I’ll tell you no lies”…

For the 2nd scenario, I was asked to quickly review the configuration provided.  My first observation was to consolidate both workloads on 1 server.  The customer confirmed, there was no business reason to have 2 servers, it was historic, and there wasn’t even a SYSPLEX between the 2 z9 BC servers.  The historic reason for the 2 z9 BC servers was the number of General Purpose (GP) engines supported.  My second observation was that software licensing could be simplified and optimized with aggregated MSU and use of the AEWLC pricing model.  So within ~1 hour, the customer had a significant potential to dramatically reduce costs.

We then suggested an analysis of their configuration with 2 software products, PerfTechPro for z/OS and zDynaCap.  They already had the SMF data, so using the simulation abilities of these products, the customer quickly confirmed they could consolidate their workloads onto 1 zBC12, deploy zIIP processors to offload ~15% CPU usage from GP, and control MSU allocation with zDynaCap, saving another ~10% of CPU.  For this customer, a small investment in software products reduced their server upgrade costs by ~€400,000 in year 1, with similar software savings, each and every year forever more.  Although they didn’t have the skills in-house from a Mainframe Capacity Planning and software licensing viewpoint, this customer did eventually ask the right questions, and the rest as they say is history!

No man or indeed Mainframe customer is an island, so don’t be afraid to ask questions of your vendors or business partners!

From a cost viewpoint, both long-term (TCO) and day 1 (TCA), the requirement to deploy the optimum Mainframe server configuration from a capacity viewpoint cannot be under estimated, both in terms of hardware costs, but more importantly, associated software costs.  It therefore follows that Mainframe Capacity Planning and Mainframe Software Licensing knowledge is imperative, but I’m not so sure there are that many Mainframe customers that have clearly defined job roles for such disciplines.

To generalize, always a dangerous thing, typically the larger Mainframe customer does have skilled and seasoned personnel for the Capacity Planning discipline, while the smaller Mainframe user might rely on a generic Systems Programmer or maybe even rely on their vendor to size their Mainframe servers.  From a Mainframe software licensing viewpoint, there seems to be no general rule-of-thumb, as sometimes the smaller customer has significant knowledge and experience, whereas the larger Mainframe customer might not.  Bottom Line: If the Mainframe customer doesn’t allocate the optimum capacity and associated software licensing metrics for their installation, problems will arise, probably for several years or more!

Are there any simple solutions or processes that can assist Mainframe customers?

The first and most simple observation is to engage your vendor and safeguard that they generate the final Mainframe server configuration that is used for Purchase Order activities.  For sure, the customer will have their capacity plan and perhaps a “draft” server configuration, but even in these instances, the vendor should QA this data, refining the bill of materials (E.g. Hardware) accordingly.  Therefore an iterative process occurs between customer and vendor, but the vendor is the one that confirms the agreed configuration is fit for purpose.  In the unlikely event there are challenges in the future, the customer can work with their vendor to find a solution, as opposed to the example stated above where the vendor left their customer somewhat isolated.

The second observation is leverage from the tools and processes that are available, both generally available and internal for vendor pre sales personnel.  Seemingly everybody likes something for nothing and so the ability to deploy “free” tools will appeal to most.

For Mainframe Capacity Planning, in addition to the standard in-house processes, whether bespoke (E.g. SAS, MXG, MICS based) or a packaged product, there are other additional tools available, primarily from IBM:

zPCR (Processor Capacity Reference) is a generally available Windows PC based tool, designed to provide capacity planning insight for IBM System z processors running various z/OS, z/VM, z/VSE, Linux, zAware, and CFCC workload environments on partitioned hardware.  Capacity results are based on IBM’s most recently published LSPR data for z/OS.  Capacity is presented relative to a user-selected Reference-CPU, which may be assigned any capacity scaling-factor and metric.

zCP3000 (Performance Analysis and Capacity Planning) is an IBM internal tool, Windows PC based, designed to for performance analysis and capacity planning simulations for IBM System z processors, running various SCP and workload environments.  It can also be used to graphically analyse logically partitioned processors and DASD configurations.  Input normally comes from the customer’s system logs via a separate tool (I.E. z/OS SMF via CP2KEXTR, VM Monitor via CP3KVMXT, VSE CPUMON via VSE2EDF).

zPSG (Processor Selection Guide) is an IBM internal tool, Windows PC based, designed to provide sizing approximations for IBM System z processors intended to host a new application, implemented using popular, commercially available software products (E.g. WebSphere, DB2, ODM, Linux Apache Server).

zSoftCap (Software Migration Capacity Planning Aid) is a generally available Windows PC based tool, designed to assess the effect on IBM System z processor capacity, when planning to upgrade to a more current operating system version and/or major subsystems versions (E.g. Batch, CICS, DB2, IMS, Web and System).  zSoftCap assumes that the hardware configuration remains constant while the software version or release changes.  The capacity implication of an upgrade for the software components can be assessed independently or in any combination.

zBNA (System z Batch Network Analysis) is a generally available Windows PC based tool, designed to understand the batch window, for example:

  • Perform “what if” analysis and estimate the CPU upgrade effect on batch window
  • Identify job time sequences based on a graphical view
  • Filter jobs by attributes like CPU time / intensity, job class, service class, et al
  • Review the resource consumption of all the batch jobs
  • Drill down to the individual steps to see the resource usage
  • Identify candidate jobs for running on different processors
  • Identify jobs with speed of engine concerns (top tasks %)

BWATOOL (Batch Workload Analysis Tool) is an IBM internal tool, Windows PC based, designed to analyse SMF type 30 and 70 data, producing a report showing how long batch jobs run on the currently installed processor.  Both CPU time and elapsed time are reported. Similar results can then be projected for any IBM System z processor model. Basic questions that can be answered by BWATOOL include:

  • What jobs are good candidates for running on any given processor?
  • How much would jobs benefit from running on a faster processor?
  • For jobs within a critical path (batch window), what overall change in elapsed time might occur with a new processor?

zMCAT (Migration Capacity Analysis Tool) is an IBM internal tool, Windows PC based, designed to compare the performance of production workloads before and after migration of the system image to a new processor, even if the number of engines on the processor has changed.  Workloads for which performance is to be analysed must be carefully chosen because the power comparison may vary considerably due to differing use of system services, I/O rate, instruction mix, storage reference patterns, et al.  This is why customer experiences are unique from an internal throughput ratio (ITRR) based on LSPR benchmark data.

zTPM (Tivoli Performance Modeler) is an IBM internal tool, Windows PC based designed to let you build a model of a z/OS based IBM System z processor, and then run various “what if scenarios”.  zTPM uses simulation techniques to let you model the impact of changes on individual workload performance.  zTPM uses RMF or CMF reports as input.  Based on these reports, zTPM can create summary charts showing LPAR as well as workload utilization.  An automated Build function lets you build a model that represents the system for any reporting interval.  Once the model is built, you can make changes to see the impact on workload performance.  zTPM is also available as an IBM software product offering.

Therefore there are numerous tools available from IBM to assist their customers determine optimum Mainframe server capacity requirements.  Some of these tools are generally available without engaging the IBM account team, but others are internal to IBM, and for that reason alone, Mainframe customers must engage their IBM Mainframe account team to participate in their capacity planning activities.  Additionally, as the only supplier of Mainframe Servers, IBM have a wealth of knowledge and indeed a responsibility and generally a willingness to assist their customers deploy the right Mainframe server configuration from day 1.

As a customer, don’t be afraid to engage external 3rd parties to perform a sanity check of your thinking and activities, clearly IBM as they will be fulfilling your IBM Mainframe server order.  However, consider engaging other capacity/performance and software licensing specialists as their experience incorporates many customers, as opposed to an insular view.  Moreover, such 3rd parties probably utilize their own software tools or products to assist in this most important of disciplines.

In conclusion, as always, the worst question is the one not asked, and for this most fundamental of processes, not collaborating with your vendor and the wider community, might leave you as an individual exposed and isolated, and your company exposed to the consequences of an undersized or oversized Mainframe sever configuration…

21st Century Mainframe Capacity Planning Requirements

With nearly 5 decades of longevity the IBM Mainframe has changed beyond recognition in terms of CPU capacity and performance capability.  The Capacity Planning discipline for the IBM Mainframe server became more advanced and proactive in the early 1990’s, perhaps coinciding with the introduction of Parallel Sysplex structures associated with the MVS/ESA operating system.  Therefore the requirement to measure and model the impact of workload movement between LPAR and CPC structures became important, if not mandatory.

The fundamental building-block for Mainframe CPU usage analysis is SMF Type 7n records (I.E. RMF or CMF), where this data was typically processed by MXG, MICS and maybe CIMS (acquired by IBM), generally using SAS for reporting purposes.  Other tools, including but not limited to, BEST/1 (acquired by BMC) and PERFMAN (acquired by ASG) also offered capacity planning and performance management solutions.  Therefore, for 20+ years the fundamental Mainframe CPU usage data and associated tools have remained largely the same.  However, maybe the IBM Mainframe server has changed, both in terms of underlying CPU chip technology and customer workload deployment…

I often hear capacity planners state something along the lines of “I can report on the past with 100% accuracy, but predicting the future might prove to be a little more difficult”!  Once again, going back to the early 1990’s, the IBM Mainframe had a typical if not generic workload profile deployment, namely On-Line Transaction Processing (E.g. CICS, IMS DC) and related Database Management Subsystems (E.g. DB2, IMS DB) with Batch Processing.  This somewhat limited workload profile simplified the Capacity Planning process, applying estimates of growth based on current usage.  However, when the Mainframe became more pervasive, taking on new workloads, how was the capacity planner supposed to estimate CPU requirements for their new business application workload?

IBM introduced the Large Systems Performance Reference (LSPR) methodology, designed to provide relative processor capacity data for IBM System/370, System/390 and z/Architecture processors.  All LSPR data is based on a set of measured benchmarks and analysis, covering a variety of System Control Program (SCP) and workload environments.  LSPR data is intended to be used to estimate the capacity expectation for a production workload when considering a move to a new processor.  Although LSPR data is provided on an “as is” basis, with no warranty, it at least provides the Mainframe Capacity Planner with some insight into their CPU sizing challenge.  For many years, LSPR provided the only other data source, as well as RMF (CMF) for Mainframe CPU sizing.  However, is there a more accurate data source, perhaps based on real-life customer data?

With the introduction of the IBM System z10 server (February 2008), a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  Therefore this data source can be deployed to further refine the accuracy of Mainframe CPU capacity planning projections.  Let’s not forget:

The primary on-going requirement for Mainframe Capacity Planning is to minimize any over or under capacity provision from forecast predictions, used for Mainframe server acquisition purposes”

Mainframe chip technology has also changed in complexity, especially with the latest iterations of CPU chips associated with the z10 server (E.g. POWER 6) onwards, incorporating many layers of cache memory.  Workload capacity performance will be quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload’s instructions and data for execution.  Best performance occurs when the instructions and data are found in the cache(s) nearest the processor so that little time is spent waiting prior to execution; as instructions and data must be retrieved from farther out in the hierarchy, the processor spends more time waiting for their arrival.

As workloads are moved between processors with different memory hierarchy designs, performance will vary as the average time to retrieve instructions and data from within the memory hierarchy will vary.  Additionally, once on a processor this component will continue to vary significantly as the location of a workload’s instructions and data within the memory hierarchy is affected by many factors including; locality of reference, IO rate, competition from other resources (E.g. Applications, LPARs, et al), and so on…

The most performance sensitive area of the memory hierarchy is the activity to the memory nest, namely, the distribution of activity to the shared caches and memory.  IBM introduced new terminology, namely Relative Nest Intensity (RNI), indicating the level of activity to this part of the memory hierarchy.  Using data from CPU MF, the RNI of the workload running in an LPAR may be calculated.  The higher the RNI, the deeper into the memory hierarchy the processor must go to retrieve the instructions and data for that workload.

Therefore the Mainframe Capacity Planner does have various data sources available to forecast how an existing or new workload might perform on an upgraded processor (CPC), further refining their CPU capacity requirement forecast.  As always, the final stage in a Mainframe Capacity Planning process is to input the forecast data into the IBM Processor Capacity Reference (zPCR) tool, to determine the exact model and associated resource configuration options for their unique business workload mix.

To summarize, does your Mainframe Capacity Planning process incorporate all of these CPU sizing data sources, in an easy-to-use and cost efficient manner?

Founded by former IBM staffers and capacity planning and performance management industry veterans William Shelden, PhD, and William Hart, PerfTechPro is designed to deliver sophisticated, affordable, easy-to-use solutions for IT management professionals looking for fast, insightful help without high-cost, complex and time-consuming purchasing and licensing requirements.

PerfTechPro for z/OS is a Capacity Planning and Performance Measurement tool specifically designed for the cost conscious and savvy 21st Century data centre.  PerfTechPro for z/OS is the next evolution in Mainframe Capacity Planning tools, having been architected from ground zero using the latest techniques.  PerfTechPro for z/OS provides sophisticated capacity and performance management capabilities, affordable by any sized data centre:

  • Clean, intuitive, easy-to-use interface and graphical representations, for example:
    • Consolidated instance lists guide users to make informed selections
    • Descriptive dialog boxes detail your configuration
    • Anticipates, pre-loads data to speed retrieval, reporting and analysis
    • Automated data management
  • Forecasting and modelling
  • Non-proprietary database, enabling data use outside of PerfTechPro
  • Capable of automated collection, analysis and reporting of SMF 113 records produced by the IBM CPU Measurement Facility (CPU MF)
  • Supports measurement, management of zAAP & zIIP Specialty Engines
  • Automated analysis and management of all key capacity and performance metrics, for example:
    • GPP Utilization of All LPARs
    • MIPS Usage by CPU
    • DASD Response Times
    • Address Spaces Dispatched and Waiting 

PerfTechPro for z/OS also simplifies the data management process associated with Mainframe Capacity Planning.  Using a streamlined process on the z/OS host, PerfTechPro extracts and formats the data required from various SMF sources (E.g. SMF Type 7n, Type 113); delivering an optimized Performance Data Base (PDB) for use by the Windows based GUI.  This optimized file safeguards fast processing during the reporting and forecasting activities, while simplifying any data aggregation processes (E.g. Weekly, Monthly, et al).  Moreover, PerfTechPro allows this data to be stored in non-proprietary (E.g. Microsoft Access, SQL Server, MySQL, Oracle) and multiple database structures, as and if required.

PerfTechPro for z/OS is a simple-to-use and cost-efficient solution, allowing customers to quickly save time and money from their Capacity Planning and Performance Measurement solution.  Ultimately the bottom line objective for PerfTechPro for z/OS is to provide a best-of-breed solution for a very competitive cost. PerfTechPro for z/OS delivers business value by:

  • Ensuring enterprise zSeries Mainframe server resources are being used efficiently
  • Maximizing opportunities for cost-savings
  • Anticipating & responding to increased demand on resources
  • Reducing costs by exploiting periods of lower resource demand
  • Discerning underlying causes of performance and capacity issues
  • Eliminating time-consuming manual tracking, recording and analysis
  • Implementing disciplined management of valuable business resources

In conclusion, the Mainframe Capacity Planning process continues to evolve, forever striving to reduce any discrepancies in CPU requirements forecasting, which of course, have a high associated cost consideration.  Integrating CPU MF (SMF Type 113) must be a mandatory requirement, safeguarding that CPU Sizing, Forecasting, Modelling and Correlation Analysis activities are optimized.  Additionally, the actual process of Mainframe Capacity Planning is an activity that requires great skill and considerable associated responsibility.  A modern day solution such as PerfTechPro for z/OS is worthy of consideration, having been designed by a team with a heritage in delivering Mainframe Capacity Planning solutions, architecting function compatible with modern day functionality, while considering the latest technology zSeries CPU chip design considerations.