zHyperLink: Just Another System z DASD I/O Function Enhancement?

Over the last several decades the IBM Mainframe platform has delivered several new technologies that have dramatically improved disk (DASD) I/O performance.  Specifically, the deployment of ESCON introduced fibre optic channels, followed by EMIF for channel sharing and reduced I/O protocol overhead, superseded by FICON and most recently zHPF.  All of these technologies have allowed ever larger amounts of data to be processed by the System z server and have supported the adoption of Geographically Dispersed Parallel Sysplex (GDPS) implementations for business continuity reasons.  Ultimately, mission critical data and decisions are facilitated by applications, and sub-second response times for these transactions are expected.  Some might say that we’re always running to stand still from a performance perspective when implementing the latest System z technologies?

In reality, today’s 21st Century mission-critical application is not just capturing and storing customer data, it’s doing so much more, attempting to make informed business decisions for a richer customer experience!  Historically a customer transaction would be on a one-to-one basis (E.g. a balance query), whereas today, said transaction might generate more data for the customer, potentially offering them a new or enhanced product.  In theory, this informed and intelligent transaction processing delivers a richer experience for the customer and potentially new revenue opportunities for the business.

For several years IBM have integrated the Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative into their product offerings, recognising that a business transaction can originate from the cloud or a mobile device, potentially via a Social Media platform, require rich processing via real-time analytics, while requiring the highest levels of security.  Of course, one must draw one’s own conclusions, but maintaining sub-second or ultra-fast transaction response times, with this level of CAMSS complexity requires significant performance enhancements.  To deliver such ultra-fast response times requires the DASD I/O subsystem to maintain the highest levels of performance, aligned with the latest System z server platform…

In January 2017 IBM issued a Statement of Direction (SoD) and associated FAQ for their zHyperLink technology.  zHyperLink is a new short distance mainframe link technology designed for up to 10 times lower latency than zHPF.  zHyperLink is intended to accelerate DB2 for z/OS transaction processing and improve active log throughput.  IBM intends to deliver field upgradable support for zHyperLink on the existing IBM DS8880 storage subsystem.  zHyperLink is the result of collaboration between DB2 for z/OS, the z/OS Operating System, IBM System z servers and the DS8880 storage subsystem, delivering extreme low latency I/O access for DB2 for z/OS applications.  zHyperLink technology is intended to complement FICON technology, accelerating those I/O requests that are typically used for transaction processing.  These links are point-to-point connections between the System z CEC and the storage system, limited to a distance of 150 meters, and they do not impact the z Architecture 8 channel path limit.

From a DB2 I/O service time viewpoint, at short distances, a native FICON or zHPF originated I/O typically requires 300 Microseconds (μs) for a simple I/O operation.  The coupling facility for z Systems can typically read or write 4K of data in under 8 Microseconds.  zHyperLink technology will provide a new short distance link from the mainframe to storage, reading and writing data up to 10 times faster than FICON or zHPF, reducing DB2 I/O service times to an anticipated 20-30 Microseconds.
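
As a back-of-envelope illustration of what these service times might mean at the transaction level, the minimal Python sketch below compares the I/O wait of a hypothetical DB2 transaction issuing a number of synchronous reads.  The 20 reads per transaction profile is purely illustrative; the service times are simply the figures quoted above.

# Back-of-envelope I/O wait comparison, using the service times quoted above.
# The 20 synchronous reads per transaction figure is purely illustrative.
ZHPF_SERVICE_TIME_US = 300        # typical simple I/O via FICON/zHPF (microseconds)
ZHYPERLINK_SERVICE_TIME_US = 25   # anticipated zHyperLink service time (20-30 us range)

sync_reads_per_txn = 20           # hypothetical transaction I/O profile

zhpf_wait_ms = sync_reads_per_txn * ZHPF_SERVICE_TIME_US / 1000
zhyperlink_wait_ms = sync_reads_per_txn * ZHYPERLINK_SERVICE_TIME_US / 1000

print(f"zHPF I/O wait per transaction:       {zhpf_wait_ms:.1f} ms")          # 6.0 ms
print(f"zHyperLink I/O wait per transaction: {zhyperlink_wait_ms:.1f} ms")    # 0.5 ms
print(f"Improvement factor:                  ~{zhpf_wait_ms / zhyperlink_wait_ms:.0f}x")

Even for such a modest I/O profile, the transaction-level I/O wait drops from milliseconds to a fraction of a millisecond, which is the essence of the zHyperLink value proposition for transaction processing.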

In conclusion, with a promise of 10 times faster processing, as per its fibre optic channel technology predecessors, particularly EMIF and zHPF, zHyperLink is a revolutionary DASD I/O function and not just another DASD I/O subsystem function enhancement.  At this stage, the deployment of zHyperLink functionality is restricted to DB2 and the IBM DS8880 storage subsystem, while we eagerly await compatibility support from EMC and HDS accordingly.  Moreover, as per the evolution of zHPF, we hope for the inclusion of other I/O workloads to benefit from this paradigm changing I/O response time technology.

Finally, as always, the realm of possibility always exists for each and every System z DASD I/O subsystem to be monitored and tuned on a proactive and 24*7*365 basis.  Although all of this DASD I/O performance data has always been and still is captured by RMF (CMF) data, intelligent processing of this data requires an ever evolving Performance Management process and arguably an intelligent software solution (E.g. IntelliMagic Direction Disk Magic or Technical Storage Easy Analyze Disk Mainframe) to provide meaningful information and business decisions from ever increasing amounts of RMF (CMF) data.  In November 2016 I delivered the DASD I/O Performance Management Is Easy? session at the UK GSE Annual 2016 meeting accordingly…

System z: Optimizing DASD I/O Subsystem Performance

Historically there was a very simple synergy between the IBM S/370 Mainframe and its supporting disk I/O (DASD) subsystem, allowing for Mainframe host to physical and logical disk device (I.E. 3390) connectivity. The analysis and tuning of this I/O subsystem has always been and continues to be supported by the SMF Type 7n records via IBM RMF and the BMC CMF alternative. However, over the years, major advances in DASD subsystems and the System z Mainframe server have delivered many layers of technology resources (E.g. Cache, Memory, FICON Channels, RAID Storage, Proprietary Microcode, et al) and these layers have introduced complexity into identifying DASD I/O subsystem performance problems.

The focus on technology based metrics (E.g. I/O Rate, Response Time, I/O MB/S Bandwidth, et al) has also been complemented with more meaningful business focussed Service Level Agreements (SLA). Therefore today’s System z I/O Performance Analyst must gather and act upon proactive meaningful information from the ever-increasing amounts of performance data available. Put another way, too much data can deliver not enough information! As previously stated, it was forever thus; RMF and CMF have always collected the requisite performance data and arguably no other data source is required (E.g. OMEGAMON/TMON/SYSVIEW Performance Monitor, SAS/MXG/MICS/WPS Performance Database). RMF/CMF is the ideal data source for thorough and timely System z I/O performance management, where intelligent analytics and expert knowledge are required to present this “Golden Record”.

However, today’s System z Support Teams need simple and timely presentation of the data, highlighting potential challenges, graphically presented for their Management, allowing for simple tracking of SLAs and technology changes (I.E. Software/Hardware Upgrades).

Additionally, Workload Manager (WLM) can control non-paging queued DASD I/O requests, based upon device busy conditional processing. Therefore the z/OS system can manage I/O priorities in a Sysplex, based on WLM service class goals. WLM dynamically adjusts the I/O priority based on service class goal performance and whether a DASD device can influence the overall performance objectives. For obvious reasons, this WLM function does not micro-manage I/O priorities, only changing a service class period’s I/O priority infrequently. WLM is deployed by many System z users to assist in the automated management of system resources (E.g. CPU, Memory, I/O, et al), based upon Service Level goals.
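
Conceptually, this behaviour can be modelled as a goal-driven feedback loop.  The Python sketch below is purely illustrative; the class names, performance index thresholds and priority range are invented for this example and do not represent IBM’s actual WLM algorithm or any programming interface.

# A minimal conceptual sketch of goal-based I/O priority adjustment; illustrative
# only, not IBM's actual WLM algorithm or interface.
from dataclasses import dataclass

@dataclass
class ServiceClassPeriod:
    name: str
    performance_index: float   # >1.0 means the period is missing its goal
    io_priority: int           # higher value = more favoured for queued DASD I/O

def adjust_io_priorities(periods, min_prio=1, max_prio=15):
    """Infrequently nudge I/O priorities towards the service class goals."""
    for p in periods:
        if p.performance_index > 1.0 and p.io_priority < max_prio:
            p.io_priority += 1          # missing its goal: favour its queued I/O
        elif p.performance_index < 0.8 and p.io_priority > min_prio:
            p.io_priority -= 1          # comfortably beating its goal: donate priority

periods = [ServiceClassPeriod("ONLINE_1", 1.3, 8), ServiceClassPeriod("BATCH_2", 0.6, 8)]
adjust_io_priorities(periods)
print([(p.name, p.io_priority) for p in periods])   # [('ONLINE_1', 9), ('BATCH_2', 7)]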

From a DASD subsystem technology viewpoint, there is no longer an obvious one-to-one direct connection between the Mainframe host and DASD device. An increasing number of technological advances, both microcode and hardware (E.g. Memory, Fibre Channel, Function Assist Processing, et al) have diminished the requirement for data access directly from the physical device. Put another way, in today’s world of System z servers with multiple cache level CPU chips (I.E. Relative Nest Intensity), massive and multiple processor memory resources (I.E. z13 @ 10 TB Memory), high bandwidth Fibre Channel (I.E. FICON, zHPF) subsystems and a hierarchy of DASD memory (I.E. SSD/Flash, Cache), it’s not uncommon to consider an I/O that requires physical device access as a problem! Finally and most importantly, from a DASD subsystem viewpoint, each of the recognized System z DASD providers, EMC (Symmetrix VMAX), HDS (VSP G1000) and IBM (DS8870), has a highly proprietary DASD subsystem that provides z/OS plug compatibility, but delivers overall I/O performance using its own unique architecture and internal algorithms.

Of course, an over configured hardware environment will deliver a poor TCO, while an under configured environment will manifest in SLA issues and bad user experiences, where the middle-ground always delivers the optimal environment. Resource optimization always demands proactive day-to-day management, from an internal and indeed external communication viewpoint. With the highly proprietary design features of the IHV DASD subsystems, whether EMC, HDS or IBM, having the right information and identifying the precise problem, simplifies the communication process with the IHV. Such communication might highlight a resource under provision (E.g. Memory Capacity), a subsystem setting tweak requirement, either host or subsystem based, or indeed a hardware failure. In today’s world, these issues need to be fixed in minutes or hours, not days or weeks.

Therefore, where does today’s System z I/O Performance Analyst start to collect the required information to safeguard that their DASD subsystem is optimized, both from a capacity and performance viewpoint?

A simplistic viewpoint of an I/O health-check should consider the following:

  • Service Level Agreements (SLA): Are overall objectives being delivered or missed?
  • User Experience: Are users (customers) complaining of poor service or response times?
  • I/O Metric Performance: Are there obvious signs of abnormal performance statistics?

Several decades ago, an overall I/O health check might have been a periodic (E.g. Weekly or longer) activity, whereas today it’s undoubtedly a Business As Usual (BAU) and 24*7 activity. Therefore a fully automated solution is required, built upon the tried and tested System z performance fundamentals, namely RMF or CMF. The ideal solution will perform analytics based data reduction, presenting the right information, at the right time, allowing for intelligent business based communication, both internally, to customers and end users from an SLA viewpoint, and externally, with IHV DASD suppliers, safeguarding optimal performance and TCO.
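
By way of illustration, the data reduction concept described above is essentially exception-based reporting over interval data.  The Python sketch below flags DASD volumes whose latest interval response time deviates abnormally from their own history; the data layout, volume serials and thresholds are hypothetical and purely illustrative, rather than any particular product's logic.

# A minimal sketch of exception-based data reduction over DASD response time
# samples (e.g. derived from RMF/CMF interval data); layout and thresholds are
# hypothetical and purely illustrative.
from statistics import mean, pstdev

def find_response_time_exceptions(samples, sigma=3.0, floor_ms=1.0):
    """Flag volumes whose latest response time deviates abnormally from history."""
    exceptions = []
    for volser, history in samples.items():
        baseline, spread = mean(history[:-1]), pstdev(history[:-1])
        latest = history[-1]
        if latest > max(floor_ms, baseline + sigma * spread):
            exceptions.append((volser, baseline, latest))
    return exceptions

samples = {
    "PRD001": [0.4, 0.5, 0.4, 0.5, 0.4, 2.9],   # sudden spike: should be flagged
    "PRD002": [0.6, 0.7, 0.6, 0.6, 0.7, 0.7],   # steady: no alert
}
for volser, baseline, latest in find_response_time_exceptions(samples):
    print(f"{volser}: baseline {baseline:.2f} ms, latest interval {latest:.2f} ms")

In practice such logic would be driven by RMF/CMF interval data across every LPAR and Sysplex, which is precisely the data reduction challenge a fully automated solution must address.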

EADM (Easy Analyze DASD Mainframe) is a solution from Technical Storage that performs automated performance analysis of the z/OS I/O subsystem, delivering predictive analytics for better storage capacity planning and performance measurement. The Technical Storage EADM architects have in excess of 40 years of IBM Mainframe experience, specializing in the I/O subsystem, and so it’s no surprise that EADM delivers expert and timely knowledge via an easy-to-use solution.

EADM is an easy-to-install and easy-to-use plug-and-play solution that has no proprietary considerations, requiring no additional System z resources (E.g. CPU, Memory, DASD, et al). Installed on Microsoft server platforms, EADM is easily virtualized via VMware, Hyper-V, et al, requiring no target database for performance data storage. EADM performs a daily health check of the entire System z disk subsystem. EADM works around the clock, delivering customized and automatic user friendly GUI type reports. For today’s System z technician, the open and IP architecture base of EADM allows for secure remote access via Mobile, Tablet or Laptop devices, as and when required.

Operations and performance teams are alerted as soon as performance variances occur, typically in minutes, assisting in the identification of the underlying root problems causing changes in system behaviour. Incorporating intelligent and meaningful I/O performance indicators, with drill-down and zoom-in ability, storage technicians can determine if the problem is temporary, permanent, local or global. By simplifying the data reduction process (E.g. RMF/CMF data from numerous LPAR/Sysplex environments), EADM safeguards that the internal technical team can efficiently manage their increasingly complex and large DASD environment, for intelligent and timely communications with internal business teams and external suppliers alike.

EADM simplifies the System z I/O subsystem capacity and performance management process, delivering expert reports and timely historical analysis, for example:

  • Automatic daily (24 Hour) analysis of Sysplex wide workload (On-Line TP & Batch) I/O response times
  • Systematic intelligent alerts of early performance variances with exact occurrence time indicators
  • Identification of I/O performance hot-spots with DASD volume and data set level granularity
  • Performance trending at DFSMS Storage Group, Subsystem LCU and DASD volume level
  • DR (E.g. PPRC) simulations to prevent data loss and forecast Data Centre failover scenarios
  • I/O subsystem WLM indicators to determine exactly what impacts performance objectives
  • Full FICON channels and zHPF analysis, incorporating typical I/O throughput indicators
  • HyperPAV and associated LCU indicators to easily balance volumes, optimizing PAV alias allocation
  • Performance monitoring and balancing via intelligent LCU, SSID and I/O analytics
  • DASD capacity usage via DCOLLECT data, comparing assigned vs. allocated vs. actual disk utilization
  • Support for entry-level (several LPAR) and complex (multiple CPC/LPAR) System z configurations

A well provisioned and performing System z I/O subsystem is of vital importance for safeguarding today’s ever increasing storage requirements of mission critical business applications. A poorly performing I/O subsystem will generate unnecessary and extra CPU overhead, with potential and tangible TCO impact, in conjunction with potential business impact. Although the advances of the System z server and underlying DASD I/O subsystem can compensate for many application code or data placement issues, the fundamental concepts of analysing and tuning the I/O subsystem remain.

Therefore the savvy and proactive System z customer will safeguard that they find a solution to deliver optimal DASD I/O performance. Without doubt, such an analysis could be performed by a highly-skilled individual, but today’s 21st Century world demands a hybrid of technical and commercial skills. Therefore a solution that incorporates the diagnostic knowledge of the most highly trained technician, performs intelligent analytics on a plethora of Sysplex wide performance data sources and presents the information required, is one that will deliver benefit each and every day. EADM is an example of such a solution, delivering demonstrable System z TCO optimization benefits, while safeguarding a short-term ROI, with simple deployment and resource utilization attributes.

Cloudy With A Chance Of Mainframe?

With the advent of Computer Generated Imagery (CGI) there is seemingly no end to the number of books, especially “children’s” books that can be encapsulated and delivered in animated movie format.  I’m always surprised and arguably never surprised by the messaging in these stories; supposedly written for the younger person, but invariably delivering a message of good morals, ethics and human qualities, typically finding creative solutions to a myriad of problems.  Of course, we’re all human, and typically as human beings, we’re responsible for the majority of our problems, either knowingly, or not.

Cloudy with a Chance of Meatballs is a book based on a town named Chewandswallow characterized by its strange daily meteorological pattern, providing townsfolk with all of their required daily meals by raining food.  Although the residents of the town enjoy a lifestyle devoid of any grocery shopping or cookery, the weather unexpectedly and inexplicably takes a turn for the worse, devastating the local community with destructive and uncontrollable storms of either unpleasant or dangerously oversized foods, resulting in unstoppable catastrophes for the townspeople.  Their lives endangered by the threats of the storms, they relocate to a different community of average meteorological patterns, safe from the hazards that once were presented by raining meals.  However, they are forced to learn how to obtain food the normal way.

So what?  Continuing with the creativity thought, the ethos of this story might be somewhat analogous to the sometimes polarized opinion between Distributed Systems and Mainframe computing.  So depending on your philosophical bent or which side of the fence you sit on, there is only one choice, even if this seemingly perfect and de facto world is generating significant challenges…

Recently, z/OS 2.1 became Generally Available (GA) and most notably from my viewpoint was its continued and demonstrable ability to participate in cloud computing environments.  So is the IBM Mainframe ready for the cloud?  Wasn’t it always!

The fundamental ethos of the Mainframe environment is virtualization and was forever thus.  The Mainframe has always shared the basic IT architecture components, including CPU, Memory, Storage, Networking and other peripherals, originally in a physical single-image structure, but since the late 1990’s in a shared (SYSPLEX) complex of interconnected physical servers (CPCs).  So the Mainframe is and always has been ready for “Prime Time Cloud”!

z/OS V2.1 is a platform designed to dynamically respond and scale to workload change with enhancements to scalability and performance that cover operations, I/O, virtual storage constraint relief, memory management, and more.  These enhancements are suitable for organizations that would like to catalyse a journey to highly scalable virtualized solutions like cloud.

IBM delivers improved scalability and performance for outstanding throughput and service within existing Mainframe environments.  Smarter scalability can better prepare the user for growth and spikes in workloads while maintaining the qualities of service and balanced design that customers have come to expect of the IBM mainframe.

As customers consider all the components of downtime, the true costs can be surprising, which is why superior availability continues to remain a key factor in platform selection. With z/OS V2.1, IBM introduces new capabilities designed to improve upon the already legendary z/OS system availability.  The industry-leading resiliency and high availability of System z remain key reasons why organizations keep their most critical processing on System z.  With its attention to outage reduction, the availability of System z and z/OS is well recognized in the industry.  In z/OS V2.1, IBM continues enhancements that improve critical IT systems availability, helping achieve an even higher level of service for customers.

Some of the “cloud friendly” z/OS 2.1 benefits include:

  • Support for Shared Memory Communications-RDMA (SMC-R), for low latency, application transparent communications to help you move data quickly between z/OS images on the same CPC or between CPCs.
  • Flash Express support for certain coupling facility list structures, such as IBM WebSphere MQ for z/OS, V7 (5655-R36), in order to strengthen resiliency for enterprise messaging workload spikes.
  • For zEC12 or zBC12 systems, shared engine coupling facilities can be used in many production environments, for improved economics by offering a high level of performance without requiring the use of dedicated CF engines.
  • EXCP support for System z High-Performance FICON (zHPF) is designed to help improve I/O start rates and improve bandwidth for more workloads on existing hardware and fabric.
  • Usability and performance improvements for z/OS FICON Discovery and Auto Configuration (zDAC), including discovery of directly attached devices.
  • Serial Coupling Facility structure rebuild processing, designed to help improve performance and availability by rebuilding coupling facility structures more quickly and in priority order.
  • 100-way symmetric multiprocessing (SMP) support in a single LPAR on IBM zEC12 or zBC12 systems.  Support for an architectural limit of 4 TB of real memory per LPAR.
  • Support for 2 GB pages is provided on zEC12 and zBC12 systems.  This feature is designed to reduce memory management overhead and improve overall system performance by enabling middleware to use 2 GB pages.  These improvements are expected due to improved effective translation lookaside buffer (TLB) coverage and a reduction in the number of steps the system must perform to translate a 2 GB page virtual address (a simple TLB coverage calculation follows this list).
  • Capacity Provisioning is designed to provide support for manual and policy-based management of Defined Capacity and Group Capacity.  This function broadens the range of automatic, policy-based responses available to help manage capacity shortage conditions when WLM cannot meet your workload policy goals.
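
To illustrate the TLB coverage point made in the 2 GB page item above, the short Python calculation below compares how much virtual storage a fixed number of TLB entries can map at different page sizes.  The 512-entry TLB size is a hypothetical figure chosen for illustration, not a published zEC12/zBC12 specification.

# Illustrative TLB coverage arithmetic for the 2 GB large page benefit.
# The 512-entry TLB size is a hypothetical figure, not a zEC12/zBC12 specification.
TLB_ENTRIES = 512
for label, page_size_bytes in [("4 KB", 4 * 1024), ("1 MB", 1024 ** 2), ("2 GB", 2 * 1024 ** 3)]:
    coverage_gb = TLB_ENTRIES * page_size_bytes / 1024 ** 3
    print(f"{label:>5} pages: {TLB_ENTRIES} TLB entries cover ~{coverage_gb:,.3f} GB of virtual storage")

The larger the page, the more virtual storage each TLB entry maps, hence fewer translation misses and fewer translation steps for large middleware buffer pools.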

There are numerous new and enhanced functions delivered with z/OS 2.1, too numerous to mention, but categorised as Quality Of Service, Availability, Networking, Security, Data Usability, Integrity, Systems Management, Application Development, Simplification & Usability, International Standards Compliance, et al.

So let’s not forget, this foundation and support for an IT infrastructure and its supporting (software) ecosystem is in one scalable, secure and “zero” downtime environment!

So maybe for us, the open-minded and enlightened generation of parents (oops, I forgot, Grandparents for us Dinosaur Mainframe folk!) that can now “access” children’s stories, even if it’s in the form of a CGI animated movie, we can be dispassionate enough to consider all platforms, Distributed and Mainframe, for our evolving business and associated IT requirements.

So you decide, can it be Cloudy With A Chance Of Mainframe?  To overlook such an option, might be an oversight, just as overlooking the abundance of human stories, classified as children’s books or not…

Mainframe ISV Software: Is Continuous Product Improvement Always Evident?

Ken Venturi once said “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.

Wouldn’t it be great if every CTO and/or Product manager had this same philosophy for their Mainframe software solution?  One such example I have experienced over the years is (E)JES from Phoenix Software International (PSI).  Of course it’s really important to have Day 1 support for the latest release of Operating System, z/OS 2.1 being the latest example, but what about actually exploiting the latest functionality available with the latest zSeries Mainframe Enterprise Servers and z/OS Operating Systems?

To drive maximum bang from your buck, optimal performance and robust cost optimization can only be possible by recognizing and exploiting the latest Mainframe function ASAP, as and when appropriate.  Furthermore, listening to your customers, analysing their feedback, actively participating in User Organizations such as SHARE, and so on, will all help in continuous product development and innovation.

Here are some of the reasons why (E)JES has succeeded over a 30+ year period, recognizing and exploiting new z/OS function, as and when the updated z/OS is released for General Availability (GA).  Even today, with Version 5.3 supporting z/OS 2.1 as of day 1, (E)JES continues to offer value-added function for the seasoned, inexperienced and in fact, all IBM Mainframe technicians:

  • 64-bit performance optimizations (I.E. MEMLIMIT: above-the-bar) for both (E)JES client and server components, safeguarding minimal z/OS resource usage.
  • Nearly all (E)JES JES subsystem processing routines are eligible for zIIP redirection, delivering software cost savings for all (E)JES users.  Sub-Capacity System z processor users experience improved (E)JES performance because zIIP engines always run at full speed.  This behaviour differs from that of General Purpose CPs, “throttled” with Sub-Capacity deployments.
  • (E)JES code executes faster via its inbuilt High Performance Routine (HPR) facility, specifically developed to accelerate access to data in JES control blocks.  HPRs have a shorter instruction path length than previous coding techniques, avoiding delays in modern zSeries CPU instruction pipelines.
  • If High Performance FICON (zHPF) is available, (E)JES uses Transport Mode channel programs for JES Spool I/O.  When zHPF is not available, or when a CAS server performs I/O against the global data set, (E)JES uses the highest-performing Command Mode channel programs currently available.  These channel programs perform I/O significantly faster than “ordinary” channel programs.
  • The use of 24-bit (captured) UCBs puts a strain on the 24-bit virtual storage resource.  The use of ordinary (non-extended) TIOT entries puts a limit on the total number of allocations that can exist simultaneously in an address space.  (E)JES supports and uses 31-bit (uncaptured) UCBs and the extended TIOT (XTIOT) function (I.E. NON_VSAM_XTIOT=YES in DEVSUPxx PARMLIB).
  • (E)JES supports placement of JES spool data sets in the cylinder-managed area of an Extended Address Volume (EAV).  Of course, as of z/OS 1.12, EAV increases 3390 DASD capacity to ~1 TB.
  • (E)JES Pattern Utility Matching uses the SRST hardware instruction.  Empirical measurements show this technique is far faster on modern System z processors than alternatives such as the TRT instruction or “brute force” matching techniques using CLI/CLC.

One of the primary benefits of upgrading IBM z/OS software is the overall system performance benefit and associated cost reduction, but of course, IBM can only deliver the function and ability, while it’s incumbent upon the ISV community to upgrade their software products accordingly.  A key goal for any good ISV software product is to try to provide a value-add in the area of performance.  This has been one of the primary areas of focus for (E)JES since its introduction in 1978. 

Most spool display and management products tend to rely on the most resource-intensive interface available, namely the JES subsystem provided SSI 80.  (E)JES benchmarking tests against the most readily-available JES SSI 80 exploiters demonstrate significant CPU savings when deploying (E)JES.

Software products also need to deliver continuous improvements with regard to usability, presentation and in-built function, increasing user and system administrator productivity.  Without doubt, optimization encompasses not just hardware, but software, services, systems management disciplines and “best practices” that tie it all together.  Here are some of the usability enhancements that (E)JES has incorporated:

  • ISPF users running a 3270 emulator on a programmable workstation can now search IBM Eclipse-based InfoCenters via (E)JES.  Although (E)JES fully supports BookManager format documentation, BookManager READ/MVS is now obsolete; beginning with z/OS 2.1, BookManager softcopy books are no longer delivered by IBM.  IBM has stated that InfoCenters, and eventually KnowledgeCenters, are their strategic direction for online documentation.
  • (E)JES Web is a new, browser-based interface to (E)JES.  The associated RESTful API delivering this web enabled technology provides a framework for the creation of Eclipse plug-ins, mobile applications, and other web services clients.  This facility provides a “rapid learning” capability for (E)JES users, both new and old, that might be uncomfortable navigating traditional 3270 interfaces.
  • (E)JES provides a Java Application Programming Interface (API), complementing other in-built APIs for REXX and procedural languages.  By using an (E)JES API, a user can harness the versatility of their preferred programming language to interface and interact with (E)JES.  This support provides an interface to deliver nearly all of the capabilities available to an interactive (E)JES user.
  • (E)JES incorporates context sensitive help function, with point-and-shoot/pop-up dialogs, helping educate users on (E)JES, JES and z/OS while they work.  Users can get pop-up explanations of columns, input choices for unprotected fields, and a list of line commands.  Smart pop-ups explain the contents of certain columns, such as system abend codes.

The latest (E)JES Release Information Manual eloquently details the product enhancements over the last 5 releases or so, providing a good Product Roadmap reference point.

So, whether the ISV software product you deploy has been available for several years or several decades, do you safeguard maximum business benefit for optimal cost by considering:

  • Does the ISV deploy the latest zSeries server (I.E. zBC12, zEC12) for software interoperability and full hardware function exploitation; or an emulation (I.E. zPDT) technique?
  • Does the ISV deliver value-added z/OS related function on Day 1 or even within a year of the latest z/OS release?
  • Does the ISV deliver meaningful function to assist your users in deploying said function, while simplifying environment management for system administrators?
  • Does your ISV product optimize cost, with Sub-Capacity pricing in MSU increments, aggregated MSU costs for your entire zSeries Mainframe environment, as opposed to specific workloads (E.g. CPCs, LPARs, et al)?
  • Does your ISV product optimize cost by offloading the majority of its CPU function to zIIP specialty engines, which run at maximum speed, and where software “runs for free”?

Of course, only you can ask and potentially answer these questions during your day-to-day activities of maintaining currency and optimal performance for your Mainframe software portfolio.

Sometimes the hardest questions anybody can ask are the questions they ask themselves, which are never rhetorical questions!  Extracted verbatim from the latest (E)JES Release Information Manual:

Team (E)JES took advantage of the Phoenix Software International zHISR performance analysis product to discover performance “hot spots” in the (E)JES product.  Sometimes the simplest, least conspicuous piece of code turns out to be a major CPU contributor.  See below for some of the most embarrassing “surprise” hot spots we discovered using zHISR in a z/OS 2.1 LPAR:

  • Over 30% of the CPU used during a Spool Data Browse FIND operation, against a multi-million-line SYSOUT in JES2, turned out to be code that was clearing a record buffer to blanks using MVCL.  This clearing code was eliminated and some minor adjustments were made in other code to compensate for this change.
  • 27% of the CPU used to produce the Activity display in JES2 turned out to be in a routine that manages an internal resource called the “Job Positions Table.”  The algorithm was improved (to work more like its JES3 counterpart) and that routine is no longer a significant CPU contributor.
  • 9% of (E)JES session start-up was a 26-year-old “brute force” prime number generator used to compute the size of a hash table.  That code was totally reworked and now accounts for approximately 0.02% of session start-up CPU (a generic sketch of the technique follows this list).
  • A 6% performance penalty was observed when sorting a tabular display with a moderate number of rows. The hot spot turned out to be the code that cleared the work area for the sort service to zeros (another MVCL). This overhead was reduced to .04%.
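
As a generic illustration of the hash table sizing point above, the sketch below computes the smallest prime greater than or equal to a requested size using trial division bounded by the square root of the candidate, rather than a brute force test of every possible divisor.  This is not the actual (E)JES code, merely an example of the general technique in Python.

# A generic illustration of "smallest prime >= n" hash table sizing (not the
# actual (E)JES code): trial division only tests odd divisors up to the square
# root of the candidate, avoiding a brute force test of every divisor.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True

def next_prime(n: int) -> int:
    """Smallest prime >= n, a common choice for a hash table size."""
    candidate = max(2, n)
    while not is_prime(candidate):
        candidate += 1
    return candidate

print(next_prime(1000))   # 1009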

Mea culpa and humility, never a bad thing, but you have to be honest with yourself and ask yourself the right questions!  So going back full circle and quoting Ken Venturi once again, “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.  You must draw your own conclusions as to whether such an observation applies to the (E)JES team at Phoenix Software International (PSI)…

Why not ask them yourself?  Ed Jaffe, the (E)JES CTO will be available at the forthcoming UK GSE Annual Conference, 5-6 November 2013, speaking about (E)JES System Management Software: More With Less For Less, For The z/OS Mainframe and z/OS 2.1 User Experiences.

FICON (Fibre Connection Channel): 15 Years of Mainframe I/O Improvements

In 1998, IBM introduced FICON channels for enhanced I/O connectivity and performance for their 9672 G5 processors, delivering significant capability when compared to its predecessor, ESCON.  Let’s not forget that ESCON (Enterprise Systems Connection – S/390) was the first iteration of fibre optic channels for the IBM Mainframe, delivering significant capability when compared with the previous technology of heavy, large and costly copper based bus & tag parallel (S/370) channels.

ESCON channels were first introduced in the early 1990’s, but after less than a decade, the data and associated storage device explosion was exposing the technical capabilities of ESCON, for example:

  • Mainframe Server Channel Support: One IBM Mainframe processor could only support 256 ESCON channels, whereas FICON was offering a ~5-8:1 reduction in channel requirements.  Put another way, a customer could expect to consolidate the number of channels required from ~200 ESCON to ~30-40 FICON.
  • Device Support: One ESCON channel could support up to 1024 devices (sub-channel/device numbers), whereas a 9672 FICON channel increased this support 16 fold, up to 16,384 (16 K) devices.
  • Distance: The performance of ESCON dropped off significantly when the distance between the channel and associated Control Unit was greater than ~9 KM.  FICON increased this distance separation to ~100 KM, paving the way for the Geographically Dispersed Parallel Sysplex (GDPS) topologies we take for granted today.
  • Performance: ESCON performance was limited to 17 MB/S, whereas the first evolution of FICON channels delivered 100 MB/S full-duplex performance.

Clearly the first iteration of FICON technology delivered significant benefit to the IBM Mainframe User, and arguably is the primary Mainframe evolution that has sustained data growth and the adoption of Disaster Recovery and Business Continuity resiliency.  So, what does FICON offer today, some 15 years later?

Just as FICON superseded ESCON, FICON Express has now superseded FICON, offering a technology base that can continue to deliver benefit for many years to come.  FICON Express continues the tradition of offering more capabilities with each new generation of FICON channel.  The features were designed with the future in mind, while remembering the past, supporting the data serving leadership of System z and enabling improved data access using High Performance functions (I.E. zHPF), while providing backwards compatibility by auto-negotiating link data rates of 2, 4 or 8 Gbps across the various FICON Express generations (FICON Express2, Express4 and Express8).

High Performance FICON for System z (zHPF) is a data transfer protocol that is optionally deployed for accessing data from IBM Mainframe compatible storage subsystems (E.g. IBM DS8000, EMC Symmetrix V-Max, HDS USP, et al) and other subsystems.  Initially the data types supported were DB2, PDSE, VSAM, zFS and Extended Format SAM, and more latterly, legacy access methods including QSAM, BPAM and BSAM are now supported.  zHPF leverages the potential of FICON channels to deliver significant performance enhancements, and can help reduce the infrastructure costs for System z I/O by efficiently utilizing I/O resources, minimizing CHPID (Channels), Fiber (Cables), Switch Ports (E.g. Cisco, Brocade) and Control Unit (E.g. Disk Subsystem) resource requirements.  zHPF also complements the Extended Address Volumes (EAV) strategy for growth, increasing I/O rate capability as the associated disk volume size increases.

The latest generation FICON Express8S channel has two possible modes of operation designed for connectivity to servers, switches/directors, disks, tapes and printers:

  1. CHPID Type FC: FICON, zHPF, and channel-to-channel (CTC) traffic for the z/OS, z/VM, z/VSE, z/TPF, and Linux on System z environments
  2. CHPID Type FCP: Fibre Channel Protocol (FCP) for attachment to SCSI devices for the z/VM, z/VSE, and Linux on System z environments

With FCP channel full fabric support, multiple switches/directors can be placed between the System z server and SCSI device, allowing many “hops” through a storage area network (SAN) and providing improved utilization of intersite-connected resources and infrastructure.  This may help to provide more choices for storage solutions or the ability to use existing storage devices and can help facilitate the consolidation of Distributed Systems servers (E.g. UNIX, Wintel) onto System z servers, protecting investments in SCSI-based storage.

I/O performance improvements have been significant with each iteration: the initial FICON when compared to ESCON, FICON Express when compared to FICON, and more latterly zHPF.  Using like-for-like benchmark performance studies, we can see significant performance improvements:

I/O Driver @ 4K Block Size – ~ I/Os Per Second

Channel Type                #I/Os per Sec    n:1 Increase
ESCON                       1200             N/A
FICON Express 2/4 Native    14000            11.7
FICON Express 2/4 zHPF      31000            2.3
FICON Express 8 Native      20000            1.5
FICON Express 8 zHPF        52000            2.6
FICON Express 8S Native     23000            1.2
FICON Express 8S zHPF       92000            1.8

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

Compared to ESCON, the latest 8 Gbps FICON channel leveraging zHPF function delivers ~76 times more I/O throughput, while significantly increasing throughput by at least 50% from generation to generation.

I/O Driver Mixed Read/Write – ~ MBs Per Second

Channel Type                #MBs per Sec     n:1 Increase
ESCON                       12               N/A
FICON Express 4 Native      350              29.2
FICON Express 4 zHPF        620              1.8
FICON Express 8 Native      620              1.8
FICON Express 8 zHPF        770              1.3
FICON Express 8S Native     620              1.0
FICON Express 8S zHPF       1600             2.1

NB. Maximum performance is server related (E.g. z10, z114, z196, zEC12).

Compared to ESCON, the latest 8 Gbps FICON channel leveraging zHPF function delivers ~133 times more I/O throughput, while significantly increasing throughput by at least 100% from generation to generation.
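
For completeness, the headline ratios quoted in the two comparisons above can be reproduced directly from the table figures, as the short Python calculation below shows.

# Reproducing the headline ESCON comparison ratios from the two tables above.
escon_iops, ficon8s_zhpf_iops = 1200, 92000    # I/Os per second (4K block size)
escon_mbs, ficon8s_zhpf_mbs = 12, 1600         # MBs per second (mixed read/write)

print(f"IOPS improvement vs ESCON:      ~{ficon8s_zhpf_iops // escon_iops}x")   # ~76x
print(f"Bandwidth improvement vs ESCON: ~{ficon8s_zhpf_mbs // escon_mbs}x")     # ~133x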

Once again, the backwards compatibility of the IBM Mainframe server is highlighted by the evolution of the FICON channel, and in particular by the Disk Subsystem IHVs, obviously IBM themselves, but notably EMC, HDS and Oracle (StorageTek), evolving their offerings to support the latest FICON technologies.

We sometimes might take for granted how much data can be stored by a single footprint IBM Mainframe and how much performance and throughput capability is available to process this data.  However, we shouldn’t underestimate the role FICON has played in allowing this significant data (I/O) processing capability to grow, often rapidly, sometimes exponentially.

If there is a downside, such performance attributes might have eradicated the skills required to tune I/O subsystems, but that’s perhaps a subject matter for another day…