z13: A Digital Business Ready Solution?

As per usual for a next generation zSeries Server release, IBM announced their latest evolution, namely the z13, on 13 January 2015. IBM describe this platform as the most powerful and secure system ever built:

  • First system able to process 2.5 billion transactions per day, built for the mobile economy
  • Makes possible real-time encryption on all mobile transactions at scale
  • First mainframe system with embedded analytics, providing real time transaction insights 17X faster than comparable competitive systems at a fraction of the cost

At first glance, feeds and speeds generally don’t enthuse the audience, but if we dig deeper and acknowledge other recent IBM developments, incorporating the Apple, Twitter and Data Analytics announcements, perhaps we can draw some better business-facing conclusions. IBM have a clearly defined Cloud, Analytics, Mobile, Social & Security (CAMSS) initiative, seemingly based upon the IDC 3rd platform, defined as Social, Mobile, Analytics & Cloud (SMAC).

Industry analysts predict that by 2017, SMAC (CAMSS) expenditure will account for 25%+ of total enterprise software market revenue, doubling from ~12% in 2012. In simple terms, this new expenditure opportunity represents $100+ Billion in revenue. We can imagine that all major ISV’s will want their share of this market…

Whichever classification you choose, IBM CAMSS or IDC SMAC, IT infrastructures and associated investment currently are and certainly will be heavily influenced by this new world computing paradigm. Like it or not, an ability to perform a transaction anywhere (Mobile), keeping everything simple and networked (Social Media), real time prediction of future customer requirements (Analytics), all available anywhere for an alleged fraction of the cost (Cloud), makes sense for the 21st Century business. Ignore this new technology evolution at your peril as it will impact each and every area of the IT enterprise and associated resources, primarily software and supporting hardware.

Did you notice the difference between the IBM classification and IDC? IDC have not considered Security to be a consideration factor worthy of acronym (SMAC) inclusion. In today’s world of cybersecurity, that might be somewhat of an oversight, but we must assume that IDC consider cybersecurity to be a consideration for all of the Analytics, Cloud, Mobile & Social aspects, which of course it is!

If we consider the relative merits of technology platforms from a security viewpoint, the IBM z13 delivers EAL5+ security certification, whereas other non-Mainframe platforms can only currently claim EAL4+ certification.

It is estimated that 55%+ of enterprise (mission critical) transactions are processed by the IBM Mainframe, but this is based on pre-mobile workloads. It therefore makes commercial sense for IBM to safeguard that their flagship platform not only maintains the existing IBM Mainframe customer base, but also captures new and mobile centric workloads.

Having considered the business requirements for today’s IT business, let’s now classify the new features of the z13 platform:

  • Up to 40% more total system capacity compared to the zEC12.
  • Up to 10 terabytes (TB) of available Redundant Array of Independent Memory (RAIM) real memory per server.
  • Cryptographic performance improvements with new Crypto Express5S.
  • Economies of scale with simultaneous multithreading delivering more throughput for Linux and zIIP-eligible workloads.
  • Improved performance of complex mathematical models, perfect for analytics processing, with Single Instruction Multiple Data (SIMD).
  • IBM zAware cutting-edge pattern recognition analytics for fast insight into system health extended to Linux on z Systems.
  • A reduction in elapsed time for I/O-bound batch jobs with new FICON Express16S versus FICON Express8S.
  • Larger memory configurations planned to be supported on z/OS systems, which can be used to improve transaction response times, lower CPU costs, simplify capacity planning and ease deploying memory-intensive workloads. (The IBM z13 offers up to 10 TB memory.)
  • I/O service time improvement when writing data remotely using the new zHPF Extended Distance II.
  • Support for up to 256 coupling CHPIDs, which provides enhanced connectivity and scalability for a growing number of coupling channel types.
  • IBM Integrated Coupling Adapter (ICA SR), which offers greater short reach coupling connectivity than existing link technologies and enables greater overall coupling connectivity per IBM z13 than prior server generations.
  • Capability to extend z/OS workload management policies into the SAN fabric.
  • New rack-mounted Hardware Management Console (HMC), helping to save space in the data center.
  • Non-raised floor option, offering flexible possibilities for the data center.
  • Optional water cooling, providing the ability to cool systems with user-chilled water.
  • Optional high-voltage dc power, which can help IBM z Systems clients save on their power bills.
  • Optional top exit power and I/O cabling designed to provide increased flexibility.
  • New IBM z BladeCenter Extension (zBX) Model 004 in support of heterogeneous resources managed by IBM z Unified Resource Manager.

As we all know, Moore’s Law had to end sometime soon and this is true for System z CPU chips. The zEC12 CPU was often claimed to be the fastest commercial processor, with a 32 nm core and a 5.5 GHz rating. The z13 chip runs a 22 nm core at 5 GHz, at first glance ~10% slower than the zEC12. Nevertheless, the new z13 chip delivers a ~10% performance increase, due to advances in core design, with better branch prediction and pipelining in the core. Noteworthy is the slightly slower clock speed of the z13 chip, reducing heat output, probably signifying that ~5 GHz is the ceiling for CPU chips in the near future.

However, for the z13, the doubling of performance still applies for many other resources:

  • Cryptographic coprocessors performance (~2x)
  • Channel speed (~2x)
  • I/O bandwidth (~2x)
  • Memory/Cache performance (~2x)
  • Memory capacity (~3x)

Once again, classifying these technological advances in terms of mobile business, the z13 delivers real-time encryption of mobile transactions, protecting transaction data, delivering consistent response times for a quality customer experience. Overall, IBM claims the z13 delivers a potential for ~36% better response time, ~61% better throughput and ~17% lower cost per mobile transaction.

A major and subtle change introduced with the z13 is Simultaneous MultiThreading (SMT). SMT allows 2 active instruction streams per core, each dynamically sharing the core’s execution resources. SMT will be available in IBM z13 for workloads running on the Integrated Facility for Linux (IFL) and the IBM z Integrated Information Processor (zIIP).
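
From a capacity planning viewpoint, the effect of SMT can be approximated with simple arithmetic. The following is a minimal sketch, assuming a purely illustrative SMT-2 uplift factor; real gains are workload dependent and the value used here is not an IBM quoted figure:

```python
# Minimal sketch: estimated throughput of an IFL/zIIP pool with SMT-2.
# The smt_uplift value is a hypothetical planning assumption, NOT an
# IBM figure; actual gains are workload dependent and must be measured.

def pool_capacity(cores: int, per_core_capacity: float, smt_uplift: float = 1.25) -> float:
    """Estimated pool throughput, where smt_uplift is the throughput of
    two threads sharing one core relative to one thread alone."""
    return cores * per_core_capacity * smt_uplift

# Example: 4 zIIPs, each rated at 100 capacity units single-threaded.
print(pool_capacity(4, 100.0, 1.0))  # SMT disabled:  400.0
print(pool_capacity(4, 100.0))       # SMT-2 enabled: 500.0 (assumed +25%)
```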

Each software Operating System/Hypervisor has the ability to intelligently drive SMT in a way that is best for its unique requirements. z/OS SMT management consistently drives the cores to high thread density, in an effort to reduce SMT variability and deliver repeatable performance across varying CPU utilization, thus providing more predictable SMT capacity. z/VM SMT management optimizes throughput by spreading a workload over the available cores until it demands the additional SMT capacity.

From a capacity planning and performance measurement viewpoint, just a slight note of caution. Although the z13 CPU chip delivers increased CPU capacity, the raw speed is slower and there are considerations for SMT. A former IBM staffer, Bob Rogers, has written a great article on this SMT subject matter, which should be on your reading list!

In conclusion, the z13 announcement is another step forward for zSeries Mainframes. If you consider this announcement as just another next generation zSeries Mainframe announcement, you’re not treating your business or yourself with the respect they deserve. Instead, please consider this z13 announcement as an evolution from an enterprise solution delivery viewpoint. Primarily, consider the 21st century business keywords, in no particular order, of Analytics, Cloud, Mobile, Social & Security.

IFL – A Cost Efficient zSeries Platform?

In September 2000, IBM introduced the Integrated Facility for Linux (IFL) processor, a specialty engine for, some might say dedicated to, running the Linux Operating System.  At the time of this announcement, companion software named S/390 Virtual Image Facility for Linux was introduced to assist in the rapid deployment of IFL configurations, especially for non-Mainframe personnel.  However, this product was quickly discontinued, in favour of the standard z/VM Operating System, which is not difficult to learn and can accommodate hundreds if not thousands of zLinux images.

Today, the IFL is still a processor dedicated to Linux workloads on IBM System z servers.  The IFL is supported by z/VM virtualization and the Linux operating system.  The IFL cannot run other IBM operating systems.  The competitively priced IFL processor is a CPU capacity enabler, exclusively for Linux workloads.  Linux deployment (I.E. SUSE & Red Hat) on IFL’s can reduce expenses in the areas of operational efforts, energy, floor space and especially software.

The IFL provides the following functions and benefits:

  • The IBM Enterprise Linux Server is a dedicated System z Linux server, comprised of only IFL processors
  • No additional IBM software charges for the traditional (E.g. z/OS, CICS, DB2, WebSphere, et al) environment
  • Performance improvement for Linux workloads with each successive generation of IFL and System z technology
  • Linux workload on the IFL does not result in increased IBM software charges for traditional System z operating systems and middleware
  • Same functionality as a General Purpose processor on a System z server
  • HiperSockets can be used for communication between Linux images, or Linux and other operating system images on the same System z system
  • z/VM virtualization and most IBM Linux middleware products, plus most vendor software products are priced per processor (core) according to the System z IBM International Program License Agreement (IPLA).  IPLA products have a one-time-charge (OTC) and an annual (optional) maintenance charge, called Subscription & Support
  • Supported by the current z/VM virtualization and IBM Wave for z/VM software versions
  • Always a full capacity processor, independent of the capacity of the other processors in the server
  • Orderable as a System z hardware feature. The number of orderable IFL features varies by the server model and configuration
  • Designed to operate asynchronously with other General Purpose processors
  • Managed by PR/SM in a logical partition with dedicated or shared processors. The implementation of an IFL requires a Logical Partition (LPAR) definition, where, following the normal LPAR activation procedure, an LPAR defined with an IFL cannot be shared with a general purpose processor.

There will always be the debate as to which processor and associated server type (E.g. x86, POWER, SPARC) is the most cost efficient, but there is no doubt that the ability to accommodate hundreds if not thousands of zLinux instances in one zServer environmental (E.g. Power, Cooling, Floor Space, et al) friendly footprint, with software pricing per core is worthy of consideration.

Adoption of zLinux has been steady, especially in the emerging territories where it’s not unusual for zSeries deployments to be totally zLinux (I.E. IBM Enterprise Linux Server) based.  Moreover, the majority of large and traditional IBM Mainframe users (I.E. z/OS) have installed at least one IFL, if only to evaluate the z/VM and zLinux offering.  Many have deployed the IFL and associated zLinux solution for business requirements.

Therefore, if one of the major cost benefit features of IFL is optimized software costs; can the IFL processor be considered for other workloads, originating from the traditional zSeries (I.E. z/OS) environments?

Proximal Systems Corporation (PSC) is a company with a solution that transparently offloads data processing from IBM Mainframes to Distributed Systems, with an objective of reducing software cost, while maintaining or improving performance.  The company name is derived from the concept of bringing disparate computing systems into close proximity, functionally speaking, providing totally seamless and transparent interoperability.  The result is a unified computing complex within which various tasks can be easily migrated between systems to their most cost efficient operating environment, while still being able to interoperate as if they were all hosted together on the same system.

The PSC Proxy Coupling Technology allows for a CPU orientated task to be offloaded from one system to another by means of an associated proxy task, which has an identical interface to the task to be offloaded, but delegates the majority of the processing to an offloaded task on another system.  The primary objectives of this function are the cost savings and/or performance improvements that might be delivered by migrating tasks to systems that are able to execute those tasks more efficiently.

The fact that the proxy task maintains the same interface as the application being replaced is crucial; as many past Mainframe migration projects have failed due to insurmountable interoperability problems between the Mainframe and Distributed Systems servers (I.E. Windows, Linux, UNIX, et al).  Proxy Coupling Technology offers a solution to this long-standing challenge.  In theory, this allows for the transparent offload of a traditional z/OS workload (E.g. Sort) from General Purpose (GP) processors, to a less expensive (E.g. IFL) alternative…

In the first instance, the Proxy Coupling Technology offloads General Purpose CPU workload associated with the z/OS sort (I.E. CA Sort, DFSORT, Syncsort) function, to another platform (E.g. IFL).  For IFL based implementations, HiperSockets are utilized to transfer data at memory speeds from the z/OS task to zLinux on the IFL, where the sort operation completes, while the resulting z/OS task and associated data are maintained, as per normal.  From an IFL viewpoint, Ahlsort software performs the sort operation, being a sort solution that maintains compatibility with the majority of z/OS sort function (I.E. Control Card Syntax).  Therefore, this is a transparent implementation, where the only consideration is how much CPU capacity is required for the offload function (E.g. IFL, x86).  The benefits are reduced z/OS MSU usage for the sort function, which can be quite significant, as most business data (E.g. Database Offloads, Customer Orientated, et al) is sorted on a daily if not more frequent basis.

Just as IBM introduced the zAAP on zIIP capability, which allowed some customers to more easily justify a specialty engine (I.E. zIIP), combining workloads to exploit the full capability of the specialty engine; in theory the same ethos applies with the Proxy Coupling Technology.  For the avoidance of doubt, workloads that can be processed on an IFL, such as z/OS sort tasks, can assist in delivering higher Return On Investment (ROI) levels for the IFL, for example:

  • Reduced z/OS WLC MSU usage (I.E. Sort function offload) and associated software costs savings
  • IFL processors run at Full Speed and do not add to traditional workload (I.E. z/OS) software costs
  • Utilize any spare IFL CPU resource not used, releasing General Purpose CPU resource for other work

In conclusion, the Proxy Coupling Technology offers a proposition that is similar to the IBM philosophy of reducing z/OS software costs via specialty engines.  Seemingly to date, only the zIIP and zAAP specialty engines have been available to optimize CPU usage for z/OS workloads.  Offloading CPU cycles and thus MSU workload to the IFL makes sense, utilizing a cost efficient and indeed a full power CPU engine, where for cost reasons, maybe the majority of z/OS customers don’t deploy the “highest” derivative of General Purpose CPU engine available to them.  On the face of it, the realm of possibility exists for other workloads to benefit from z/OS to IFL CPU offload, following sort, which seems to make sense as the first workload to utilize this solution.

The IBM Mainframe: Just Another Node On The IP Network!

With the introduction of MVS/ESA Version 4.3 in 1993, the IBM Mainframe included the major foundations for meaningful Distributed Systems connectivity, including the first steps of POSIX compliance via OpenEdition functionality.  However, even before this timeframe, the TCP/IP protocol was available in the first release of MVS/ESA Version 4 (4.1), although in a very limited fashion.  In this instance, MVS was benefitting from the path already trodden by the VM Operating System and the TCP for VM software product.  Put another way, even when TCP/IP was in its early stages, being deployed and evolved in universities and scientific laboratories (E.g. CERN), its foundation was being embedded into the IBM Mainframe.

Early IBM Mainframe TCP/IP usage allowed for RS/6000 (AIX) connectivity, LAN integration via Novell NetWare, typically via the 3172 Interconnect Controller, Sockets Interface (E.g. CICS), et al.  In 1994, IBM introduced the Open Systems Adapter (OSA) processor feature for S/390 Parallel Enterprise Servers.  The OSA provided native Open Systems connectivity to the Local Area Network (LAN), directly via the Mainframe processor.  The OSA feature supported the Fiber Distributed Data Interface (FDDI), Token-Ring & Ethernet LANs, arguably making the 3172 controller obsolete.

So, since the early-mid 1990’s, even before pervasive usage of the Internet, the Mainframe was already a fully functioning and efficient user of IP networking.

How is the TCP/IP function being utilized by the IBM Mainframe today?

TCP/IP on z/OS supports all of the well-known server and client applications.  The TCP/IP started task is the engine that drives all IP-based activity on z/OS.  Even though z/OS is an EBCDIC host, communication with ASCII-based IP applications is seamless.
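
As a small illustration of that seamlessness, the sketch below uses Python’s standard cp037 codec (EBCDIC US/Canada) as an assumed stand-in for the host code page, showing the translation performed between host EBCDIC data and an ASCII-based peer:

```python
# Sketch of the EBCDIC/ASCII translation that z/OS IP applications
# handle transparently; cp037 (EBCDIC US/Canada) is assumed here.

text = "HELLO FROM Z/OS"
ebcdic_bytes = text.encode("cp037")  # how the host stores the data
ascii_bytes = text.encode("ascii")   # what an ASCII-based peer expects

print(ebcdic_bytes.hex())  # c8c5d3d3d640... (EBCDIC code points)
print(ascii_bytes.hex())   # 48454c4c4f20... (ASCII code points)

# Converting host data for transmission to an ASCII peer and back:
assert ebcdic_bytes.decode("cp037").encode("ascii") == ascii_bytes
```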

IP applications running on z/OS use a resolver configuration file for environmental values.  Locating a resolver configuration file is somewhat complicated by the dual operating system nature of z/OS (UNIX and MVS).  Nearly each and every z/OS customer deploys the following core TCP/IP services:

TCP/IP Daemon: The single entity that handles, and is required for, all IP-based communications in a z/OS environment is the TCP/IP daemon itself.  The TCP/IP daemon implements the IP protocol stack and runs a huge number of IP applications to the same specifications as any other operating system.

TCP/IP Profile: Is loaded by TCP/IP when started.  If a change needs to be made to the TCP/IP configuration after it has been started, TCP/IP can be made to reload the profile dynamically (or read a new profile altogether).

FTP Server: Like some other IP applications, FTP is actually a z/OS UNIX System Services (USS) application.  It can be started within an MVS environment, but it does not remain active there; it immediately forks itself into the z/OS UNIX environment and tells the parent task to kill itself.

Telnet Daemon: There are two telnet servers available in the z/OS operating environment.  One is the TN3270 server, which supports line mode telnet, but it is seldom used for just that.  Instead, it is primarily used to support the TN3270 Enhanced protocol. The other telnet server is a line mode server only, referred to as the z/OS UNIX Telnet server (otelnetd).

Many IBM and ISV software products exploit IP and USS functionality, most typically WebSphere (MQ).

Whether via UNIX System Services (USS) or TCP/IP usage, the convergence of the IBM Mainframe and UNIX technologies arguably became mandatory with the deployment of TCP/IP on the IBM Mainframe.  Obviously the technical personnel that support these different platforms have their own viewpoint as to which platform might be the best, but that is somewhat of an arbitrary point.  However, what is absolutely certain is the need to recognize how data is stored and secured in a UNIX environment and indeed in the z/OS (MVS) specific environment, originally named MVS OpenEdition, but now commonly referred to as OMVS.

There are fundamental differences, too numerous to mention, when comparing the User and File management policies and processes, and the security and data access lifecycle intricacies, of z/OS and UNIX.  So what, you might say!  This might be a cursory and lax attitude, as business critical data is probably being stored in OMVS file systems, if only for FTP purposes, but more than likely for other more pervasive and user based access (E.g. Database, Messaging, Data Mining, Data Exchange, et al).

So, which technical party is managing the security of Unix System Services (USS) file systems for the OMVS Mainframe deployment?  Is it the Mainframe Systems Programmer, the Unix System Admin or the Mainframe Security Team, or somebody else?  To date, some people might have thought it didn’t matter, but of course, seasoned security professionals knew that this was never the case.  However, the migration to z/OS 2.1 is a tangible juncture for each and every IBM Mainframe installation to review their USS and thus OMVS security deployment.  Why?

The BPX.DEFAULT.USER facility was introduced with OS/390 2.4 and was a commonly used process for implementing USS (OMVS) security.  However, with z/OS 2.1, the BPX.DEFAULT.USER facility is withdrawn, meaning that the Mainframe user must perform some migration actions.  IBM provide some generic assistance with this challenge via APAR OA42554 and APAR OA37164.  However, maybe this is an ideal juncture to perform a thorough review of USS (OMVS) security, vis-à-vis a comprehensive and dispassionate audit, highlighting issues, implementing standards and securing exposures.  For example, use of UID(0) must be eradicated and certainly no human being should be allocated such privileges.
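
As one example of what such an audit might automate, here is a hypothetical sketch that flags UID(0) assignments in a RACF database unload (IRRDBU00) extract; the simplified space-delimited layout is invented for illustration, and a real audit would parse the documented unload record formats:

```python
# Hypothetical sketch: flag UID(0) assignments in a RACF IRRDBU00
# extract. The field layout below is simplified and assumed; consult
# the documented unload record formats for real use.

def find_uid0_users(unload_lines):
    """Yield user IDs whose OMVS segment carries UID(0)."""
    for line in unload_lines:
        fields = line.split()
        # Assumed layout: record-type, userid, uid, home, program.
        if len(fields) >= 3 and fields[0] == "0270" and int(fields[2]) == 0:
            yield fields[1]

sample = [
    "0270 SYSPROG1 0000000000 /u/sysprog1 /bin/sh",
    "0270 APPUSER1 0000012345 /u/appuser1 /bin/sh",
]
print(list(find_uid0_users(sample)))  # ['SYSPROG1']
```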

There are some useful guidelines available from security specialists such as Vanguard, where the process can be simplified using their Identity & Access Management (IAM) toolset.  Similarly, recent user conferences have included presentations on this subject matter.

In conclusion, the IBM Mainframe can be classified as just another node on the IP (TCP/IP) network.  However, as always, no matter how secure the Mainframe platform might be, the biggest threat is typically the human being, and for USS, the migration to z/OS 2.1 forces us to review OMVS security settings.  Therefore, let’s do a good job and eradicate any security exposures we might have inadvertently implemented over the years.  As we all know, passing an external security audit process doesn’t necessarily mean our IT systems and processes are secure, while sometimes the internal security people are better qualified or more knowledgeable than external auditors.  Arguably most external auditors will do a good job of auditing UNIX platforms, yet their Mainframe knowledge and abilities are typically limited.  It is therefore somewhat of a paradox that in this particular area of z/OS USS, the typical UNIX exposures are not highlighted in the typical Mainframe security audit process…

One must draw one’s own conclusions as to the merits of engaging 3rd party Mainframe security specialists to perform such an audit, coinciding with this z/OS 2.1 migration activity, safeguarding that OMVS security and processes are as good and secure as they can be.  Put another way, why wouldn’t a Mainframe organization go that extra mile to safeguard their most valuable of assets, namely business critical data, engaging a 3rd party specialist to review and provide guidance on this subject matter?

Are You Ready For z/OS Mobile Workload Pricing (MWP)?

Recently IBM announced Mobile Workload Pricing (MWP) for z/OS, which can minimize the impact of mobile workloads on Sub-Capacity license charges, delivering optimized pricing for System z environments extending their workloads to incorporate mobile devices.

MWP only applies to Mainframe customers deploying a zEC12 or zBC12 in their enterprise, as per the AWLC or AEWLC (AKA Advanced/Entry Workload License Charges) metric; MWP is also extended if a zEC12 or zBC12 enterprise is deploying a z196 or z114 via the AWLC or AEWLC metric.

The primary consideration for MWP is determining how a Mainframe customer can comply with the tracking requirements for mobile workloads.  On the plus side, MWP does not require the isolation of mobile workload transactions in separate LPARs, instead using enhanced reporting for software pricing.  This is a major step forward when compared with Integrated Workload Pricing (IWP), which ideally requires large LPAR container structures, minimizing costs for WebSphere workloads, applying to the CICS, IMS and WebSphere MLC software products.  Conversely, MWP includes DB2 in the list of eligible software products for cost reduction.

If a Mainframe customer is eligible for MWP pricing, they will then need to utilize the Mobile Workload Reporting Tool (MWRT), which is analogous to the original Sub-Capacity Reporting Tool (SCRT).  This is an either/or situation: the Mainframe customer only submits MWRT reports to IBM if they’re MWP eligible, otherwise the status quo remains, where non-MWP Mainframe customers continue to submit SCRT reports.

The Mainframe customer must track and report General Purpose (GP) CPU time for mobile transactions, reporting those values in a pre-defined format to IBM each month to benefit from MWP.  MWRT utilizes reported mobile transaction data to adjust the Rolling 4 Hour Average (R4HA) Sub-Capacity software eligible MSUs, with LPAR granularity.  Adjusting the impact of mobile transactions on peak LPAR MSU values delivers benefit when higher mobile transaction volumes generate MSU resource usage peaks (Workload Spikes).

MWRT calculates the R4HA for mobile transaction GP MSU resource usage, subtracting 60% of those values from the traditional Sub-Capacity software eligible MSU metric, with LPAR granularity, for each and every reporting hour.  The software program values for the same hour are aggregated for all Sub-Capacity eligible LPARs, deriving an adjusted Sub-Capacity value for each reporting hour.  Therefore MWRT determines the billable MSU peak for a given MLC software program on a CPC using the adjusted MSU values.
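
A simplified sketch of that calculation follows, using invented MSU figures: the 60% mobile adjustment is applied per hour and LPAR, hours are aggregated across eligible LPARs, and the billable figure is the peak adjusted hour:

```python
# Simplified sketch of the MWRT adjustment described above; all MSU
# figures are invented for illustration.

def adjusted_peak(hourly_r4ha, hourly_mobile_r4ha):
    """hourly_*: {hour: {lpar: R4HA MSUs}}; returns the billable peak."""
    peaks = []
    for hour, lpars in hourly_r4ha.items():
        adjusted = sum(
            msu - 0.60 * hourly_mobile_r4ha.get(hour, {}).get(lpar, 0.0)
            for lpar, msu in lpars.items()
        )
        peaks.append(adjusted)
    return max(peaks)

r4ha = {"09:00": {"LPAR1": 400.0, "LPAR2": 200.0},
        "10:00": {"LPAR1": 500.0, "LPAR2": 250.0}}
mobile = {"10:00": {"LPAR1": 200.0}}  # a mobile spike drives the peak

print(adjusted_peak(r4ha, mobile))  # 630.0, versus an unadjusted 750.0
```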

Most committed zSeries Mainframe customers will be deploying CICS, DB2 and WebSphere software, while IT trends dictate that mobile device usage (I.E. Smartphone, Tablet, et al) is increasing, and most z/OS applications that require such mobile access have evolved accordingly over time.  It therefore seems to be one of those “No Brainer” type scenarios, where the Mainframe user should plan to benefit from MWP, either as they upgrade to the latest zSeries technology, namely zEC12 or zBC12, or immediately if already deploying a zEC12 or zBC12 server.

The only minor consideration is a requirement for the zEC12 or zBC12 customer to engage their local IBM account team, to determine what data they need to report on mobile transactions for MWP consideration.  This one-off task will deliver optimized WLC pricing forever more.

Of course IBM are encouraging customers to consider the Mainframe for new applications, driven by mobile transaction requirements.  Equally, there is no reason why longer term Mainframe customers can’t benefit from MWP, benefitting from reduced MLC costs, a major consideration of Mainframe TCO.

Cloudy With A Chance Of Mainframe?

With the advent of Computer Generated Imagery (CGI) there is seemingly no end to the number of books, especially “children’s” books that can be encapsulated and delivered in animated movie format.  I’m always surprised and arguably never surprised by the messaging in these stories; supposedly written for the younger person, but invariably delivering a message of good morals, ethics and human qualities, typically finding creative solutions to a myriad of problems.  Of course, we’re all human, and typically as human beings, we’re responsible for the majority of our problems, either knowingly, or not.

Cloudy with a Chance of Meatballs is a book based on a town named Chewandswallow characterized by its strange daily meteorological pattern, providing townsfolk with all of their required daily meals by raining food.  Although the residents of the town enjoy a lifestyle devoid of any grocery shopping or cookery, the weather unexpectedly and inexplicably takes a turn for the worse, devastating the local community with destructive and uncontrollable storms of either unpleasant or dangerously oversized foods, resulting in unstoppable catastrophes for the townspeople.  Their lives endangered by the threats of the storms, they relocate to a different community of average meteorological patterns, safe from the hazards that once were presented by raining meals.  However, they are forced to learn how to obtain food the normal way.

So what?  Continuing with the creativity thought, the ethos of this story might be somewhat analogous to the sometimes polarized opinion between Distributed Systems and Mainframe computing.  So depending on your philosophical bent or which side-of-the-fence you sit, there is only one choice, even if this seemingly perfect and de facto world is generating significant challenges… 

Recently, z/OS 2.1 became Generally Available (GA) and most notably from my viewpoint was its continued and demonstrable ability to participate in cloud computing environments.  So is the IBM Mainframe ready for the cloud?  Wasn’t it always!

The fundamental ethos of the Mainframe environment is virtualization and was forever thus.  The Mainframe has always shared the basic IT architecture components, including CPU, Memory, Storage, Networking and other peripherals, originally in a physical single-image structure, but since the late 1990’s in a shared (SYSPLEX) complex of interconnected physical servers (CPCs).  So the Mainframe is and always has been ready for “Prime Time Cloud”!

z/OS V2.1 is a platform designed to dynamically respond and scale to workload change with enhancements to scalability and performance that cover operations, I/O, virtual storage constraint relief, memory management, and more.  These enhancements are suitable for organizations that would like to catalyse a journey to highly scalable virtualized solutions like cloud.

IBM delivers improved scalability and performance for outstanding throughput and service within existing Mainframe environments.  Smarter scalability can better prepare the user for growth and spikes in workloads while maintaining the qualities of service and balanced design that customers have come to expect of the IBM mainframe.

As customers consider all the components of downtime, the true costs can be surprising, which is why superior availability continues to remain a key factor in platform selection. With z/OS V2.1, IBM introduces new capabilities designed to improve upon the already legendary z/OS system availability.  The industry-leading resiliency and high availability of System z remain key reasons why organizations keep their most critical processing on System z.  With its attention to outage reduction, the availability of System z and z/OS is well recognized in the industry.  In z/OS V2.1, IBM continues enhancements that improve critical IT systems availability, helping achieve an even higher level of service for customers.

Some of the “cloud friendly” z/OS 2.1 benefits include:

  • Support for Shared Memory Communications-RDMA (SMC-R), for low latency, application transparent communications to help you move data quickly between z/OS images on the same CPC or between CPCs.
  • Flash Express support for certain coupling facility list structures, such as IBM WebSphere MQ for z/OS, V7 (5655-R36), in order to strengthen resiliency for enterprise messaging workload spikes.
  • For zEC12 or zBC12 systems, shared engine coupling facilities can be used in many production environments, for improved economics by offering a high level of performance without requiring the use of dedicated CF engines.
  • EXCP support for System z High-Performance FICON (zHPF) is designed to help improve I/O start rates and improve bandwidth for more workloads on existing hardware and fabric.
  • Usability and performance improvements for z/OS FICON Discovery and Auto Configuration (zDAC), including discovery of directly attached devices.
  • Serial Coupling Facility structure rebuild processing, designed to help improve performance and availability by rebuilding coupling facility structures more quickly and in priority order.
  • 100-way symmetric multiprocessing (SMP) support in a single LPAR on IBM zEC12 or zBC12 systems.  Support for an architectural limit of 4 TB of real memory per LPAR.
  • Support for 2 GB pages is provided on zEC12 and zBC12 systems.  This feature is designed to reduce memory management overhead and improve overall system performance by enabling middleware to use 2 GB pages.  These improvements are expected due to improved effective translation lookaside buffer (TLB) coverage and a reduction in the number of steps the system must perform to translate a 2 GB page virtual address.
  • Capacity Provisioning is designed to provide support for manual and policy-based management of Defined Capacity and Group Capacity.  This function broadens the range of automatic, policy-based responses available to help manage capacity shortage conditions when WLM cannot meet your workload policy goals.

There are numerous new and enhanced functions delivered with z/OS 2.1, too numerous to mention, but categorised as Quality Of Service, Availability, Networking, Security, Data Usability, Integrity, Systems Management, Application Development, Simplification & Usability, International Standards Compliance, et al.

So let’s not forget, this foundation and support for an IT infrastructure and its supporting eco (software) system is in one scalable, secure and “zero” downtime environment!

So maybe we, the open-minded and enlightened generation of parents (oops, I forgot, Grandparents for us Dinosaur Mainframe folk!) that can now “access” children’s stories, even if it’s in the form of a CGI animated movie, can be dispassionate enough to consider all platforms, Distributed and Mainframe, for our evolving business and associated IT requirements.

So you decide, can it be Cloudy With A Chance Of Mainframe?  To overlook such an option, might be an oversight, just as overlooking the abundance of human stories, classified as children’s books or not…

Mainframe ISV Software: Is Continuous Product Improvement Always Evident?

Ken Venturi once said “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.

Wouldn’t it be great if every CTO and/or Product manager had this same philosophy for their Mainframe software solution?  One such example I have experienced over the years is (E)JES from Phoenix Software International (PSI).  Of course it’s really important to have Day 1 support for the latest release of Operating System, z/OS 2.1 being the latest example, but what about actually exploiting the latest functionality available with the latest zSeries Mainframe Enterprise Servers and z/OS Operating Systems?

To drive maximum bang from your buck, optimal performance and robust cost optimization can only be possible by recognizing and exploiting the latest Mainframe function ASAP, as and when appropriate.  Furthermore, listening to your customers, analysing their feedback, actively participating in User Organizations such as SHARE, and so on, will all help in continuous product development and innovation.

Here are some of the reasons why (E)JES has succeeded over a 30+ year period, recognizing and exploiting new z/OS function, as and when the updated z/OS is released for General Availability (GA).  Even today, with Version 5.3 supporting z/OS 2.1 as of day 1, (E)JES continues to offer value-added function for the seasoned, inexperienced and in fact, all IBM Mainframe technicians:

  • 64-bit performance optimizations (I.E. MEMLIMIT: above-the-bar) for both (E)JES client and server components, safeguarding minimal z/OS resource usage.
  • Nearly all (E)JES JES subsystem processing routines are eligible for zIIP redirection, delivering software cost savings for all (E)JES users.  Sub-Capacity System z processor users experience improved (E)JES performance because zIIP engines always run at full speed.  This behaviour differs from that of General Purpose CPs, “throttled” with Sub-Capacity deployments.
  • (E)JES code executes faster via its inbuilt High Performance Routine (HPR) facility, specifically developed to accelerate access to data in JES control blocks.  HPRs have a shorter instruction path length than previous coding techniques, avoiding delays in modern zSeries CPU instruction pipelines.
  • If High Performance FICON (zHPF) is available, (E)JES uses Transport Mode channel programs for JES Spool I/O.  When zHPF is not available, or when a CAS server performs I/O against the global data set, (E)JES uses the highest-performing Command Mode channel programs currently available.  These channel programs perform I/O significantly faster than “ordinary” channel programs.
  • The use of 24-bit (captured) UCBs puts a strain on the 24-bit virtual storage resource.  The use of ordinary (non-extended) TIOT entries puts a limit on the total number of allocations that can exist simultaneously in an address space.  (E)JES supports and uses 31-bit (uncaptured) UCBs and the extended TIOT (XTIOT) function (I.E. NON_VSAM_XTIOT=YES in DEVSUPxx PARMLIB).
  • (E)JES supports placement of JES spool data sets in the cylinder-managed area of an Extended Address Volume (EAV).  Of course, as of z/OS 1.12, EAV increases 3390 DASD capacity to ~1 TB.
  • (E)JES Pattern Utility Matching uses the SRST hardware instruction.  Empirical measurements show this technique is far faster on modern System z processors than alternatives such as the TRT instruction or “brute force” matching techniques using CLI/CLC.

One of the primary benefits of upgrading IBM z/OS software is the overall system performance benefit and associated cost reduction, but of course, IBM can only deliver the function and ability, while it’s incumbent upon the ISV community to upgrade their software products accordingly.  A key goal for any good ISV software product is to try to provide a value-add in the area of performance.  This has been one of the primary areas of focus for (E)JES since its introduction in 1978. 

Most spool display and management products tend to rely on the most resource-intensive interface available, namely the JES subsystem provided SSI 80.  (E)JES benchmarking tests against the most readily-available JES SSI 80 exploiters demonstrate significant CPU savings when deploying (E)JES.

Software products also need to deliver continuous improvements with regard to usability, presentation and in-built function, increasing user and system administrator productivity.  Without doubt, optimization encompasses not just hardware, but software, services, systems management disciplines and “best practices” that tie it all together.  Here are some of the usability enhancements that (E)JES has incorporated:

  • ISPF users running a 3270 emulator on a programmable workstation can now search IBM Eclipse-based InfoCenters via (E)JES.  Although (E)JES fully supports BookManager format documentation, BookManager READ/MVS is now obsolete; beginning with z/OS 2.1, BookManager softcopy books are no longer delivered by IBM.  IBM has stated that InfoCenters, and eventually KnowledgeCenters, are their strategic direction for online documentation.
  • (E)JES Web is a new, browser-based interface to (E)JES.  The associated RESTful API delivering this web enabled technology provides a framework for the creation of Eclipse plug-ins, mobile applications, and other web services clients.  This facility will provide a “rapid learning” environment for (E)JES users, both new and old, that might be uncomfortable navigating traditional 3270 interfaces.
  • (E)JES provides a Java Application Programming Interface (API), complementing other in-built APIs for REXX and procedural languages.  By using an (E)JES API, a user can harness the versatility of their preferred programming language to interface and interact with (E)JES.  This support provides an interface to deliver nearly all of the capabilities available to an interactive (E)JES user.
  • (E)JES incorporates context sensitive help function, with point-and-shoot/pop-up dialogs, helping educate users on (E)JES, JES and z/OS while they work.  Users can get pop-up explanations of columns, input choices for unprotected fields, and a list of line commands.  Smart pop-ups explain the contents of certain columns, such as system abend codes.

The latest (E)JES Release Information Manual eloquently details the product enhancements over the last 5 releases or so, providing a good Product Roadmap reference point.

So, whether the ISV software product you deploy has been available for several years or several decades, do you safeguard maximum business benefit for optimal cost by considering:

  • Does the ISV deploy the latest zSeries server (I.E. zBC12, zEC12) for software interoperability and full hardware function exploitation; or an emulation (I.E. zPDT) technique?
  • Does the ISV deliver value-added z/OS related function on Day 1 or even within a year of the latest z/OS release?
  • Does the ISV deliver meaningful function to assist your users deploy said function, while simplifying environment management for system administrators?
  • Does your ISV product optimize cost, with Sub-Capacity pricing in MSU increments, aggregated MSU costs for your entire zSeries Mainframe environment, as opposed to specific workloads (E.g. CPC’s, LPAR’s, et al)?
  • Does your ISV product optimize cost by offloading the majority of its CPU function to zIIP specialty engines, which run at maximum speed, and where software “runs for free”?

Of course, only you can ask and potentially answer these questions during your day-to-day activities of maintaining currency and optimal performance for your Mainframe software portfolio.

Sometimes the hardest questions anybody can ask are the questions they ask themselves, which are never rhetorical questions!  Extracted verbatim from the latest (E)JES Release Information Manual:

Team (E)JES took advantage of the Phoenix Software International zHISR performance analysis product to discover performance “hot spots” in the (E)JES product.  Sometimes the simplest, least conspicuous piece of code turns out to be a major CPU contributor.  See below for some of the most embarrassing “surprise” hot spots we discovered using zHISR in a z/OS 2.1 LPAR:

  • Over 30% of the CPU used during a Spool Data Browse FIND operation, against a multi-million-line SYSOUT in JES2, turned out to be code that was clearing a record buffer to blanks using MVCL.  This clearing code was eliminated and some minor adjustments were made in other code to compensate for this change.
  • 27% of the CPU used to produce the Activity display in JES2 turned out to be in a routine that manages an internal resource called the “Job Positions Table.”  The algorithm was improved (to work more like its JES3 counterpart) and that routine is no longer a significant CPU contributor.
  • 9% of (E)JES session start-up was a 26-year-old “brute force” prime number generator used to compute the size of a hash table.  That code was totally reworked and now accounts for approximately 0.02% of session start-up CPU.
  • A 6% performance penalty was observed when sorting a tabular display with a moderate number of rows.  The hot spot turned out to be the code that cleared the work area for the sort service to zeros (another MVCL).  This overhead was reduced to 0.04%.

Mea culpa and humility, never a bad thing, but you have to be honest with yourself and ask yourself the right questions!  So going back full circle and quoting Ken Venturi once again, “I don’t believe you have to be better than everybody else.  I believe you have to be better than you ever thought you could be”.  You must draw your own conclusions as to whether such an observation applies to the (E)JES team at Phoenix Software International (PSI)…

Why not ask them yourself?  Ed Jaffe, the (E)JES CTO will be available at the forthcoming UK GSE Annual Conference, 5-6 November 2013, speaking about (E)JES System Management Software: More With Less For Less, For The z/OS Mainframe and z/OS 2.1 User Experiences.

21st Century Mainframe Capacity Planning Requirements

With nearly 5 decades of longevity, the IBM Mainframe has changed beyond recognition in terms of CPU capacity and performance capability.  The Capacity Planning discipline for the IBM Mainframe server became more advanced and proactive in the early 1990’s, perhaps coinciding with the introduction of Parallel Sysplex structures associated with the MVS/ESA operating system.  Therefore the requirement to measure and model the impact of workload movement between LPAR and CPC structures became important, if not mandatory.

The fundamental building-block for Mainframe CPU usage analysis is SMF Type 7n records (I.E. RMF or CMF), where this data was typically processed by MXG, MICS and maybe CIMS (acquired by IBM), generally using SAS for reporting purposes.  Other tools, including but not limited to, BEST/1 (acquired by BMC) and PERFMAN (acquired by ASG) also offered capacity planning and performance management solutions.  Therefore, for 20+ years the fundamental Mainframe CPU usage data and associated tools have remained largely the same.  However, maybe the IBM Mainframe server has changed, both in terms of underlying CPU chip technology and customer workload deployment…

I often hear capacity planners state something along the lines of “I can report on the past with 100% accuracy, but predicting the future might prove to be a little more difficult”!  Once again, going back to the early 1990’s, the IBM Mainframe had a typical if not generic workload profile deployment, namely On-Line Transaction Processing (E.g. CICS, IMS DC) and related Database Management Subsystems (E.g. DB2, IMS DB) with Batch Processing.  This somewhat limited workload profile simplified the Capacity Planning process, applying estimates of growth based on current usage.  However, when the Mainframe became more pervasive, taking on new workloads, how was the capacity planner supposed to estimate CPU requirements for their new business application workload?

IBM introduced the Large Systems Performance Reference (LSPR) methodology, designed to provide relative processor capacity data for IBM System/370, System/390 and z/Architecture processors.  All LSPR data is based on a set of measured benchmarks and analysis, covering a variety of System Control Program (SCP) and workload environments.  LSPR data is intended to be used to estimate the capacity expectation for a production workload when considering a move to a new processor.  Although LSPR data is provided on an “as is” basis, with no warranty, it at least provides the Mainframe Capacity Planner with some insight into their CPU sizing challenge.  For many years, LSPR provided the only other data source, as well as RMF (CMF) for Mainframe CPU sizing.  However, is there a more accurate data source, perhaps based on real-life customer data?
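
Whatever the data source, the basic LSPR-style sizing step reduces to scaling the current workload by the relative capacity ratio between the installed and target processors. A minimal sketch, using an invented ratio rather than a genuine LSPR figure:

```python
# Illustrative LSPR-style projection; the 1.4 capacity ratio below is
# an invented placeholder, not a real LSPR figure.

def target_utilization(current_util_pct: float, relative_capacity: float) -> float:
    """Expected utilization of the same workload on a target processor
    rated at relative_capacity times the installed processor."""
    return current_util_pct / relative_capacity

# Example: an 85% busy CPC moving to a hypothetical 1.4x processor.
print(f"{target_utilization(85.0, 1.4):.1f}%")  # ~60.7% on the target
```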

With the introduction of the IBM System z10 server (February 2008), a new function CPU MF (CPU Measurement Facility) was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides optional hardware assisted collections of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  Therefore this data source can be deployed to further refine the accuracy of Mainframe CPU capacity planning projections.  Let’s not forget:

“The primary on-going requirement for Mainframe Capacity Planning is to minimize any over or under capacity provision from forecast predictions, used for Mainframe server acquisition purposes”

Mainframe chip technology has also changed in complexity, especially with the latest iterations of CPU chips associated with the z10 server (E.g. POWER 6) onwards, incorporating many layers of cache memory.  Workload capacity performance will be quite sensitive to how deep into the memory hierarchy the processor must go to retrieve the workload’s instructions and data for execution.  Best performance occurs when the instructions and data are found in the cache(s) nearest the processor so that little time is spent waiting prior to execution; as instructions and data must be retrieved from farther out in the hierarchy, the processor spends more time waiting for their arrival.

As workloads are moved between processors with different memory hierarchy designs, performance will vary, as the average time to retrieve instructions and data from within the memory hierarchy will vary.  Additionally, once on a processor, this component will continue to vary significantly, as the location of a workload’s instructions and data within the memory hierarchy is affected by many factors, including locality of reference, I/O rate, competition from other resources (E.g. Applications, LPARs, et al), and so on…

The most performance sensitive area of the memory hierarchy is the activity to the memory nest, namely, the distribution of activity to the shared caches and memory.  IBM introduced new terminology, namely Relative Nest Intensity (RNI), indicating the level of activity to this part of the memory hierarchy.  Using data from CPU MF, the RNI of the workload running in an LPAR may be calculated.  The higher the RNI, the deeper into the memory hierarchy the processor must go to retrieve the instructions and data for that workload.
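
Conceptually, RNI weights the share of level-1 cache misses sourced from each deeper level of the nest, so that deeper sourcing scores higher. The sketch below captures the idea with illustrative weights only; IBM publish processor-specific RNI formulas derived from the actual CPU MF counters:

```python
# Conceptual RNI sketch; the weights are illustrative only, not the
# processor-specific coefficients IBM publish for CPU MF data.

def relative_nest_intensity(l3_pct, l4_local_pct, l4_remote_pct, memory_pct,
                            weights=(1.0, 2.4, 4.0, 7.5)):
    """Each *_pct: share of L1 misses sourced from that nest level."""
    parts = (l3_pct, l4_local_pct, l4_remote_pct, memory_pct)
    return sum(w * p for w, p in zip(weights, parts)) / 100.0

# A workload resolving most misses in L3 has a low RNI...
print(relative_nest_intensity(80, 12, 3, 5))    # ~1.58
# ...whereas one going frequently to memory scores much higher.
print(relative_nest_intensity(40, 20, 15, 25))  # ~3.36
```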

Therefore the Mainframe Capacity Planner does have various data sources available to forecast how an existing or new workload might perform on an upgraded processor (CPC), further refining their CPU capacity requirement forecast.  As always, the final stage in a Mainframe Capacity Planning process is to input the forecast data into the IBM Processor Capacity Reference (zPCR) tool, to determine the exact model and associated resource configuration options for their unique business workload mix.

To summarize, does your Mainframe Capacity Planning process incorporate all of these CPU sizing data sources, in an easy-to-use and cost efficient manner?

Founded by former IBM staffers and capacity planning and performance management industry veterans William Shelden, PhD, and William Hart, PerfTechPro delivers sophisticated, affordable, easy-to-use solutions for IT management professionals looking for fast, insightful help without high-cost, complex and time-consuming purchasing and licensing requirements.

PerfTechPro for z/OS is a Capacity Planning and Performance Measurement tool specifically designed for the cost conscious and savvy 21st Century data centre.  PerfTechPro for z/OS is the next evolution in Mainframe Capacity Planning tools, having been architected from ground zero using the latest techniques.  PerfTechPro for z/OS provides sophisticated capacity and performance management capabilities, affordable by any sized data centre:

  • Clean, intuitive, easy-to-use interface and graphical representations, for example:
    • Consolidated instance lists guide users to make informed selections
    • Descriptive dialog boxes detail your configuration
    • Anticipates, pre-loads data to speed retrieval, reporting and analysis
    • Automated data management
  • Forecasting and modelling
  • Non-proprietary database, enabling data use outside of PerfTechPro
  • Capable of automated collection, analysis and reporting of SMF 113 records produced by the IBM CPU Measurement Facility (CPU MF)
  • Supports measurement, management of zAAP & zIIP Specialty Engines
  • Automated analysis and management of all key capacity and performance metrics, for example:
    • GPP Utilization of All LPARs
    • MIPS Usage by CPU
    • DASD Response Times
    • Address Spaces Dispatched and Waiting 

PerfTechPro for z/OS also simplifies the data management process associated with Mainframe Capacity Planning.  Using a streamlined process on the z/OS host, PerfTechPro extracts and formats the data required from various SMF sources (E.g. SMF Type 7n, Type 113); delivering an optimized Performance Data Base (PDB) for use by the Windows based GUI.  This optimized file safeguards fast processing during the reporting and forecasting activities, while simplifying any data aggregation processes (E.g. Weekly, Monthly, et al).  Moreover, PerfTechPro allows this data to be stored in non-proprietary (E.g. Microsoft Access, SQL Server, MySQL, Oracle) and multiple database structures, as and if required.

PerfTechPro for z/OS is a simple-to-use and cost-efficient solution, allowing customers to quickly save time and money from their Capacity Planning and Performance Measurement solution.  Ultimately the bottom line objective for PerfTechPro for z/OS is to provide a best-of-breed solution for a very competitive cost. PerfTechPro for z/OS delivers business value by:

  • Ensuring enterprise zSeries Mainframe server resources are being used efficiently
  • Maximizing opportunities for cost-savings
  • Anticipating & responding to increased demand on resources
  • Reducing costs by exploiting periods of lower resource demand
  • Discerning underlying causes of performance and capacity issues
  • Eliminating time-consuming manual tracking, recording and analysis
  • Implementing disciplined management of valuable business resources

In conclusion, the Mainframe Capacity Planning process continues to evolve, forever striving to reduce any discrepancies in CPU requirements forecasting, which of course, have a high associated cost consideration.  Integrating CPU MF (SMF Type 113) must be a mandatory requirement, safeguarding that CPU Sizing, Forecasting, Modelling and Correlation Analysis activities are optimized.  Additionally, the actual process of Mainframe Capacity Planning is an activity that requires great skill and considerable associated responsibility.  A modern day solution such as PerfTechPro for z/OS is worthy of consideration, having been designed by a team with a heritage in delivering Mainframe Capacity Planning solutions, architecting function compatible with modern day functionality, while considering the latest technology zSeries CPU chip design considerations.

Application Performance Tuning – Why Bother?

With older generations of Mainframe Operating Systems, certainly MVS/XA and perhaps MVS/ESA, application performance tuning was a necessity, not an afterthought.  Quite simply, the cost of Mainframe resources, namely CPU, memory and disk, dictated that your mission critical business application might not perform to business requirements, unless you tuned your programming code.  Programmers, both of the system and application variety, understood the bits and bytes of the available programming languages (E.g. ASM, COBOL, PL/I) and Operating System (I.E. MVS), collaborating either via proactive process, or reactive problem solving.  With the continuing reduction of IT hardware component costs, the improvement in Operating Systems (E.g. 64-bit architecture) and newer programming languages (E.g. C, C++), it seems that application performance tuning is somewhat of an afterthought, but at what cost?

We all know that the cost of a Mainframe MIPS is significant, and although it might have reduced dramatically from a hardware viewpoint, from a software viewpoint, the cost remains largely static at ~£1,500-£3,500 per MIPS, per year, depending on your configuration.  So if your applications are burning several hundred if not several thousand extra MIPS unnecessarily, that’s very expensive indeed!  Additionally, and just as importantly, a badly tuned system will manifest itself in slower transaction response times and longer batch jobs, if applicable, which could impact service availability.  So why is there a seeming reluctance to tune business applications, Mainframe resident or not?
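
Before answering, consider the back-of-envelope arithmetic, using the per annum range quoted above:

```python
# Annual software cost of unnecessarily consumed MIPS, using the
# ~£1,500-£3,500 per MIPS per annum range quoted in the text.

def annual_software_cost(extra_mips: float, cost_per_mips: float) -> float:
    return extra_mips * cost_per_mips

for rate in (1500.0, 3500.0):
    print(f"£{annual_software_cost(500, rate):,.0f} per year")
# 500 wasted MIPS -> £750,000 to £1,750,000 per year
```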

If ever there was a functional IT area where the skills gap has never been wider, application performance tuning is that skill, when comparing the salty old sea dog Mainframe dinosaur with the newer Mainframe technician!

From an application development process viewpoint, where does the application performance tuning task live; before or after implementation?  The cynical amongst us will know: if it’s after implementation, there’s a strong likelihood said activity will never be performed!  If it’s before implementation, how many projects incorporate a meaningful stress test, or measure transaction response times versus an SLA or KPI metric?  Additionally, if the project is high-priority and/or running behind schedule, then performance testing is an activity that is easily removed…

Back in the good old days, the late 1980’s to early 1990’s, some application performance tuning tools did start to emerge, most notably Strobe.  Strobe was useful to even the most accomplished system and application programmers, and invaluable to less experienced personnel; arguably Strobe became the de facto software tool for tuning Mainframe applications.  However, later releases of MVS (E.g. OS/390 and z/OS) and the non-event that was the Year 2000 (Y2K) seemed to remove the focus on, and importance of, application tuning.

Arguably most important of all, the software MIPS cost item contributed to the demise of application performance tuning, because Strobe and its competitors (E.g. ASG/BMC TriTune, CA Application Tuner, IBM APA, Macro4 ExpeTune, et al) utilize even more CPU to capture diagnostic trace information.  However, those companies that have undertaken such application tuning activities in the last decade or so are sitting pretty, having reduced the CPU (MIPS) resource consumed, lowering TCO and optimizing performance accordingly.  In the 21st Century, these software solutions are classified as Application Performance Management (APM) solutions.

Is there a better and easier way to stimulate an interest in the application performance tuning discipline?  If the desire exists to tune an application, lowering CPU MIPS usage and optimizing service performance, then the traditional tools and methods mentioned previously exist, but perhaps a new (or not so new) CPU performance data source also exists…

With the introduction of the z10 server, a new function, CPU MF (CPU Measurement Facility), was incorporated.  Let’s not forget, z10 is now an n-2 technology, having been superseded by the z196/z114 and the latest zBC12/zEC12 generation of servers.  So each and every committed Mainframe customer should be positioned to benefit from the CPU MF function.

CPU MF provides an optional, hardware assisted collection of information about logical CPU activity executed over a specified interval in selected Logical Partitions (LPARs).  The CPU MF counters function is intended to be run on a constant basis to collect long-term performance data (I.E. SMF Record 113), in a similar manner to how you collect other performance data.  I have previously briefly discussed how CPU MF SMF data can be used to increase Mainframe Server Capacity Planning efficiencies.
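
By way of illustration, one common use of the counters data is deriving a Cycles Per Instruction (CPI) metric across intervals.  The sketch below assumes the SMF Type 113 records have already been decoded into simple dictionaries; the "cycles" and "instructions" field names are hypothetical stand-ins for the decoded basic counter set values, not the actual record layout:

    # Hedged sketch: derive CPI from decoded CPU MF counter records.  The dict
    # keys below are hypothetical; real SMF 113 decoding is considerably more
    # involved.
    def cpi(records):
        """records: iterable of dicts with 'cycles' and 'instructions' keys."""
        total_cycles = sum(r["cycles"] for r in records)
        total_instructions = sum(r["instructions"] for r in records)
        return total_cycles / total_instructions if total_instructions else 0.0

Tracking CPI interval by interval in this manner underpins the kind of workload characterization and server upgrade correlation activities referenced above.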

The CPU MF sampling function is a short duration, precise function that identifies where CPU resources are being used, to help you improve application efficiency.  Put very simply, CPU MF sampling data has minimal CPU overhead (E.g. ~0.1-1.0%) when collecting data (I.E. z/OS Hardware Instrumentation Services – HIS), but this data can then be used to identify CPU “hot spots”, which can then be further analysed to identify the “areas of code” generating the high CPU usage.  However, it was forever thus: whether via an APM tool or CPU MF sampling data, high CPU usage can be identified, but the application programmer must still undertake the task of optimizing the application code!
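
Conceptually, hot spot identification is simply a matter of attributing each sampled instruction address to the module or CSECT that owns it.  The sketch below illustrates the idea only; the sample address list and the load map of CSECT address ranges are hypothetical inputs, not the actual HIS data format:

    # Illustrative sketch of "hot spot" analysis: count CPU samples per CSECT,
    # given sampled instruction addresses and a load map (hypothetical inputs).
    from collections import Counter

    def hot_spots(sample_addresses, load_map):
        """load_map: list of (csect_name, start_addr, end_addr) tuples."""
        counts = Counter()
        for addr in sample_addresses:
            for name, start, end in load_map:
                if start <= addr < end:
                    counts[name] += 1
                    break
        total = sum(counts.values()) or 1
        # Report CSECTs by share of CPU samples, highest first
        return [(name, 100.0 * n / total) for name, n in counts.most_common()]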

IBM have done a great job in providing CPU MF counters data, optimizing the Capacity Planning process with the SMF 113 record, and the realm of possibility exists with the sample data, but a software solution is required to analyse and summarize this data.

Currently there are very few, arguably only one, software solutions that analyse CPU MF sample data, namely zHISR from Phoenix Software International.  zHISR interfaces directly with z/OS Hardware Instrumentation Services to collect data for hotspot analysis of customer, vendor, or operating system program execution.  zHISR features include:

  • Support for up to 128 simultaneous data collection events.  zHISR collections do not interfere with any HIS functions, including sample or counter collection.
  • System console commands for many zHISR functions.
  • An Application Programming Interface for COBOL and Assembler, for starting and stopping data collections.  Collections started via the API can specify a duration of one second or more.
  • Ability to schedule a collection with JCL so that collection starts when a given job or step begins.
  • Ability to store data collections as z/OS data sets or UNIX files.
  • Support for collections against CICS/TS transactions.
  • Analysis based on a time range within the collected data for a narrower spotlight on problem code.

An intuitive ISPF dialog allows the user to easily produce a CPU hot spots analysis, which can then be used to identify the offending code sections.  The user can then drill down and highlight the high CPU CSECT and program offset (instruction), cross-referencing the Associated Data (ADATA) to locate the source programming instruction.  Therefore the skill required to perform the analysis is minimal, as is the CPU overhead in collecting the analysis data, eradicating the potential barriers to embarking on an application tuning initiative.  Furthermore, the actual cost of deploying the zHISR software is not onerous, and so perhaps each and every committed Mainframe user can easily include application performance tuning in their application development lifecycle processes.
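
That final drill-down step, from a hot program offset to a source statement, is conceptually a range lookup.  The sketch below is purely illustrative and assumes an offset-to-statement table of the kind an ADATA file can yield; the table format is invented for the example and is not the zHISR or ADATA layout:

    # Hypothetical sketch: map a hot program offset back to a source statement,
    # given a sorted list of (offset, statement_number) pairs.
    import bisect

    def offset_to_statement(offset, offset_table):
        """offset_table: sorted list of (offset, statement_number) pairs."""
        offsets = [o for o, _ in offset_table]
        i = bisect.bisect_right(offsets, offset) - 1
        return offset_table[i][1] if i >= 0 else None

    # E.g. offset_to_statement(0x1A4, [(0x000, 1), (0x180, 42), (0x200, 57)]) -> 42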

zHISR has a UNIX file system interface that lets you navigate the system and browse or delete files.  With zHISR, users can start and stop hardware event data collections and view the status of the current or prior HIS run.  zHISR also includes a memory display/alter utility that lets you view main storage in the CPU you are logged on to.  If zIIPs are present and zHISR is defined as an authorized subsystem, nearly all of the CPU processing used by zHISR is redirected to a zIIP.

There are also instances, however few and far between, where Mainframe customers have written their own proprietary in-house OLTP (On-Line Transaction Processor) and Relational Database Management Subsystem (RDBMS), where traditional APM software tools can’t provide a solution, only interfacing with the underlying subsystems (E.g. Adabas, CICS, DB2, IDMS, WebSphere, et al).  In these instances, CPU MF and zHISR offer a solution to help such customers, who probably face challenges when they upgrade their Mainframe servers, safeguarding that software and application code is compatible with the new hardware and, ideally, exploits the latest functionality.

In conclusion, application performance tuning has to be a very important if not mandatory activity for the Mainframe Data Centre.  Whether via CPU MF or traditional APM software solutions, the cost reduction and performance improvement benefits of tuning should be compelling reasons to proactively engage in application tuning activities.  From a skills viewpoint, maybe the KISS (Keep It Simple Stupid) principle can apply, where CPU MF collects the data very simply and efficiently, complemented by zHISR, analysing the data in an intuitive and cost optimized manner.

So turning the subject matter on its head, Application Performance Tuning – Why Bother?  Why not!

Further information can be found from my z/OS Application Performance Tuning presentation, delivered at UK GSE in November 2012.

The Problem With Problems – Are You zAware?

Several decades ago, when observing potential challenges with hardware, most of us seasoned Mainframe folk would have been familiar with the terms Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR), although repair might become resolution, replacement, and so on.  As hardware has become more reliable, with very few if any single points of failure, we don’t really use these terms for hardware, but perhaps if we don’t use them for problems associated with our business applications, we should…

Today we generally simplify this area of safeguarding business processing metrics (E.g. SLA, KPI) with the Reliability, Availability and Serviceability (RAS) terminology.  So whether hardware related via an IHV such as IBM, software related via ISV’s such as ASG, BMC, CA, IBM, naming but a few, or application code related, we’re all striving to improve the RAS metrics associated with our IT discipline or component.

There will always be the ubiquitous software bugs, human error when making configuration changes, and so on, but what about those scenarios we might not even consider to be a problem, yet they can have a significant impact on our business?  An end-to-end application transaction could consist of an On-Line Transaction Processor (E.g. CICS, IMS, et al), a Relational Database Management Subsystem (E.g. DB2, ADABAS, IDMS, et al), a Messaging Broker (E.g. WebSphere MQ), a Networking Protocol (E.g. TCP/IP, SNA, et al), with all of the associated application infrastructure (E.g. Storage, Operating System, Server, Application Programs, Security, et al); so when we experience a “transaction failure”, which might be performance related, which component failed or caused the incident?

Systems Management disciplines dictate that Mainframe Data Centres deploy a plethora of monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al), but these software solutions typically generate a significant amount of data, whereas what we really need for successful problem solving is the right amount of meaningful information.

So ask yourself the rhetorical question; you know it: how many application performance issues remain unsolved, because we just can’t identify which component caused the issue, or because there is just too much data (E.g. System Monitor Logs) to analyse?  If you’re being honest, I guess the answer is greater than zero, perhaps significantly greater.  Further complications can occur because of the collaboration required to resolve such issues, as each discipline, Transaction, Database, Messaging, Networking, Security, General Systems Management and Performance Monitoring, typically resides in a different team…

IBM System z Advanced Workload Analysis Reporter (IBM zAware) is an integrated, self-learning, analytics solution for IBM z/OS that helps identify unusual system behaviour in near real time.  It is designed to help IT personnel improve problem determination so they can restore service quickly and improve overall availability.  zAware integrates with the family of IBM Mainframe System Management tools, including Runtime Diagnostics, Predictive Failure Analysis (PFA), IBM Health Checker for z/OS and z/OS Management Facility (z/OSMF).

IBM zAware runs in an LPAR on a zEC12 or later CPC.  Just like any System z LPAR, IBM zAware requires processing capacity, memory, disk storage, and connectivity.  IBM zAware is able to use either general purpose CPs or IFLs, which can be shared or dedicated.  It is generally more cost effective to deploy zAware on an IFL.

Used together with other Mainframe System Management Tools, zAware provides another view of your system(s) behaviour, helping answer questions such as:

  • Are my systems showing abnormal message activity?
  • When did this abnormal message activity start?
  • Is this abnormal message activity repetitive?
  • Are there messages appearing that have never appeared before?
  • Do the times of abnormal message activity coincide with problems in the system?
  • Is the abnormal behaviour limited to one system or are multiple systems involved?

IBM zAware creates a model of the normal operating characteristics of a z/OS system using message data captured from OPERLOG.  This message data includes any well-formed message captured by OPERLOG (I.E. A message with a tangible Message ID), whether it is from an IBM product, a non-IBM product, or one of your own application programs.  This model of past system behaviour is used as the base against which to compare message patterns that are occurring now.  The results of this comparison might help answer these questions.

IBM zAware determines, using its model of each system, what messages are new or if messages have been issued out of context based on the past normal behaviour of the system.  The model contains patterns of message ID occurrence over a previous period and does not need to know what job or started task issued the message. It also does not need to use the text of a message.
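
To make the concept concrete, the sketch below implements a much simplified version of this idea: a baseline of per-interval message ID frequencies, against which a new interval is scored for unseen or unusually frequent message IDs.  This is purely conceptual and in no way the actual IBM zAware algorithm; the interval structure and the tenfold threshold are assumptions for illustration:

    # Conceptual sketch only, not the IBM zAware algorithm: model the historical
    # rate of each message ID, then flag IDs that are new or far above that rate.
    from collections import Counter

    def build_model(historical_intervals):
        """historical_intervals: list of lists of message IDs (E.g. 'IEF403I')."""
        totals = Counter()
        for interval in historical_intervals:
            totals.update(interval)
        n = max(len(historical_intervals), 1)
        return {msg_id: count / n for msg_id, count in totals.items()}

    def score_interval(model, interval, factor=10.0):
        """Flag message IDs that are unseen, or far above their historical rate."""
        anomalies = {}
        for msg_id, count in Counter(interval).items():
            expected = model.get(msg_id)
            if expected is None:
                anomalies[msg_id] = "never seen before"
            elif count > factor * expected:
                anomalies[msg_id] = f"{count} observed vs ~{expected:.1f} expected"
        return anomalies

E.g. a message ID that normally appears a handful of times per interval, but suddenly appears hundreds of times, would be flagged, as would any message ID absent from the model.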

In summary, zAware is a self-learning technology for newer zSeries Servers (I.E. zEC12 onwards), which can help reduce the time taken to identify the “area” where a problem occurred, or is occurring (E.g. near real-time), allowing a technician to fully diagnose the problem and consider potential resolutions.  Put very simply, zAware will assist in identifying the problem, but it does not fully qualify the problem and its associated resolution.  This is a positive attribute, as ultimately the human technician must complete this most important of activities!

So what if you’re not a zEC12 user or you’re concerned about increased costs because you don’t deploy IFL speciality engines?

ConicIT/MF is a proactive Performance Management solution for First Fault Performance Problem Resolution.  By interfacing with standard system monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON), ConicIT/MF uses sophisticated mathematical models to perform proactive, intelligent and significant data reduction, quickly highlighting possible causes of problems, allowing for efficient problem determination.  Put another way, Systems Management Performance Monitors provide a wealth of data, but sometimes there’s too much data and not enough information.  ConicIT safeguards that the value of the data provided by Systems Management Performance Monitors is analysed and consolidated to expedite performance problem resolution.

ConicIT runs on a distributed Linux system, external to the Mainframe system being monitored.  ConicIT has a completely agentless architecture, which doesn’t require any installation on the Mainframe system being monitored.  It receives data from existing monitors (E.g. ASG-TMON, BMC MAINVIEW, CA SYSVIEW, IBM Tivoli OMEGAMON, et al) through their standard interfaces.  3270 emulation enables ConicIT to appear as just another operator to the existing monitor, adding no more load to the monitored system than an additional human operator would.

Until a problem is predicted, ConicIT requests basic monitor information at a very low frequency (about once per minute), but if the ConicIT analysis senses a performance problem brewing, its requests for information increase, although never so much as to affect the monitored system.  The maximum load generated by ConicIT is configurable and ConicIT supports all the major Mainframe monitors.

The monitor data stream is retrieved by parsing the data from the various (E.g. Log) data sources.  This raw data is first sent to the ConicIT data cleansing component.  Data from existing monitors is very “noisy”, since various system parameter values can fluctuate widely even when the system is running perfectly.  The job of the data cleansing algorithm is to find meaningful features within the fluctuating data.  Without an appropriate data cleansing algorithm it is very difficult, or impossible, for any useful analysis to take place.  Such cleansing is a simple visual task for a trained operator, but is very tricky for an automated algorithm.
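
As a simplified illustration of the cleansing idea, and not ConicIT’s actual algorithm, a rolling median is one classic way to suppress transient spikes in a noisy metric, so that only sustained shifts survive as meaningful features:

    # Simplified illustration of data cleansing (not ConicIT's algorithm):
    # a rolling median removes transient spikes while preserving sustained shifts.
    from statistics import median

    def rolling_median(values, window=5):
        """Smooth a noisy metric series with a centred rolling median."""
        half = window // 2
        return [median(values[max(0, i - half):i + half + 1])
                for i in range(len(values))]

    # E.g. rolling_median([5, 6, 95, 5, 6], window=3) -> [5.5, 6, 6, 6, 5.5];
    # the one-off spike of 95 disappears from the cleansed series.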

The relevant features found by the data cleansing algorithm are then processed to create appropriate variables.  These variables are created by a set of rules that process the data and apply transformations (E.g. combining single data points into a new synthesized variable, aggregating data points) to better describe the relevant state of the system.

These processed variables are analysed by models that are used to discover anomalies that could be indicative of a brewing performance problem.  Each model looks for a specific type of statistical anomaly that could predict a performance problem.  No single model is appropriate for a system as complex as a large computing system, especially since the workload profile changes over time.  So rather than a single model, ConicIT generates models appropriate to the historical data from a large, predefined set of prediction algorithms.  This set of active models is used to analyse the data, detect anomalies and predict performance degradation.  The active models vote on the possibility of an upcoming problem, in order to make sure that as wide a set of anomalies as possible is covered, while lowering the number of false alerts.  The set of active models changes over time, based on the results of an offline learning algorithm, which can either generate new models based on the data, or change the weighting of existing models.  The learning algorithm is run in the background on a periodic basis.
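
The voting mechanism can be illustrated with a minimal sketch; the detector functions, weights and threshold below are invented for the example, as the actual ConicIT models and their weighting are proprietary:

    # Minimal sketch of weighted ensemble voting (illustrative only): an alert
    # is raised when the weighted votes of the active models cross a threshold,
    # trading anomaly coverage against false alerts.
    def predict_problem(models, features, threshold=0.5):
        """models: list of (detector_fn, weight) pairs; each detector returns a bool."""
        total = sum(weight for _, weight in models)
        votes = sum(weight for detect, weight in models if detect(features))
        return total > 0 and votes / total >= threshold

The offline learning step described above would then correspond to periodically adjusting the weights, or swapping detectors in and out of the active set.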

When a possible performance problem is predicted by the active models, the ConicIT system takes two actions.  It sends an alert to the appropriate consoles and systems, and also instructs the monitor to collect information from the affected systems more frequently.  The result is that when IT personnel analyse the problem, they have the information describing the state of the system and the affected system components, as if they were watching the problem while it was happening.  The system also uses the information from the analysis to point out the anomalies that led it to predict a problem, thereby aiding root cause analysis.

So whether zAware or ConicIT, there are solutions to assist today’s busy IT technician to improve the Reliability, Availability and Serviceability (RAS) metrics for their business, implementing practicable resolutions for those problems which previously were just too problematic to solve.  zAware can offload its processing to an IFL, as and if available, whereas ConicIT performs its processing on a non-Mainframe platform, and thus can support all zSeries Servers, not just the zEC12 platform.

Ultimately both the zAware and ConicIT solutions have the same objective, increasing Mean Time Between Failure (MTBF) and decreasing Mean Time To Resolution (MTTR), optimizing IT personnel time accordingly.