The Ever Changing IBM Z Mainframe Disaster Recovery Requirement

With a 50+ year longevity, of course the IBM Z Mainframe Disaster Recovery (DR) requirement and associated processes have changed and evolved accordingly.  Initially, the primary focus would have been HDA (Head Disk Assembly) related, recovering data due to hardware (E.g. 23nn, 33nn DASD) failures.  It seems incredulous in the 21st Century to consider the downtime and data loss with such an event, but these failures were commonplace into the early 1980’s.  Disk drive (DASD) reliability increased with the 3380 device in the 1980’s and the introduction of the 3990-03 Dual Copy capability in the late 1980’s eradicated the potential consequences of a physical HDA failure.

The significant cost of storage and CPU resources dictated that many organizations had to rely upon 3rd party service providers for DR resource provision.  Often this dictated a classification of business applications, differentiating between Mission Critical or not, where DR backup and recovery processes would be application based.  Even the largest of organizations that could afford to duplicate CPU resource, would have to rely upon the Ford Transit Access Method (FTAM), shipping physical tape from one location to another and performing proactive or more likely reactive data restore activities.  A modicum of database log-shipping over SNA networks automated this process for Mission Critical data, but successful DR provision was still a major consideration.

Even with the Dual Copy function, this meant DASD storage resources had to be doubled for contingency purposes.  Therefore this dictated only the upper echelons of the business world (I.E. Financial Organizations, Telecommunications Suppliers, Airlines, Etc.) could afford the duplication of investment required for self-sufficient DR capability.  Put simply, a duplication of IBM Mainframe CPU, Network and Storage resources was required…

The 1990’s heralded a significant evolution in generic IT technology, including IBM Mainframe.  The adoption of RAID technology for IBM Mainframe Count Key Data (CKD) provided an affordable solution for all IBM Mainframe users, where RAID-5(+) implementations became commonplace.  The emergence of ESCON/FICON channel connectivity provided the extended distance requirement to complement the emerging Parallel SYSPLEX technology, allowing IBM Mainframe servers and related storage to be geographically dispersed.  This allowed a greater number of IBM Mainframe customers to provision their own in-house DR capability, but many still relied upon physical tape shipment to a 3rd party DR services provider.

The final significant storage technology evolution was the Virtual Tape Library (VTL) structure, introduced in the mid-1990’s.  This technology simplified capacity optimization for physical tape media, while reducing the number of physical drives required to satisfy the tape workload.  These VTL structures would also benefit from SYSPLEX implementations, but for many IBM Mainframe users, physical tape shipment might still be required.  Even though the IBM Mainframe had supported IP connectivity since the early 1990’s, using this network capability to ship significant amounts of data was dependent upon public network infrastructures becoming faster and more affordable.  In the mid-2000’s, transporting IBM Mainframe backup data via extended network carriers, beyond the limit of FICON technologies became more commonplace, once again, changing the face of DR approaches.

More recently, the need for Grid configurations of 2, 3 or more locations has become the utopia for the Global 1000 type business organization.  Numerous copies of synchronized Mission Critical if not all IBM Z Mainframe data are now maintained, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) DR criteria to several Minutes or less.

As with anything in life, learning from the lessons of history is always a good thing and for each and every high profile IBM Z Mainframe user (E.g. 5000+ MSU), there are many more smaller users, who face the same DR challenges.  Just as various technology races (E.g. Space, Motor Sport, Energy, et al) eventually deliver affordable benefit to a wider population, the same applies for the IBM Z Mainframe community.  The commonality is the challenges faced, where over the years, DR focus has either been application or entire business based, influenced by the technologies available to the IBM Mainframe user, typically dictated by cost.  However, the recent digital data explosion generates a common challenge for all IT users alike, whether large or small.  Quite simply, to remain competitive and generate new business opportunities from that priceless and unique resource, namely business data, organizations must embrace the DevOps philosophy.

Let’s consider the frequency of performing DR tests.  If you’re a smaller IBM Z Mainframe user, relying upon a 3rd party DR service provider, your DR test frequency might be 1-2 tests per year.  Conversely if you’re a large IBM z Mainframe user, deploying a Grid configuration, you might consider that your business no longer has the requirement for periodic DR tests?  This would be a dangerous thought pattern, because it was forever thus, SYSPLEX and Grid configurations only safeguard from physical hardware scenarios, whereas a logical error will proliferate throughout all data copies, whether, 2, 3 or more…

Similarly, when considering the frequency of Business Application changes, for the archetypal IBM Z Mainframe user, this might have been Monthly or Quarterly, perhaps with imposed change freezes due to significant seasonal or business peaks.  However, in an IT ecosystem where the IBM Z Mainframe is just another interconnected node on the network, the requirement for a significantly increased frequency of Business Application changes arguably becomes mandatory.  Therefore, once again, if we consider our frequency of DR tests, how many per year do we perform?  In all likelihood, this becomes the wrong question!  A better statement might be, “we perform an automated DR test as part of our Business Application changes”.  In theory, the adoption of DevOps either increases the frequency of scheduled Business Application changes, or organization embraces an “on demand” type approach…

We must then consider which IT Group performs the DR test?  In theory, it’s many groups, dictated by their technical expertise, whether Server, Storage, Network, Database, Transaction or Operations based.  Once again, if embracing DevOps, the Application Development teams need to be able to write and test code, while the Operations teams need to implement and manage the associated business services.  In such a model, there has to be a fundamental mind change, where technical Subject Matter Experts (SME) design and implement technical processes, which simplify the activities associated with DevOps.  From a DR viewpoint, this dictates that the DevOps process should facilitate a robust DR test, for each and every Business Application change.  Whether an organization is the largest or smallest of IBM Z Mainframe user is somewhat arbitrary, performing an entire system-wide DR test for an isolated Business Application change is not required.  Conversely, performing a meaningful Business Application test during the DevOps code test and acceptance process makes perfect sense.

Performing a meaningful Business Application DR test as part of the DevOps process is a consistent requirement, whether an organization is the largest or smallest IBM Z Mainframe user.  Although their hardware resource might differ significantly, where the largest IBM Z Mainframe user would typically deploy a high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al), the requirement to perform a seamless, agile and timely Business Application DR test remains the same.

If we recognize that the IBM Z Mainframe is typically deployed as the System Of Record (SOR) data server, today’s 21st century Business Application incorporates interoperability with Distributed Systems (E.g. Wintel, UNIX, Linux, et al) platforms.  In theory, this is a consideration, as mostly, IBM Z Mainframe data resides in proprietary 3390 DASD subsystems, while Distributed Systems data typically resides in IP (NFS, NAS) and/or FC (SAN) filesystems.  However, the IBM Z Mainframe has leveraged from Distributed Systems technology advancements, where typical VTL Grid configurations utilize proprietary IP connected disk arrays for VTL data.  Ultimately a VTL structure will contain the “just in case” copy of Business Application backup data, the very data copy required for a meaningful DR test.  Wouldn’t it be advantageous if the IBM Z Mainframe backup resided on the same IP or FC Disk Array as Distributed Systems backups?

Ultimately the high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al) solutions are designed for the upper echelons of the business and IBM Z Mainframe world.  Their capacity, performance and resilience capability is significant, and by definition, so is the associated cost.  How easy or difficult might it be to perform a seamless, agile and timely Business Application DR test via such a high-end VTL?  Are there alternative options that any IBM Z Mainframe user can consider, regardless of their size, whether large or small?

The advances in FICON connectivity, x86/POWER servers and Distributed Systems disk arrays has allowed for such technologies to be packaged in a cost efficient and small footprint IBM Z VTL appliance.  Their ability to connect to the IBM Z server via FICON connectivity, provide full IBM Z tape emulation and connect to ubiquitous IP and FC Distributed Systems disk arrays, positions them for strategic use by any IBM Z Mainframe user for DevOps DR testing.  Primarily one consistent copy of enterprise wide Business Application data would reside on the same disk array, simplifying the process of recovering Point-In-Time backup data for DR testing.

On the one hand, for the smaller IBM Z user, such an IBM Z VTL appliance (E.g. Optica zVT) could for the first time, allow them to simplify their DR processes with a 3rd party DR supplier.  They could electronically vault their IBM Z Mainframe backup data to their 3rd party DR supplier and activate a totally automated DR invocation, as and when required.  On the other hand, moreover for DevOps processes, the provision of an isolated LPAR, would allow the smaller IBM Z Mainframe user to perform a meaningful Business Application DR test, in-house, without impacting Production services.  Once again, simplifying the Business Application DR test process applies to the largest of IBM Z Mainframe users, and leveraging from such an IBM Z VTL appliance, would simplify things, without impacting their Grid configuration supporting their Mission critical workloads.

In conclusion, there has always been commonality in DR processes for the smallest and largest of IBM Z Mainframe users, where the only tangible difference would have been budget related, where the largest IBM Z Mainframe user could and in fact needed to invest in the latest and greatest.  As always, sometimes there are requirements that apply to all, regardless of size and budget.  Seemingly DevOps is such a requirement, and the need to perform on-demand seamless, agile and timely Business Application DR tests is mandatory for all.  From an enterprise wide viewpoint, perhaps a modicum of investment in an affordable IBM Z VTL appliance might be the last time an IBM Z Mainframe user needs to revisit their DR testing processes!

System z: I/O Interoperability Evolution – From Bus & Tag to FICON

Since the introduction of the S/360 Mainframe in 1964 there has been a gradual evolution of I/O connectivity that has taken us from copper Bus & Tag to fibre ESCON and now FICON channels.  Obviously during this ~50 year period there have been exponentially more releases of Mainframe server and indeed Operating System.  In this timeframe there have been 2 significant I/O technology milestones.  Firstly, in 1990, ESCON was part of the significant S/390 announcement (MVS/ESA), where migration to ESCON was a great benefit, if only for replacing the heavy and big copper Bus & Tag channels.  Secondly, even though FICON was released in the late 1990’s, in 2009 IBM announced that the z10 would be the last Mainframe server to support greater than 240 native ESCON channels.  Similarly IBM declared that the last zEnterprise server to support ESCON channels are the z196 and z114 servers.  Each of these major I/O evolutions required a migration philosophy and not every I/O device would be upgraded to support either native ESCON of FICON channels.  How did customers achieve these mandatory I/O upgrades to safeguard IBM Mainframe Server and associated Operating System longevity?

In 2009 it was estimated ~20% of all Mainframe customers were using ESCON only I/O infrastructures, while only ~20% of all Mainframe customers were deploying a FICON only infrastructure.  Similarly ~33% of z9 and z10 systems were shipped with ESCON CVC (Block Multiplexor) and CBY (Byte Multiplexor) channels defined, while ~75% of all Mainframe Servers had native ESCON (CNC) capability.  From a dispassionate viewpoint, clearly the migration from ESCON to FICON was going to be a significant challenge, while even in this timeframe, there was still use of Bus & Tag channels…

One of the major strengths of the IBM Mainframe ecosystem is the partner network, primarily software (ISV) based, but with some significant hardware (IHV) providers.  From a channel switch viewpoint, we will all be familiar with Brocade, Cisco and McData, where Brocade acquired McData in 2006.  However, from a channel protocol conversion viewpoint, IBM worked with Optica Technologies, to deliver a solution that would allow the support for ESCON and Bus & Tag channels to the FICON only zBC12/zEC12 and future Mainframe servers (I.E. z13, z13s).  Somewhat analogous to the smartphone where the user doesn’t necessarily know that an ARM processor might be delivering CPU power to their phone, sometimes even seasoned Mainframe professionals might inadvertently overlook that the Optica Technologies Prizm solution has been or indeed is still deployed in their System z Data Centre…

When IBM work with a partner from an I/O connectivity viewpoint, clearly IBM have to safeguard that said connectivity has the highest interoperability capability with bulletproof data exchange attributes.  Sometimes we might take this for granted with the ubiquitous disk and tape subsystem suppliers (I.E. EMC, HDS, IBM, Oracle), but for FICON conversion support, Optica Technologies was a collaborative partner for IBM.  Ultimately the IBM Hardware Systems Assurance labs deploy their proprietary System Assurance Kernel (SAK) processes to safeguard I/O subsystem interoperability for their System z Mainframe servers.  Asking that rhetorical question; when was the last time you asked your IHV for site of their System Assurance Kernel (SAK) exit report from their collaboration with IBM Hardware Systems Assurance labs for their I/O subsystem you’re considering or deploying?  In conclusion, the SAK compliant, elegant, simple and competitively priced Prizm solution allowed the migration of tens if not hundreds of thousands of ESCON connections in thousands of Mainframe data centres globally!

With such a rich heritage of providing a valuable solution to the global IBM Mainframe install base, whether the smallest or largest, what would be next for Optica Technologies?  Obviously leveraging from their expertise in FICON channel support would be a good way forward.  With the recent acquisition of Bus-Tech by EMC and the eradication of the flexible MDL tapeless virtual tape offering, Optica Technologies are ideally placed to be that small, passionate and eminently qualified IHV to deliver a turnkey virtual tape solution for the smaller and indeed larger System z user.  The Optica Technologies zVT family leverages from the robust and heritage class Prizm technology, delivering an innovative family of virtual tape solutions.  The entry “Virtual Tape In A Box” zVT 3000i provides 2 FICON channel interfaces and 4 TB uncompressed internal RAID-5 disk space, seamlessly interfacing with all System z supported tape devices (I.E. 3490, 3590) and processes.  A single enterprise class zVT 5000-iNAS node delivers 2 FICON channel interfaces, NFS storage capacity from 8TB to 1PB in a single frame with standard deduplication, compression, replication and encryption features.  The zVT 5000-iNAS is available with multi-node configuration support for additional scalability and resiliency.  For those customers wishing to deploy their own choice of NFS or FC storage subsystem, the zVT 5000-FLEX allows such connectivity accordingly.

In conclusion, sometimes it’s all too easy to take some solutions for granted, when they actually delivered a tangible and arguably priceless solution in the evolution of your organizations System z Mainframe server journey from ESCON, if not Bus & Tag to FICON.  Perhaps the Prizm solution is one of these unsung products?  Therefore, the next time you’re reviewing the virtual tape market place, why wouldn’t you seriously consider Optica Technologies, given their rich heritage in FICON channel interoperability?  Given that IBM chose Optica Technologies as their strategic partner for ESCON to FICON migration, seemingly even IBM might have thought “nobody gets fired for choosing…”!