Enabling IBM Z Security For The Cloud: Meltdown & Spectre Observations

The New Year period of 2018 delivered unpleasant news for the majority of IT users deploying Intel chips for their Mission Critical workloads.  Intel chips manufactured since 1995 have been identified as having a security flaw: a kernel level bug that leaks memory, allowing hackers to read sensitive data, including passwords, login keys, et al, from the chip itself.  It therefore follows that this vulnerability facilitates malware insertion.  Let’s not overlook that x86 chips don’t just reside in PCs; their use is ubiquitous, including servers, the cloud and even mobile devices, and the bug impacts all associated operating systems, Windows, Linux, macOS, et al.  Obviously, kernel access bypasses everything security related…

From a classification viewpoint, Meltdown is a hardware vulnerability affecting a plethora of Intel x86 microprocessors, ten or so IBM POWER processors, and some ARM and Apple microprocessors, allowing a rogue process to read all memory, even when not authorized.  Spectre breaks the isolation between different applications, allowing attackers to trick error-free programs, which actually follow best practices, into leaking sensitive data; it is the more pervasive of the two, encompassing nearly all chip manufacturers.

A number of software patches have been issued, firstly in late January 2018, which inevitably caused other usability issues; patch reliability has become more stable during the subsequent three-month period.  Intel now claim to have redesigned their upcoming 8th Generation Xeon and Core processors to further reduce the risks of attack via the Spectre and Meltdown vulnerabilities.  Of course, these patches, whether at the software or firmware level, impact chip performance, and as always, the figures vary greatly, but anything from 10-25% seems in the ballpark, with obvious consequences!

From a big picture viewpoint, if a technology is pervasive, it’s a prime target for the hacker community.  Windows has been the traditional easy target, but an even better target is the CPU chip itself, encompassing all associated Operating Systems.  If you never had any security concerns from a public cloud viewpoint, arguably that was a questionable attitude, but now these rapidly growing public cloud providers really need to up their game from an infrastructure (IaaS) provision viewpoint.  What chip technologies exist that haven’t been impacted (to date) by these Meltdown and Spectre vulnerabilities; IBM Z, perhaps?

On 20 March 2018 at Think 2018 IBM announced the first cloud services with Mainframe class data protection:

  • IBM Cloud Hyper Protect Crypto Services: deliver FIPS 140-2 Level 4 security, the highest security level attainable for cryptographic hardware. This level of security is required by the most demanding of industries, for example Financial Services, for data protection.  Physical security mechanisms provide a complete envelope of protection around the cryptographic module, with the intent of detecting and responding to all unauthorized attempts at physical access.  Hyper Protect Crypto Services deliver these highest levels of data protection from IBM Z to IBM Cloud.  Hyper Protect Crypto Services secures your data in a Secure Service Container (SSC), providing the enterprise level of security and impregnability that enterprise customers have come to expect from IBM Z technology.  Hardware virtualisation protects data in an isolated environment.  SSC safeguards against external data access, including access by privileged users, for example, cloud administrators.  Data is encrypted at rest, in process and in flight.  The available support for Hardware Security Modules (zHSM) allows for digital keys to be protected in accordance with industry regulations.  The zHSM provides safe and secure PKCS#11 APIs, which make Hyper Protect Crypto Services accessible from popular programming languages (E.g. Java, JavaScript, Swift, et al).
  • IBM Cloud Hyper Protect Containers: enable enterprises to deploy container-based applications and microservices, supported through the IBM Cloud Container service, managing sensitive data within a security-rich Secure Service Container environment on the IBM LinuxONE platform. This environment is built with IBM LinuxONE systems, designed for EAL5+ isolation, and Secure Service Container technology designed to prevent privileged access by malicious users and Cloud Admins.

From an IBM and indeed industry viewpoint, security concerns should not be a barrier for enterprises looking to leverage from cloud native architecture to transform their business and drive new revenue from data using higher-value services including Artificial Intelligence (AI), Internet of Things (IoT) and blockchain.  Hyper Protect Crypto Services is the cryptography function used by the IBM Blockchain Platform.  The Hyper Protect Crypto Services – Lite Plan offers free experimental usage of up to 10 crypto slots; a Lite Plan instance is only deleted after 30 days of inactivity.

In a rapidly changing landscape, where AI, Blockchain and IoT are driving rapid cloud adoption, the ever-increasing cybersecurity threat is a clear and present danger.  The manifestation of security vulnerabilities in the processor chip, whether Apple, AMD, Arm, IBM, Intel, Qualcomm, et al, has been yet another wake-up alert and call for action for all.  Even from an IBM Z ecosystem viewpoint, there were Meltdown and Spectre patches required, and one must draw one’s own conclusions as to the pervasive nature of these exposures.

By enabling FIPS 140-2 Level 4 security via Cloud Hyper Protect Crypto Services and EAL5+ isolation via Cloud Hyper Protect Containers on IBM LinuxONE, if only on the IBM Cloud platform, IBM are offering the highest levels of security accreditation to the wider IT community.  Noting that it was the Google Project Zero team that identified the Meltdown and Spectre vulnerability threats, hopefully Google might consider integrating these IBM Z Enterprise Class security features in their Public Cloud offering?  It therefore follows that all major Public Cloud providers, including Amazon, Microsoft, Alibaba, Rackspace, et al, might follow suit?

In conclusion, perhaps the greatest lesson learned from the Meltdown and Spectre issue is that all major CPU chips were impacted and in a rapidly moving landscape of ever increasing public cloud adoption, the need for Enterprise Class security has never been more evident.  A dispassionate viewpoint might agree that IBM Z delivers Enterprise Class security and for the benefit of all evolving businesses, considering wider and arguably ground-breaking collaboration with technologies such as blockchain, wouldn’t it be beneficial if the generic Public Cloud offerings incorporated IBM Z security technology…

Maximizing IBM Z System Of Record (SOR) Data Value: Is ETL Still Relevant?

A general consensus for the IBM Z Mainframe platform is that it’s the best transaction and database server available, and more recently, with the advent of Pervasive Encryption, the best enterprise class security server.  It therefore follows that the majority of mission critical and valuable data resides in IBM Z Mainframe System Of Record (SOR) database repositories, receiving and passing data via real-time transaction services.  Traditionally, maximizing data value generally involved moving data from the IBM Mainframe to another platform for subsequent analysis, typically for Business Intelligence (BI) and Data Warehouse (DW) purposes.

ETL (Extract, Transform, Load) is an automated bulk data movement process: data transitions from source systems through a transformation engine, governed by an installation defined policy, before being loaded into target systems, typically data warehouses or specialized data repositories, for use by business decision driven applications.  Quite simply, ETL enables an organization to make informed and hopefully intelligent data driven business decisions.  This ubiquitous IT industry TLA (Three Letter Acronym) generated a massive industry of ETL solutions, involving specialized software offerings and various Distributed Systems hardware platforms, both commodity and specialized.  However, some ~30 years since the first evolution of ETL processes, is ETL still relevant in the 21st Century?
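The three ETL stages can be sketched in a few lines.  This is purely an illustration of the concept, using in-memory SQLite databases as stand-ins for a Mainframe SOR and a data warehouse; the table and column names are invented for the example:

```python
import sqlite3

# Stand-in for the Mainframe System Of Record (SOR)
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER, region TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 1050, "emea"), (2, 2000, "amer"), (3, 450, "emea")])

# Stand-in for the target data warehouse
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_dw (id INTEGER, amount REAL, region TEXT)")

# Extract: bulk read from the source system
rows = source.execute("SELECT id, amount_cents, region FROM orders").fetchall()

# Transform: business rules are applied BEFORE loading (cents to currency
# units, region codes upper-cased) - the defining trait of ETL
transformed = [(i, cents / 100.0, region.upper()) for i, cents, region in rows]

# Load: bulk insert into the target repository
warehouse.executemany("INSERT INTO orders_dw VALUES (?, ?, ?)", transformed)
warehouse.commit()

total = warehouse.execute("SELECT SUM(amount) FROM orders_dw").fetchone()[0]
print(total)  # 35.0
```

The key point is the ordering: the transformation happens in flight, so the warehouse only ever holds the curated form of the data, stamped at extract time.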

The 21st Century has witnessed a massive and arguably exponential data explosion, from cloud, mobile and social media sources.  These dynamic and open data sources demand intelligent analytics to process the data in near real-time, and the notion of having a time delay between the Extract and Load parts of the ETL process is becoming increasingly unacceptable for most data driven organizations.  During the last several years, there has been increased usage of Cloud BI, with a reported increase from ~25% to ~80% of public cloud users deploying Cloud BI solutions.

For cloud resident data warehouses, an evolution from ETL to ELT (Extract, Load, Transform) has taken place.  ELT is an evolutionary and savvy method of moving data from source systems to centralized data repositories without transforming the data before it’s loaded into the target systems.  The major benefit of the ELT approach is meeting the near real-time processing requirement of today’s data driven 21st Century business.  With ELT, all extracted raw data resides in the data warehouse, where powerful and modern analytical architectures can transform the data, as per the associated business decision making policies.  Put simply, the data transformation occurs when the associated analytical query activities are processed.  For those modern organizations leveraging from public cloud resources, ELT and Cloud BI processes make sense and the growth of Cloud BI speaks for itself.  However, what about the traditional business, which has leveraged from the IBM Z Mainframe platform for 30-50+ years?
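For contrast with ETL, the same toy scenario can be reworked as ELT: raw data lands in the warehouse untransformed, and the transformation is expressed as a view that is evaluated only when an analytical query runs.  Again, an illustrative sketch only, with invented names:

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_raw (id INTEGER, amount_cents INTEGER, region TEXT)")

# Extract + Load: raw records land in the warehouse with no transformation,
# so there is no staging delay between source and target
warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)",
                      [(1, 1050, "emea"), (2, 2000, "amer"), (3, 450, "emea")])

# Transform: deferred to query time, inside the analytical engine
warehouse.execute("""
    CREATE VIEW orders_curated AS
    SELECT id, amount_cents / 100.0 AS amount, UPPER(region) AS region
    FROM orders_raw
""")

result = warehouse.execute(
    "SELECT region, SUM(amount) FROM orders_curated GROUP BY region ORDER BY region"
).fetchall()
print(result)  # [('AMER', 20.0), ('EMEA', 15.0)]
```

Because the raw data is always present, new transformation rules can be applied retrospectively to history, something the pre-transformed ETL warehouse cannot do.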

Each and every leading Public Cloud supplier, including IBM (Watson) has their own proprietary analytical engine, integrating that technology into their mainstream offerings.  As always, the IBM Z Mainframe platform has evolved to deliver the near real-time requirements of an ELT framework, but are there any other generic solutions that might assist any Mainframe organization in their ETL to ELT evolution process?

B.O.S. Software Service und Vertrieb GmbH offer their tcVISION solution, which approaches this subject matter from a data synchronization viewpoint.  tcVISION is a powerful Change Data Capture (CDC) platform for users of IBM Mainframes and Distributed Systems servers.  tcVISION automatically identifies the changes applied to Mainframe and Distributed Systems databases and files.  No programming effort is necessary to obtain the changed data.  tcVISION continuously propagates the changed data to the target systems in real-time or on a policy driven time interval period, as and when required.  tcVISION offers a rich set of processing and controlling mechanisms to guarantee a data exchange implementation that is fully audit proof.  tcVISION contains powerful bulk processors that perform the initial load of mass data or the cyclic exchange of larger data volumes in an efficient, fast and reliable way.

tcVISION supports several data capture methods that can be individually used as the application and associated data processing flow requires.  These methods are based upon a Real-Time or near Real-Time basis, including IBM Mainframe DBMS, Logstream, Log and Snapshot (compare) data sources.  A myriad of generic database repositories are supported:

  • Adabas: Real-time/Near real-time, log processing, compare processing
  • Adabas LUW: Real-time/Near real-time, log processing, compare processing
  • CA-Datacom: Log processing, compare processing
  • CA-IDMS: Real-time/Near real-time, log processing, compare processing
  • DB2: Real-time/Near real-time, log processing, compare processing
  • DB2/LUW: Real-time/Near real-time, log processing, compare processing
  • Exasol: Compare processing
  • IMS: Real-time/Near real-time, log processing, compare processing
  • Informix: Real-time/Near real-time, log processing, compare processing
  • Microsoft SQL Server: Real-time/Near real-time, log processing, compare processing
  • Oracle: Real-time/Near real-time, log processing, compare processing
  • PostgreSQL: Real-time/Near real-time, log processing, compare processing
  • Sequential file: Compare processing
  • Teradata: Compare processing
  • VSAM: Real-time/Near real-time, log processing, compare processing
  • VSAM/CICS: Real-time/Near real-time, log processing, compare processing

tcVISION incorporates an intelligent bulk load component that can be used to unload data from a Mainframe or Distributed Systems data source, loading the data into a target database, either directly or by using a loader file.  tcVISION comes with an integrated loop-back prevention for bidirectional data exchange, where individual criteria can be specified to detect and ignore changes that have already been applied.  tcVISION incorporates comprehensive monitoring, logging and integrated alert notification.  Optional performance data may be captured and stored into any commercially available relational database.  This performance data can be analyzed and graphically displayed using the tcVISION web component.
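The underlying Change Data Capture idea, in its simplest "compare processing" form, is to diff two snapshots of a keyed data set and emit insert/update/delete change records for propagation.  The sketch below illustrates only the generic CDC concept; it is not tcVISION's interface, and all names and data are invented:

```python
# Generic "compare processing" CDC sketch: diff two keyed snapshots and
# produce change records suitable for propagation to a target system.
def capture_changes(before: dict, after: dict) -> list:
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append(("INSERT", key, row))      # new record
        elif before[key] != row:
            changes.append(("UPDATE", key, row))      # changed record
    for key in before:
        if key not in after:
            changes.append(("DELETE", key, None))     # removed record
    return changes

snapshot_1 = {100: ("ACME", 50), 101: ("GLOBEX", 75)}
snapshot_2 = {100: ("ACME", 60), 102: ("INITECH", 10)}

changes = capture_changes(snapshot_1, snapshot_2)
for change in changes:
    print(change)
# ('UPDATE', 100, ('ACME', 60))
# ('INSERT', 102, ('INITECH', 10))
# ('DELETE', 101, None)
```

Log and Logstream based capture methods avoid the snapshot comparison entirely by reading the changes directly from the DBMS journal, which is what makes real-time propagation feasible; the compare approach remains useful for sources with no usable log.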

From an ETL to ELT evolution viewpoint, tcVISION delivers the following data synchronization benefits:

  • Time Optimization: Significant reduction in data exchange implementation processes and data synchronization processing.
  • Heterogenous Support: Independent of database supplier, offering support for a myriad of source and target databases.
  • Resource Optimization: Mainframe MIPS reduction and data transfer optimization via intelligent secure compression algorithms.
  • Data Availability: Real-time data replication across application and system boundaries.
  • Implementation Simplicity: Eradication of application programming and data engineer resources.
  • Security: Full accountability and auditability of all data movements.

In conclusion, the ETL process has now been superseded by the real-time data exchange requirement for 21st Century data processing via the ELT evolution.  Whether viewed as an ELT or data synchronization requirement, tcVISION delivers an independent vendor agnostic solution, which can efficiently deliver seamless data delivery for analytical purposes, while maintaining synchronized data copies between environments in real-time.

The Open Systems Adapter (OSA): Delivering ~25 Years IBM Mainframe IP Connectivity

Recently in my day-to-day activities I encountered a 3172 controller and was reminded of my first such encounter, back in 1992.  This got me thinking; 25 years of IBM Mainframe IP connectivity!  The IBM 3172 Interconnect Controller allowed LAN-to-Mainframe interconnection and was the pioneering technology allowing IP data off-load activities.  Historically, Mainframe data transfer operations, namely CCW I/O, were dependent on a physical channel, where the 3172 was a stepping stone to the Open Systems Adapter (OSA) card in 1994, quickly superseded by the OSA-2 card in 1995.  From a performance viewpoint, the OSA/OSA-2 cards matched maximum ESCON speeds of 17 MB/s.

However, the introduction of the OSA-Express technology in 1999 dramatically increased throughput to ~333 MB/s.  The OSA-Express technology bypasses CCW channel-based I/O processing, connecting directly to the Self-Timed Interconnect (STI) bus of Generation 6 (retrofitted to Generation 5) S/390 Mainframes.  Data is transferred directly between high speed Mainframe memory and the OSA-Express adapter I/O port, across the STI bus, with no intervening components or processing to slow down the data interchange.  This bus-based I/O, a first for IBM Mainframe computing, significantly increased data transfer speeds, eliminating inefficiencies associated with intermediary components.

Additionally, IBM developed a totally new I/O scheme for the OSA-Express adapter.  Queued Direct I/O (QDIO) is a highly optimized data queuing-based data interchange mechanism, leveraging from the message queuing expertise IBM acquired with their multi-platform MQSeries middleware solution.  The QDIO-specific S/390 hardware instruction for G5/G6 machines delivered an application-to-OSA signalling scheme capable of handling the high-volume, multimedia data transfer requirements of 21st Century web applications.  Where might we be without the 3172 Interconnect Controller and the MQSeries messaging solution?

Since OSA-Express2, the channel types supported have largely remained unchanged:

  • OSD: Queued Direct I/O (QDIO), a highly efficient data transfer architecture, dramatically improving IP data transfer speed and efficiency.
  • OSE: Non-QDIO, sets the OSA-Express card to function in non-QDIO mode, bypassing all of the advanced QDIO functions.
  • OSC: OSA-ICC, available with IBM Mainframes supporting GbE, eliminating the requirement for an external console controller, simplifying HMC and z/OS system console access, while introducing TN3270E connectivity.
  • OSN: OSA for NCP, Open Systems Adapter for NCP, eradicates 3745/3746 Front End Processor requirements, with the Network Control Program (NCP) running under IBM Communication Controller for Linux (CCL).  Superseded by:
  • OSM: (OSA-Express for zManager), provides Intranode Management Network (INMN) connectivity from System z to zManager functions.
  • OSX: (OSA-Express for zBX), provides connectivity and access control to the IntraEnsemble Data Network (IEDN) to the Unified Resource Manager (URM) function.

Returning to my original observation, it’s sometimes hard to reconcile finding a ~25 year old 3172 Controller in a Data Centre environment, preparing for a z14 upgrade!  In conjunction with the z14 announcement, OSA-Express6S promised an Ethernet technology refresh for use in the PCIe I/O drawer and continues to be supported by the 16 GBps PCIe Gen3 host bus.  The 1000BASE-T Ethernet feature supports copper connectivity, in addition to 10 Gigabit Ethernet (10 GbE) and Gigabit Ethernet (GbE) for single-mode and multi-mode fibre optic environments.  The OSA-Express6S 1000BASE-T feature will be the last generation to support 100 Mbps link speed connections.  Future OSA-Express 1000BASE-T features will only support 1 Gbps link speed operation.

Of course, OSA-Express technology exposes the IBM Z Mainframe to the same security challenges as any other server node on the IP network, and as well as talking about Pervasive Encryption with this customer, we also talked about the increased security features of the OSA-Express6S adapter:

  • OSA-ICC Support for Secure Sockets Layer: when configured as an integrated console controller CHPID type (OSC) on the z14, supports the configuration and enablement of secure connections using the Transport Layer Security (TLS) protocol versions 1.0, 1.1 and 1.2. Server-side authentication is supported using either a self-signed certificate or a customer supplied certificate, which can be signed by a customer-specified certificate authority.  The certificates used must have an RSA key length of 2048 bits, and must be signed by using SHA-256.  This function negotiates an AES-128 cipher for the session key.
  • Virtual Local Area Network (VLAN): takes advantage of the Institute of Electrical and Electronics Engineers (IEEE) 802.1Q standard for virtual bridged LANs. VLANs allow easier administration of logical groups of stations that communicate as though they were on the same LAN.  In the virtualized environment of the IBM Z server, many TCP/IP stacks can exist, potentially sharing OSA-Express features.  VLAN provides a greater degree of isolation by allowing contact with a server from only the set of stations that comprise the VLAN.
  • QDIO Data Connection Isolation: provides a mechanism for security regulatory compliance (E.g. HIPAA) for network isolation between the instances that share physical network connectivity, as per installation defined security zone boundaries, isolating a QDIO data connection on an OSA port by forcing traffic to flow to the external network.  This feature safeguards that all communication flows only between an operating system and the external network.  This feature is provided with a granularity of implementation flexibility for both the z/VM and z/OS operating systems.
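The server-side TLS posture described for OSA-ICC (pinned protocol version, server certificate authentication) is the same pattern any TLS endpoint follows.  As an illustration only, here is how that posture looks when expressed with Python's standard `ssl` module; OSA-ICC itself is configured via the HMC, not code, and the certificate file names below are placeholders:

```python
import ssl

# Server-side TLS context: protocol version pinned, server authenticated
# by an RSA certificate (loading is commented out as the files are
# placeholders for this sketch).
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse TLS 1.0/1.1 clients
context.maximum_version = ssl.TLSVersion.TLSv1_2
# context.load_cert_chain("server-cert.pem", "server-key.pem")  # RSA-2048, SHA-256 signed

print(context.minimum_version == ssl.TLSVersion.TLSv1_2)  # True
```

Note that while the z14-era OSA-ICC support covered TLS 1.0 through 1.2, pinning to 1.2 (as sketched here) reflects the hardening direction the industry has since taken.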

As always, the single-footprint capability of an IBM Z server must be considered. From a base architectural OSA design viewpoint, OSA supports 640 TCP/IP stacks or connections per dedicated CHPID, or 640 total stacks across multiple LPARs using a shared or spanned CHPID.  Obviously this allows the IBM Mainframe user to support more Linux images.  Of course, this is a very important consideration when considering the latest z13 and z14 servers for Distributed Systems workload consolidation.

In conclusion, never underestimate the value of the OSA-Express adapter in your organization and its role in transitioning the IBM Mainframe from a closed proprietary environment in the early 1990’s, to just another node on the IP network, from the mid-1990’s to the present day.  As per any other major technology for the IBM Z server, the OSA-Express adapter has evolved to provide the requisite capacity, performance, resilience and security attributes expected for an Enterprise Class workload.  Finally, let’s not lose sight of the technology commonality associated with OSA-Express and Crypto Express adapters; clearly, fundamental building blocks of Pervasive Encryption…

Optimizing Mission Critical Data Value – IBM Machine Learning for z/OS

Typically the IBM Z Mainframe is recognized as the de facto System Of Record (SOR) for storing Mission Critical data.  It therefore follows that for generic business applications, DB2, IMS (DB) and even VSAM could be considered as database servers, while CICS and IMS (DC) are transaction servers.  Extracting value from the Mission Critical data source has always been desirable, initially by transferring this valuable Mainframe data source to a Distributed Platform via ETL (Extract, Transform, Load) processes.  A whole new software and hardware ecosystem was born for these processes, typically classified as data warehousing.  This process has proved valuable for the last 20 years or so, but more recently the IT industry has evolved, embracing Artificial Intelligence (AI) technologies, ultimately generating Machine Learning capabilities.

For some, it’s important to differentiate between Artificial Intelligence and Machine Learning, so here goes!  Artificial Intelligence is an explicit Computer Science activity, endeavouring to build machines capable of intelligent behaviour.  Machine Learning is a process of evolving computing platforms to act from data patterns, without being explicitly programmed.  In the “which came first, the chicken or the egg?” debate, you need AI scientists and engineers to build the smart computing platforms, but you need data scientists or pseudo machine learning experts to make these new computing platforms intelligent.

Conceptually, Machine Learning could be classified as:

  • An automated and seamless learning ability, without being explicitly programmed
  • The ability to grow, change, evolve and adapt when encountering new data
  • An ability to deliver personalized and optimized outcomes from data analysed

When considering this Machine Learning ability with the traditional ETL model, eliminating the need to move data sources from one platform to another, eradicates the “point in time” data timestamp of such a model, and any associated security exposure of the data transfer process.  Therefore, returning to the IBM Z Mainframe being the de facto System Of Record (SOR) for storing Mission Critical data, it’s imperative that the IBM Z Mainframe server delivers its own Machine Learning ability…

IBM Machine Learning for z/OS is an enterprise class machine learning platform solution, assisting the user to create, train and deploy machine learning models, extracting value from your mission critical data on IBM Z platforms, retaining the data in situ, within the IBM Z complex.

Machine Learning for z/OS integrates several IBM machine learning capabilities, including IBM z/OS Platform for Apache Spark.  It simplifies and automates the machine learning workflow, enabling collaboration on machine learning projects across personas and disciplines (E.g. Data Scientists, Business Analysts, Application Developers, et al).  Retaining your Mission Critical data in situ on your IBM Z platforms, Machine Learning for z/OS significantly reduces the cost, complexity, security risk and time for Machine Learning model creation, training and deployment.

Simplistically there are two categories of Machine Learning:

  • Supervised: A model is trained from a known set of data sources, with a target output in mind. In mathematical terms, a formulaic approach.
  • Unsupervised: There is no input or output structure and unsupervised machine learning is required to formulate results from evolving data patterns.

In theory, we have been executing supervised machine learning for some time, but unsupervised is the utopia.
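The distinction between the two categories can be made concrete with two tiny pure-Python sketches, one of each kind (the data and algorithms are invented for illustration, not drawn from Machine Learning for z/OS):

```python
# Supervised: known inputs AND target outputs train a model - here a
# least-squares fit of y = a*x + b against labelled examples.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # underlying rule: y = 2x + 1
print(round(a), round(b))  # 2 1

# Unsupervised: no target output - the data's own structure drives the
# result, here a k-means style split of 1-D points into 2 groups.
def two_clusters(points, iterations=10):
    c1, c2 = min(points), max(points)          # initial centroids
    for _ in range(iterations):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

print(two_clusters([1.0, 1.2, 0.8, 9.0, 9.5, 8.8]))
# ([0.8, 1.0, 1.2], [8.8, 9.0, 9.5])
```

The supervised fit recovers the known rule because the labels supplied it; the clustering was told nothing, yet still surfaces the two groupings latent in the data, which is precisely why the unsupervised case is the harder, and more valuable, utopia.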

Essentially Machine Learning for z/OS comprises the following functions:

  • Data ingestion (From SOR data sources, DB2, IMS, VSAM)
  • Data preparation
  • Data training and validation
  • Data evaluation
  • Data analysis deployment (predict, score, act)
  • Ongoing learning (monitor, ingestion, feedback)
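The functions above form a pipeline, which can be sketched end-to-end as follows.  This is a hypothetical illustration of the workflow stages only; the "model" is a trivial threshold rule and the data is invented, standing in for real SOR sources:

```python
def ingest():                      # in practice, from a SOR (DB2, IMS, VSAM)
    return [(120, 1), (80, 0), (150, 1), (60, 0), (110, 1), (70, 0)]

def prepare(rows):                 # cleanse and split into train/validation
    return rows[:4], rows[4:]

def train(train_rows):             # "training": pick a threshold between classes
    pos = [x for x, label in train_rows if label == 1]
    neg = [x for x, label in train_rows if label == 0]
    return (min(pos) + max(neg)) / 2

def evaluate(model, validation):   # accuracy on held-out data
    hits = sum(1 for x, label in validation if (x > model) == (label == 1))
    return hits / len(validation)

def score(model, x):               # deployment: predict, score, act
    return 1 if x > model else 0

train_rows, validation = prepare(ingest())
model = train(train_rows)
accuracy = evaluate(model, validation)
print(model, accuracy, score(model, 130))  # 100.0 1.0 1
```

The "ongoing learning" stage would close the loop: monitored scoring outcomes feed back into ingestion, and the model is retrained when accuracy drifts.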

For these various Machine Learning functions, several technology components are required:

  • z/OS components (MLz scoring service, various Spark ML libraries and the CADS/HPO library)
  • Linux/x86 components (Docker images for Repository, Deployment, Training, Ingestion, Authentication and Metadata services)

The Machine Learning for z/OS solution incorporates the following added features:

  • CADS: Cognitive Assistant for Data Scientist (helps select the best fit algorithm for training)
  • HPO: Hyper Parameter Optimization (provides the Data Scientist with optimal parameters)
  • Brunel Visualization Tool (assists the Data Scientist to understand data distribution)

Machine Learning for z/OS provides a simple framework to manage the entire machine learning workflow.  Key functions are delivered through an intuitive web-based GUI, a RESTful API and other programming APIs:

  • Ingest data from various sources including DB2, IMS, VSAM or Distributed Systems data sources.
  • Transform and cleanse data for algorithm input.
  • Train a model for the selected algorithm with the prepared data.
  • Evaluate the results of the trained model.
  • Intelligent and automated algorithm/model selection/model parameter optimization based on IBM Watson Cognitive Assistant for Data Science (CADS) and Hyper Parameter Optimization (HPO) technology.
  • Model management.
  • Optimized model development and Production.
  • RESTful API provision allowing Application Development to embed the prediction using the model.
  • Model status, accuracy and resource consumption monitoring.
  • An intuitive GUI wizard allowing users to easily train, evaluate and deploy a model.
  • z Systems authorization and authentication security.

In conclusion, the Machine Learning for z/OS solution delivers the requisite framework for the emerging Data Scientists to collaborate with their Business Analysts and Application Developer colleagues for delivering new business opportunities, with smarter outcomes, while lowering risk and associated costs.

The Ever Changing IBM Z Mainframe Disaster Recovery Requirement

With a 50+ year longevity, of course the IBM Z Mainframe Disaster Recovery (DR) requirement and associated processes have changed and evolved accordingly.  Initially, the primary focus would have been HDA (Head Disk Assembly) related, recovering data due to hardware (E.g. 23nn, 33nn DASD) failures.  It seems incredible in the 21st Century to consider the downtime and data loss associated with such an event, but these failures were commonplace into the early 1980’s.  Disk drive (DASD) reliability increased with the 3380 device in the 1980’s and the introduction of the 3990-03 Dual Copy capability in the late 1980’s eradicated the potential consequences of a physical HDA failure.

The significant cost of storage and CPU resources dictated that many organizations had to rely upon 3rd party service providers for DR resource provision.  Often this dictated a classification of business applications, differentiating between Mission Critical or not, where DR backup and recovery processes would be application based.  Even the largest of organizations that could afford to duplicate CPU resource, would have to rely upon the Ford Transit Access Method (FTAM), shipping physical tape from one location to another and performing proactive or more likely reactive data restore activities.  A modicum of database log-shipping over SNA networks automated this process for Mission Critical data, but successful DR provision was still a major consideration.

Even with the Dual Copy function, this meant DASD storage resources had to be doubled for contingency purposes.  Therefore this dictated only the upper echelons of the business world (I.E. Financial Organizations, Telecommunications Suppliers, Airlines, Etc.) could afford the duplication of investment required for self-sufficient DR capability.  Put simply, a duplication of IBM Mainframe CPU, Network and Storage resources was required…

The 1990’s heralded a significant evolution in generic IT technology, including IBM Mainframe.  The adoption of RAID technology for IBM Mainframe Count Key Data (CKD) provided an affordable solution for all IBM Mainframe users, where RAID-5(+) implementations became commonplace.  The emergence of ESCON/FICON channel connectivity provided the extended distance requirement to complement the emerging Parallel SYSPLEX technology, allowing IBM Mainframe servers and related storage to be geographically dispersed.  This allowed a greater number of IBM Mainframe customers to provision their own in-house DR capability, but many still relied upon physical tape shipment to a 3rd party DR services provider.

The final significant storage technology evolution was the Virtual Tape Library (VTL) structure, introduced in the mid-1990’s.  This technology simplified capacity optimization for physical tape media, while reducing the number of physical drives required to satisfy the tape workload.  These VTL structures would also benefit from SYSPLEX implementations, but for many IBM Mainframe users, physical tape shipment might still be required.  Even though the IBM Mainframe had supported IP connectivity since the early 1990’s, using this network capability to ship significant amounts of data was dependent upon public network infrastructures becoming faster and more affordable.  In the mid-2000’s, transporting IBM Mainframe backup data via extended network carriers, beyond the limit of FICON technologies became more commonplace, once again, changing the face of DR approaches.

More recently, the need for Grid configurations of 2, 3 or more locations has become the utopia for the Global 1000 type business organization.  Numerous copies of synchronized Mission Critical if not all IBM Z Mainframe data are now maintained, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) DR criteria to several Minutes or less.

As with anything in life, learning from the lessons of history is always a good thing and for each and every high profile IBM Z Mainframe user (E.g. 5000+ MSU), there are many more smaller users, who face the same DR challenges.  Just as various technology races (E.g. Space, Motor Sport, Energy, et al) eventually deliver affordable benefit to a wider population, the same applies for the IBM Z Mainframe community.  The commonality is the challenges faced, where over the years, DR focus has either been application or entire business based, influenced by the technologies available to the IBM Mainframe user, typically dictated by cost.  However, the recent digital data explosion generates a common challenge for all IT users alike, whether large or small.  Quite simply, to remain competitive and generate new business opportunities from that priceless and unique resource, namely business data, organizations must embrace the DevOps philosophy.

Let’s consider the frequency of performing DR tests.  If you’re a smaller IBM Z Mainframe user, relying upon a 3rd party DR service provider, your DR test frequency might be 1-2 tests per year.  Conversely, if you’re a large IBM Z Mainframe user deploying a Grid configuration, you might consider that your business no longer requires periodic DR tests.  This would be a dangerous thought pattern, because it was forever thus: SYSPLEX and Grid configurations only safeguard against physical hardware failure scenarios, whereas a logical error will proliferate throughout all data copies, whether 2, 3 or more…

Similarly, when considering the frequency of Business Application changes, for the archetypal IBM Z Mainframe user, this might have been Monthly or Quarterly, perhaps with imposed change freezes due to significant seasonal or business peaks.  However, in an IT ecosystem where the IBM Z Mainframe is just another interconnected node on the network, a significantly increased frequency of Business Application changes arguably becomes mandatory.  Therefore, once again, if we consider our frequency of DR tests, how many per year do we perform?  In all likelihood, this becomes the wrong question!  A better statement might be, “we perform an automated DR test as part of our Business Application changes”.  In theory, the adoption of DevOps either increases the frequency of scheduled Business Application changes, or the organization embraces an “on demand” type approach…
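One way to picture this shift is a change promotion pipeline in which the DR test is simply another mandatory stage, rather than a separate annual event.  The sketch below is purely illustrative: every function name is hypothetical, and the restore and test stages are stubbed where a real pipeline would drive VTL recovery and application test tooling.

```python
# Hypothetical DevOps pipeline sketch: a Business Application change
# only promotes to Production if an automated DR test of that
# application passes first.

def restore_point_in_time_backup(app):
    """Recover the application's latest Point-In-Time backup into an
    isolated test environment (stubbed for illustration)."""
    return True

def run_application_tests(app):
    """Execute the application's functional test suite against the
    restored data copy (stubbed for illustration)."""
    return True

def promote_change(app):
    # The DR test is a gate within the change process itself, so the
    # question "how many DR tests per year?" no longer applies.
    if not restore_point_in_time_backup(app):
        raise RuntimeError(f"{app}: DR restore failed")
    if not run_application_tests(app):
        raise RuntimeError(f"{app}: post-restore tests failed")
    return f"{app}: change promoted"
```

The design point is simply that DR validation scales with change frequency automatically, whether changes are scheduled or “on demand”.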

We must then consider which IT Group performs the DR test?  In theory, it’s many groups, dictated by their technical expertise, whether Server, Storage, Network, Database, Transaction or Operations based.  Once again, if embracing DevOps, the Application Development teams need to be able to write and test code, while the Operations teams need to implement and manage the associated business services.  In such a model, there has to be a fundamental mindset change, where technical Subject Matter Experts (SME) design and implement technical processes, which simplify the activities associated with DevOps.  From a DR viewpoint, this dictates that the DevOps process should facilitate a robust DR test, for each and every Business Application change.  Whether an organization is the largest or smallest of IBM Z Mainframe users is somewhat arbitrary; performing an entire system-wide DR test for an isolated Business Application change is not required.  Conversely, performing a meaningful Business Application test during the DevOps code test and acceptance process makes perfect sense.

Performing a meaningful Business Application DR test as part of the DevOps process is a consistent requirement, whether an organization is the largest or smallest IBM Z Mainframe user.  Although their hardware resources might differ significantly, with the largest IBM Z Mainframe user typically deploying a high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al), the requirement to perform a seamless, agile and timely Business Application DR test remains the same.

While the IBM Z Mainframe is typically deployed as the System Of Record (SOR) data server, today’s 21st century Business Application incorporates interoperability with Distributed Systems (E.g. Wintel, UNIX, Linux, et al) platforms.  In theory, this is a consideration, as mostly, IBM Z Mainframe data resides in proprietary 3390 DASD subsystems, while Distributed Systems data typically resides in IP (NFS, NAS) and/or FC (SAN) filesystems.  However, the IBM Z Mainframe has leveraged Distributed Systems technology advancements, where typical VTL Grid configurations utilize proprietary IP connected disk arrays for VTL data.  Ultimately a VTL structure will contain the “just in case” copy of Business Application backup data, the very data copy required for a meaningful DR test.  Wouldn’t it be advantageous if the IBM Z Mainframe backup resided on the same IP or FC Disk Array as the Distributed Systems backups?

Ultimately the high-end VTL (I.E. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al) solutions are designed for the upper echelons of the business and IBM Z Mainframe world.  Their capacity, performance and resilience capabilities are significant, and by definition, so is the associated cost.  How easy or difficult might it be to perform a seamless, agile and timely Business Application DR test via such a high-end VTL?  Are there alternative options that any IBM Z Mainframe user can consider, regardless of their size, whether large or small?

The advances in FICON connectivity, x86/POWER servers and Distributed Systems disk arrays have allowed such technologies to be packaged in a cost efficient and small footprint IBM Z VTL appliance.  Their ability to connect to the IBM Z server via FICON connectivity, provide full IBM Z tape emulation and connect to ubiquitous IP and FC Distributed Systems disk arrays positions them for strategic use by any IBM Z Mainframe user for DevOps DR testing.  Primarily, one consistent copy of enterprise wide Business Application data would reside on the same disk array, simplifying the process of recovering Point-In-Time backup data for DR testing.

On the one hand, for the smaller IBM Z user, such an IBM Z VTL appliance (E.g. Optica zVT) could, for the first time, allow them to simplify their DR processes with a 3rd party DR supplier.  They could electronically vault their IBM Z Mainframe backup data to their 3rd party DR supplier and activate a totally automated DR invocation, as and when required.  On the other hand, for DevOps processes, the provision of an isolated LPAR would allow the smaller IBM Z Mainframe user to perform a meaningful Business Application DR test, in-house, without impacting Production services.  Once again, simplifying the Business Application DR test process applies to the largest of IBM Z Mainframe users, where leveraging such an IBM Z VTL appliance would simplify things, without impacting the Grid configuration supporting their Mission Critical workloads.

In conclusion, there has always been commonality in DR processes for the smallest and largest of IBM Z Mainframe users, where the only tangible difference would have been budget related, where the largest IBM Z Mainframe user could, and in fact needed to, invest in the latest and greatest technologies.  As always, some requirements apply to all, regardless of size and budget.  Seemingly DevOps is such a requirement, and the need to perform on-demand, seamless, agile and timely Business Application DR tests is mandatory for all.  From an enterprise wide viewpoint, perhaps a modicum of investment in an affordable IBM Z VTL appliance might be the last time an IBM Z Mainframe user needs to revisit their DR testing processes!

IBM Z Server: Best In Class For Availability – Does Form Factor Matter?

A recent ITIC 2017 Global Server Hardware and Server OS Reliability Survey classified the IBM Z server as delivering the highest levels of reliability/uptime, with ~8 Seconds or less of unplanned downtime per month.  This was the 9th consecutive year that such a statistic had been recorded for the IBM Z Mainframe platform.  This compares to ~3 Minutes of unplanned downtime per month for several other specialized server technologies, including IBM POWER, Cisco UCS and HP Integrity Superdome, running the Linux Operating System.  Clearly, unplanned server downtime is undesirable and costly, impacting the bottom line of the business.  Industry Analysts state that ~80% of global businesses require 99.99% uptime, equating to ~52.6 Minutes downtime per year or ~8.6 Seconds per day.  In theory, only the IBM Z Mainframe platform exceeds this availability requirement, while IBM POWER, Cisco UCS and HP Integrity Superdome deliver borderline 99.99% availability capability.  The IBM Mainframe is classified as a mission-critical resource in 92 of the top 100 global banks, 23 of the top 25 USA based retailers, all 10 of the top 10 global insurance companies and 23 of the top 25 largest airlines globally…
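The downtime budgets implied by an availability percentage are straightforward arithmetic, and a minimal sketch makes it easy to sanity check the figures above (99.99% uptime allows roughly 52.6 Minutes of downtime per year, or ~8.6 Seconds per day).

```python
# Downtime budget implied by an availability percentage.
MINUTES_PER_YEAR = 365.25 * 24 * 60   # 525,960 minutes

def downtime_per_year_minutes(availability_pct):
    """Minutes of permitted downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

def downtime_per_day_seconds(availability_pct):
    """Seconds of permitted downtime per day at a given availability."""
    return 24 * 3600 * (1 - availability_pct / 100)

# 99.99% -> ~52.6 minutes/year, ~8.64 seconds/day; by comparison,
# ~8 seconds of unplanned downtime per MONTH comfortably beats this.
```

By the same arithmetic, ~3 Minutes of unplanned downtime per month (~36 Minutes per year) sits just inside the 99.99% budget, which is why those platforms are described above as borderline.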

The ever increasing requirement for corporate compute power is, without doubt, driven by the processing of ever increasing amounts of data, created from digital sources including Cloud, Mobile and Social, requiring near real-time analytics to deliver meaningful information from these oceans of data.  Some organizations select x86 server technology to deliver this computing power requirement, either in their own Data Centre or via a 3rd party Cloud Provider.  However, with unplanned downtime characteristics that don’t meet the seeming de facto 99.99% uptime availability metric, can the growth in x86 server technology continue?  From many perspectives, Reliability, Availability & Serviceability (RAS), Data Security via Pervasive Encryption and best-in-class Performance and Scalability, you might think that the IBM Z Mainframe would be the platform of choice?  For whatever reason, this is not always the case!  Maybe we need to look at recent developments and trends in the compute power delivery market and second guess what might happen in the future…

Significant Cloud providers deliver vast amounts of computing power and associated resources, evolving their business models accordingly.  Such business models have many challenges, primarily uptime and data security related, convincing their prospective customers to migrate their workloads from traditional internal Data Centres into these massive rack provisioned infrastructures.  Recently, Google has evolved from using Intel as its sole supplier of Data Centre CPU chips, now including CPU chips from IBM and other semiconductor rivals.

In April 2016, Google declared it had ported its online services to the IBM POWER CPU chip and that its toolchain could output code for Intel x86, IBM POWER and 64-bit ARM cores at the flip of a command-line switch.  As part of the OpenPOWER and Open Compute Project (OCP) initiatives, Google, IBM and Rackspace are collaborating to develop an open server specification based on the IBM POWER9 architecture.  The OCP Rack & Power Project will dictate the size and shape or form factor for housing these industry standard rack infrastructures.  What does this mean for the IBM Z server form factor?

Traditionally, and over the last decade or more, IBM has utilized the 24 Inch rack form factor for the IBM Z Mainframe and Enterprise Class POWER Systems.  Of course, this is a different form factor to the industry standard 19 Inch rack, which finally became the de facto standard for the ubiquitous blade server.  Unfortunately, there was no tangible standard for the 19 Inch rack, generating power, cooling and other issues.  Hence the evolution of the OCP Rack & Power Standard, codenamed Open Rack.  Google and Facebook have recently collaborated to evolve the Open Rack Standard V2.0, based upon an external 21 Inch rack form factor, accommodating the de facto 19 Inch rack mounted equipment.

How do these recent developments influence the IBM Z platform?  If you’re the ubiquitous global CIO, knowing your organization requires 99.99%+ uptime, delivering continuous business application change via DevOps, safeguarding corporate data with intelligent and system wide encryption, perhaps you still view the IBM Z Mainframe as a proprietary server with its own form factor?

As IBM has already demonstrated with its OpenPOWER offering, collaborating with Google and Rackspace, the 24 Inch rack approach can be evolved, becoming just another CPU chip in a Cloud (E.g. IaaS, PaaS) service provider environment.  Maybe the final evolution step for the IBM Z Mainframe is evolving its form factor to a ubiquitous 19 Inch rack format?  The intelligent and clearly defined approach of the Open Rack Standard makes sense, and if IBM could deliver an IBM Z server in such a format, it just becomes another CPU chip option in the ubiquitous Cloud (E.g. IaaS, PaaS) service provider environment.  This might be the final piece of the jigsaw for today’s CIO, whose approach to procuring compute power might be based solely upon uptime and data security metrics.  For those organizations requiring in excess of 99.99% uptime and fully compliant security, there only seems to be one choice, the IBM Z Mainframe CPU chip technology, which has been running Linux workloads since 2000!