Maximizing IBM Z System Of Record (SOR) Data Value: Is ETL Still Relevant?

The general consensus for the IBM Z Mainframe platform is that it’s the best transaction and database server available, and more recently with the advent of Pervasive Encryption, the best enterprise class security server.  It therefore follows that the majority of mission critical and valuable data resides in IBM Z Mainframe System Of Record (SOR) database repositories, receiving and passing data via real-time transaction services.  Traditionally, maximizing data value generally involved moving data from the IBM Mainframe to another platform for subsequent analysis, typically for Business Intelligence (BI) and Data Warehouse (DW) purposes.

ETL (Extract, Transform, Load) is an automated, bulk data movement process.  Data is extracted from source systems, passed through a transformation engine according to an installation defined policy, and the transformed data is loaded into target systems, typically data warehouses or specialized data repositories, for use by business decision driven applications.  Quite simply, ETL enables an organization to make informed and hopefully intelligent data driven business decisions.  This ubiquitous IT industry TLA (Three Letter Acronym) generated a massive industry of ETL solutions, involving specialized software and various Distributed Systems hardware platforms, both commodity and specialized.  However, some ~30 years since the first evolution of ETL processes, is ETL still relevant in the 21st Century?

The 21st Century has witnessed a massive and arguably exponential data explosion, from cloud, mobile and social media sources.  These dynamic and open data sources demand intelligent analytics to process the data in near real-time, and the notion of having a time delay between the Extract and Load parts of the ETL process is becoming increasingly unacceptable for most data driven organizations.  During the last several years, there has been increased usage of Cloud BI, with a reported increase from ~25% to ~80% of public cloud users deploying Cloud BI solutions.

For cloud resident data warehouses, an evolution from ETL to ELT (Extract, Load, Transform) has taken place.  ELT is an evolutionary and savvy method of moving data from source systems to centralized data repositories without transforming the data before it’s loaded into the target systems.  The major benefit of the ELT approach is that it satisfies the near real-time processing requirement of today’s data driven 21st Century business.  With ELT, all extracted raw data resides in the data warehouse, where powerful and modern analytical architectures can transform the data, as per the associated business decision making policies.  Put simply, the data transformation occurs when the associated analytical query activities are processed.  For those modern organizations leveraging public cloud resources, ELT and Cloud BI processes make sense and the growth of Cloud BI speaks for itself.  However, what about the traditional business, which has leveraged the IBM Z Mainframe platform for 30-50+ years?
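To make the ELT idea concrete, here is a minimal sketch in Python using the standard library sqlite3 module as a stand-in for a data warehouse.  The table, view and sample records are purely hypothetical; the point is only that the raw data is loaded untouched and the transformation is applied when the analytical query runs.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system extract.
RAW_ROWS = [
    ("1001", " 12.50", "gbp"),
    ("1002", "7.25 ", "GBP"),
    ("1003", "100.00", "usd"),
]


def load_raw(conn, rows):
    """Extract + Load: store the records untransformed in a staging table."""
    conn.execute("CREATE TABLE raw_txn (account TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO raw_txn VALUES (?, ?, ?)", rows)


def define_transform(conn):
    """Transform: a view, evaluated each time an analytical query runs."""
    conn.execute(
        """
        CREATE VIEW txn AS
        SELECT CAST(account AS INTEGER)   AS account,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(currency)            AS currency
        FROM raw_txn
        """
    )


def total_by_currency(conn):
    """An analytical query; the cleansing defined in the view is applied here."""
    cur = conn.execute(
        "SELECT currency, SUM(amount) FROM txn GROUP BY currency ORDER BY currency"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ROWS)
define_transform(conn)
print(total_by_currency(conn))  # prints [('GBP', 19.75), ('USD', 100.0)]
```

In a classic ETL flow the TRIM, CAST and UPPER steps would run in a separate transformation engine before the load; here the staging table keeps the data exactly as extracted.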

Each and every leading Public Cloud supplier, including IBM (Watson), has its own proprietary analytical engine, integrating that technology into their mainstream offerings.  As always, the IBM Z Mainframe platform has evolved to deliver the near real-time requirements of an ELT framework, but are there any other generic solutions that might assist any Mainframe organization in their ETL to ELT evolution process?

B.O.S. Software Service und Vertrieb GmbH offer their tcVISION solution, which approaches this subject matter from a data synchronization viewpoint.  tcVISION is a powerful Change Data Capture (CDC) platform for users of IBM Mainframes and Distributed Systems servers.  tcVISION automatically identifies the changes applied to Mainframe and Distributed Systems databases and files.  No programming effort is necessary to obtain the changed data.  tcVISION continuously propagates the changed data to the target systems in real-time or on a policy driven time interval period, as and when required.  tcVISION offers a rich set of processing and controlling mechanisms to guarantee a data exchange implementation that is fully audit proof.  tcVISION contains powerful bulk processors that perform the initial load of mass data or the cyclic exchange of larger data volumes in an efficient, fast and reliable way.

tcVISION supports several data capture methods that can be individually used as the application and associated data processing flow requires.  These methods are based upon a Real-Time or near Real-Time basis, including IBM Mainframe DBMS, Logstream, Log and Snapshot (compare) data sources.  A myriad of generic database repositories are supported:

  • Adabas: Real-time/Near real-time, log processing, compare processing
  • Adabas LUW: Real-time/Near real-time, log processing, compare processing
  • CA-Datacom: Log processing, compare processing
  • CA-IDMS: Real-time/Near real-time, log processing, compare processing
  • DB2: Real-time/Near real-time, log processing, compare processing
  • DB2/LUW: Real-time/Near real-time, log processing, compare processing
  • Exasol: Compare processing
  • IMS: Real-time/Near real-time, log processing, compare processing
  • Informix: Real-time/Near real-time, log processing, compare processing
  • Microsoft SQL Server: Real-time/Near real-time, log processing, compare processing
  • Oracle: Real-time/Near real-time, log processing, compare processing
  • PostgreSQL: Real-time/Near real-time, log processing, compare processing
  • Sequential file: Compare processing
  • Teradata: Compare processing
  • VSAM: Real-time/Near real-time, log processing, compare processing
  • VSAM/CICS: Real-time/Near real-time, log processing, compare processing
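The compare (snapshot) processing method listed for every data source above can be illustrated with a short Python sketch.  This is not how tcVISION is implemented; it simply assumes two snapshots held as dictionaries keyed by record key, and the function names are hypothetical.

```python
def compare_snapshots(before, after):
    """Return the changes needed to turn `before` into `after`.

    Both arguments map a record key to its record (any comparable value).
    """
    changes = []
    for key, record in after.items():
        if key not in before:
            changes.append(("insert", key, record))
        elif before[key] != record:
            changes.append(("update", key, record))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target, changes):
    """Apply the captured changes to a target copy of the data."""
    for op, key, record in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = record
    return target


# Hypothetical snapshots taken at two points in time.
before = {1: "A", 2: "B", 3: "C"}
after = {1: "A", 2: "B2", 4: "D"}
changes = compare_snapshots(before, after)
print(changes)
```

Only the three changed records are propagated, rather than the whole file.  Log and real-time capture avoid the cost of the comparison itself, which is presumably why compare processing is the one method available for every source.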

tcVISION incorporates an intelligent bulk load component that can be used to unload data from a Mainframe or Distributed Systems data source, loading the data into a target database, either directly or by using a loader file.  tcVISION comes with an integrated loop-back prevention for bidirectional data exchange, where individual criteria can be specified to detect and ignore changes that have already been applied.  tcVISION incorporates comprehensive monitoring, logging and integrated alert notification.  Optional performance data may be captured and stored into any commercially available relational database.  This performance data can be analyzed and graphically displayed using the tcVISION web component.
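Loop-back prevention for bidirectional exchange can be sketched as follows.  The criterion used here, ignoring any change written by a dedicated replication user id, is an assumption chosen for illustration; the class, function and user id are hypothetical and do not represent the tcVISION implementation.

```python
REPLICATION_USER = "REPL"  # hypothetical id used when applying replicated changes


class Database:
    """A toy data store that logs every change with the user who made it."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.log = []  # captured changes: (user, key, value)

    def write(self, user, key, value):
        self.rows[key] = value
        self.log.append((user, key, value))


def propagate(source, target):
    """Apply source's logged changes to target, skipping replicated ones.

    Returns the number of changes applied.
    """
    applied = 0
    for user, key, value in source.log:
        if user == REPLICATION_USER:
            continue  # loop-back prevention: this change came from the peer
        target.write(REPLICATION_USER, key, value)
        applied += 1
    source.log.clear()
    return applied


mainframe = Database("mainframe")
distributed = Database("distributed")
mainframe.write("APP1", "customer-42", "new address")
first = propagate(mainframe, distributed)   # the application change is copied
second = propagate(distributed, mainframe)  # the replicated change is ignored
```

Without the user id check, the second call would copy the change back to its origin, and the two systems would keep exchanging the same update.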

From an ETL to ELT evolution viewpoint, tcVISION delivers the following data synchronization benefits:

  • Time Optimization: Significant reduction in data exchange implementation processes and data synchronization processing.
  • Heterogeneous Support: Independent of database supplier, offering support for a myriad of source and target databases.
  • Resource Optimization: Mainframe MIPS reduction and data transfer optimization via intelligent secure compression algorithms.
  • Data Availability: Real-time data replication across application and system boundaries.
  • Implementation Simplicity: No application programming or data engineering effort is required to obtain the changed data.
  • Security: Full accountability and auditability of all data movements.

In conclusion, the ETL process has now been superseded by the real-time data exchange requirement for 21st Century data processing via the ELT evolution.  Whether viewed as an ELT or data synchronization requirement, tcVISION delivers an independent vendor agnostic solution, which can efficiently deliver seamless data delivery for analytical purposes, while maintaining synchronized data copies between environments in real-time.

Optimizing Mission Critical Data Value – IBM Machine Learning for z/OS

Typically the IBM Z Mainframe is recognized as the de facto System Of Record (SOR) for storing Mission Critical data.  It therefore follows that for generic business applications, DB2, IMS (DB) and even VSAM could be considered as database servers, while CICS and IMS (DC) are transaction servers.  Extracting value from the Mission Critical data source has always been desirable, initially by transferring this valuable Mainframe data to a Distributed Platform via ETL (Extract, Transform, Load) processes.  A whole new software and hardware ecosystem was born for these processes, typically classified as data warehousing.  This process has proved valuable for the last 20 years or so, but more recently the IT industry has evolved, embracing Artificial Intelligence (AI) technologies, ultimately generating Machine Learning capabilities.

For some, it’s important to differentiate between Artificial Intelligence and Machine Learning, so here goes!  Artificial Intelligence is an explicit Computer Science activity, endeavouring to build machines capable of intelligent behaviour.  Machine Learning is a process of evolving computing platforms to act from data patterns, without being explicitly programmed.  It’s a “which came first, the chicken or the egg?” scenario: you need AI scientists and engineers to build the smart computing platforms, but you need data scientists or pseudo machine learning experts to make these new computing platforms intelligent.

Conceptually, Machine Learning could be classified as:

  • An automated and seamless learning ability, without being explicitly programmed
  • The ability to grow, change, evolve and adapt when encountering new data
  • An ability to deliver personalized and optimized outcomes from data analysed

When comparing this Machine Learning ability with the traditional ETL model, eliminating the need to move data from one platform to another eradicates the “point in time” data timestamp of such a model, and any associated security exposure of the data transfer process.  Therefore, returning to the IBM Z Mainframe being the de facto System Of Record (SOR) for storing Mission Critical data, it’s imperative that the IBM Z Mainframe server delivers its own Machine Learning ability…

IBM Machine Learning for z/OS is an enterprise class machine learning platform solution, assisting the user to create, train and deploy machine learning models, extracting value from your mission critical data on IBM Z platforms, retaining the data in situ, within the IBM Z complex.

Machine Learning for z/OS integrates several IBM machine learning capabilities, including IBM z/OS Platform for Apache Spark.  It simplifies and automates the machine learning workflow, enabling collaboration on machine learning projects across personnel and disciplines (E.g. Data Scientists, Business Analysts, Application Developers, et al).  Retaining your Mission Critical data in situ, on your IBM Z platforms, Machine Learning for z/OS significantly reduces the cost, complexity, security risk and time for Machine Learning model creation, training and deployment.

Simplistically there are two categories of Machine Learning:

  • Supervised: A model is trained from a known set of data sources, with a target output in mind. In mathematical terms, a formulaic approach.
  • Unsupervised: There is no predefined input or output structure (no labelled target output), and unsupervised machine learning is required to formulate results from evolving data patterns.

In theory, we have been executing supervised machine learning for some time, but unsupervised is the utopia.
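The difference between the two categories can be shown with a small, library-free Python sketch.  The data and function names are hypothetical, and the algorithms (a least squares line fit and a two-cluster split) are chosen only because they are short; they say nothing about the algorithms used by Machine Learning for z/OS.

```python
def fit_line(xs, ys):
    """Supervised: learn y = slope * x + intercept from known (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_means(values, iterations=10):
    """Unsupervised: split unlabelled values into two groups by proximity."""
    low, high = min(values), max(values)
    near_low, near_high = list(values), []
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:  # all values identical: nothing to separate
            break
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return sorted(near_low), sorted(near_high)


# Supervised: the target output (ys) is known for every training input.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept

# Unsupervised: no target output; the structure is found in the data itself.
small, large = two_means([1, 2, 3, 20, 21, 22])
```

In the supervised case the model is told the right answer for every training record; in the unsupervised case it is only given the values and has to discover that they fall into two groups.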

Essentially Machine Learning for z/OS comprises the following functions:

  • Data ingestion (From SOR data sources, DB2, IMS, VSAM)
  • Data preparation
  • Data training and validation
  • Data evaluation
  • Data analysis deployment (predict, score, act)
  • Ongoing learning (monitor, ingestion, feedback)

For these various Machine Learning functions, several technology components are required:

  • z/OS components (MLz scoring service, various Spark ML libraries and CADS/HPO library)
  • Linux/x86 components (Docker images for Repository, Deployment, Training, Ingestion, Authentication and Metadata, services)

The Machine Learning for z/OS solution incorporates the following added features:

  • CADS: Cognitive Assistant for Data Scientist (helps select the best fit algorithm for training)
  • HPO: Hyper Parameter Optimization (provides the Data Scientist with optimal parameters)
  • Brunel Visualization Tool (assists the Data Scientist to understand data distribution)

Machine Learning for z/OS provides a simple framework to manage the entire machine learning workflow.  Key functions are delivered through an intuitive web based GUI, a RESTful API and other programming APIs:

  • Ingest data from various sources including DB2, IMS, VSAM or Distributed Systems data sources.
  • Transform and cleanse data for algorithm input.
  • Train a model for the selected algorithm with the prepared data.
  • Evaluate the results of the trained model.
  • Intelligent and automated algorithm/model selection/model parameter optimization based on IBM Watson Cognitive Assistant for Data Science (CADS) and Hyper Parameter Optimization (HPO) technology.
  • Model management.
  • Optimized model development and Production.
  • RESTful API provision allowing Application Development to embed the prediction using the model.
  • Model status, accuracy and resource consumption monitoring.
  • An intuitive GUI wizard allowing users to easily train, evaluate and deploy a model.
  • z Systems authorization and authentication security.

In conclusion, the Machine Learning for z/OS solution delivers the requisite framework for the emerging Data Scientists to collaborate with their Business Analysts and Application Developer colleagues for delivering new business opportunities, with smarter outcomes, while lowering risk and associated costs.

The Ever Changing IBM Z Mainframe Disaster Recovery Requirement

With a 50+ year longevity, of course the IBM Z Mainframe Disaster Recovery (DR) requirement and associated processes have changed and evolved accordingly.  Initially, the primary focus would have been HDA (Head Disk Assembly) related, recovering data due to hardware (E.g. 23nn, 33nn DASD) failures.  It seems incredible in the 21st Century to consider the downtime and data loss with such an event, but these failures were commonplace into the early 1980’s.  Disk drive (DASD) reliability increased with the 3380 device in the 1980’s and the introduction of the 3990-03 Dual Copy capability in the late 1980’s eradicated the potential consequences of a physical HDA failure.

The significant cost of storage and CPU resources dictated that many organizations had to rely upon 3rd party service providers for DR resource provision.  Often this dictated a classification of business applications, differentiating between Mission Critical or not, where DR backup and recovery processes would be application based.  Even the largest of organizations that could afford to duplicate CPU resource, would have to rely upon the Ford Transit Access Method (FTAM), shipping physical tape from one location to another and performing proactive or more likely reactive data restore activities.  A modicum of database log-shipping over SNA networks automated this process for Mission Critical data, but successful DR provision was still a major consideration.

Even with the Dual Copy function, this meant DASD storage resources had to be doubled for contingency purposes.  Therefore this dictated that only the upper echelons of the business world (E.g. Financial Organizations, Telecommunications Suppliers, Airlines, Etc.) could afford the duplication of investment required for self-sufficient DR capability.  Put simply, a duplication of IBM Mainframe CPU, Network and Storage resources was required…

The 1990’s heralded a significant evolution in generic IT technology, including IBM Mainframe.  The adoption of RAID technology for IBM Mainframe Count Key Data (CKD) provided an affordable solution for all IBM Mainframe users, where RAID-5(+) implementations became commonplace.  The emergence of ESCON/FICON channel connectivity provided the extended distance requirement to complement the emerging Parallel SYSPLEX technology, allowing IBM Mainframe servers and related storage to be geographically dispersed.  This allowed a greater number of IBM Mainframe customers to provision their own in-house DR capability, but many still relied upon physical tape shipment to a 3rd party DR services provider.

The final significant storage technology evolution was the Virtual Tape Library (VTL) structure, introduced in the mid-1990’s.  This technology simplified capacity optimization for physical tape media, while reducing the number of physical drives required to satisfy the tape workload.  These VTL structures would also benefit from SYSPLEX implementations, but for many IBM Mainframe users, physical tape shipment might still be required.  Even though the IBM Mainframe had supported IP connectivity since the early 1990’s, using this network capability to ship significant amounts of data was dependent upon public network infrastructures becoming faster and more affordable.  In the mid-2000’s, transporting IBM Mainframe backup data via extended network carriers, beyond the limit of FICON technologies became more commonplace, once again, changing the face of DR approaches.

More recently, the need for Grid configurations of 2, 3 or more locations has become the utopia for the Global 1000 type business organization.  Numerous copies of synchronized Mission Critical, if not all, IBM Z Mainframe data are now maintained, reducing the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) DR criteria to several minutes or less.

As with anything in life, learning from the lessons of history is always a good thing and for each and every high profile IBM Z Mainframe user (E.g. 5000+ MSU), there are many more smaller users, who face the same DR challenges.  Just as various technology races (E.g. Space, Motor Sport, Energy, et al) eventually deliver affordable benefit to a wider population, the same applies for the IBM Z Mainframe community.  The commonality is the challenges faced, where over the years, DR focus has either been application or entire business based, influenced by the technologies available to the IBM Mainframe user, typically dictated by cost.  However, the recent digital data explosion generates a common challenge for all IT users alike, whether large or small.  Quite simply, to remain competitive and generate new business opportunities from that priceless and unique resource, namely business data, organizations must embrace the DevOps philosophy.

Let’s consider the frequency of performing DR tests.  If you’re a smaller IBM Z Mainframe user, relying upon a 3rd party DR service provider, your DR test frequency might be 1-2 tests per year.  Conversely if you’re a large IBM Z Mainframe user, deploying a Grid configuration, you might consider that your business no longer has the requirement for periodic DR tests.  This would be a dangerous thought pattern because, as it was forever thus, SYSPLEX and Grid configurations only safeguard against physical hardware failure scenarios, whereas a logical error will proliferate throughout all data copies, whether 2, 3 or more…
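The point about logical errors can be demonstrated with a toy Python model.  The class below is entirely hypothetical: every write is mirrored to all copies, as a synchronized Grid would do, while a separate Point-In-Time backup is the only thing that preserves the data as it was before the error.

```python
import copy


class ReplicatedStore:
    """A toy model of N synchronized data copies plus Point-In-Time backups."""

    def __init__(self, copies):
        self.copies = [dict() for _ in range(copies)]
        self.backups = []

    def write(self, key, value):
        # Synchronous replication: every copy receives every write,
        # whether that write is correct or not.
        for replica in self.copies:
            replica[key] = value

    def backup(self):
        # A Point-In-Time copy, isolated from subsequent writes.
        self.backups.append(copy.deepcopy(self.copies[0]))

    def restore_latest_backup(self):
        snapshot = self.backups[-1]
        self.copies = [copy.deepcopy(snapshot) for _ in self.copies]


store = ReplicatedStore(copies=3)
store.write("balance", 100)
store.backup()
store.write("balance", -1)  # a logical error, e.g. from a faulty application change
corrupted = [replica["balance"] for replica in store.copies]
store.restore_latest_backup()
recovered = [replica["balance"] for replica in store.copies]
```

Three copies protect against losing any one of them, but all three faithfully hold the bad value; only the backup taken before the error allows recovery, and a DR test is how that recovery path is proven to work.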

Similarly, when considering the frequency of Business Application changes, for the archetypal IBM Z Mainframe user, this might have been Monthly or Quarterly, perhaps with imposed change freezes due to significant seasonal or business peaks.  However, in an IT ecosystem where the IBM Z Mainframe is just another interconnected node on the network, the requirement for a significantly increased frequency of Business Application changes arguably becomes mandatory.  Therefore, once again, if we consider our frequency of DR tests, how many per year do we perform?  In all likelihood, this becomes the wrong question!  A better statement might be, “we perform an automated DR test as part of our Business Application changes”.  In theory, the adoption of DevOps either increases the frequency of scheduled Business Application changes, or the organization embraces an “on demand” type approach…

We must then consider which IT Group performs the DR test.  In theory, it’s many groups, dictated by their technical expertise, whether Server, Storage, Network, Database, Transaction or Operations based.  Once again, if embracing DevOps, the Application Development teams need to be able to write and test code, while the Operations teams need to implement and manage the associated business services.  In such a model, there has to be a fundamental mind change, where technical Subject Matter Experts (SME) design and implement technical processes, which simplify the activities associated with DevOps.  From a DR viewpoint, this dictates that the DevOps process should facilitate a robust DR test, for each and every Business Application change.  Whether an organization is the largest or smallest IBM Z Mainframe user is somewhat arbitrary; performing an entire system-wide DR test for an isolated Business Application change is not required.  Conversely, performing a meaningful Business Application DR test during the DevOps code test and acceptance process makes perfect sense.

Performing a meaningful Business Application DR test as part of the DevOps process is a consistent requirement, whether an organization is the largest or smallest IBM Z Mainframe user.  Although their hardware resource might differ significantly, where the largest IBM Z Mainframe user would typically deploy a high-end VTL (E.g. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al), the requirement to perform a seamless, agile and timely Business Application DR test remains the same.

If we recognize that the IBM Z Mainframe is typically deployed as the System Of Record (SOR) data server, today’s 21st century Business Application incorporates interoperability with Distributed Systems (E.g. Wintel, UNIX, Linux, et al) platforms.  In theory, this is a consideration, as mostly, IBM Z Mainframe data resides in proprietary 3390 DASD subsystems, while Distributed Systems data typically resides in IP (NFS, NAS) and/or FC (SAN) filesystems.  However, the IBM Z Mainframe has leveraged from Distributed Systems technology advancements, where typical VTL Grid configurations utilize proprietary IP connected disk arrays for VTL data.  Ultimately a VTL structure will contain the “just in case” copy of Business Application backup data, the very data copy required for a meaningful DR test.  Wouldn’t it be advantageous if the IBM Z Mainframe backup resided on the same IP or FC Disk Array as Distributed Systems backups?

Ultimately the high-end VTL (E.g. IBM TS77n0, EMC DLm 8n00, Oracle VSM, et al) solutions are designed for the upper echelons of the business and IBM Z Mainframe world.  Their capacity, performance and resilience capability is significant, and by definition, so is the associated cost.  How easy or difficult might it be to perform a seamless, agile and timely Business Application DR test via such a high-end VTL?  Are there alternative options that any IBM Z Mainframe user can consider, regardless of their size, whether large or small?

The advances in FICON connectivity, x86/POWER servers and Distributed Systems disk arrays have allowed such technologies to be packaged in a cost efficient and small footprint IBM Z VTL appliance.  Their ability to connect to the IBM Z server via FICON connectivity, provide full IBM Z tape emulation and connect to ubiquitous IP and FC Distributed Systems disk arrays, positions them for strategic use by any IBM Z Mainframe user for DevOps DR testing.  Primarily one consistent copy of enterprise wide Business Application data would reside on the same disk array, simplifying the process of recovering Point-In-Time backup data for DR testing.

On the one hand, for the smaller IBM Z user, such an IBM Z VTL appliance (E.g. Optica zVT) could, for the first time, allow them to simplify their DR processes with a 3rd party DR supplier.  They could electronically vault their IBM Z Mainframe backup data to their 3rd party DR supplier and activate a totally automated DR invocation, as and when required.  On the other hand, and more so for DevOps processes, the provision of an isolated LPAR would allow the smaller IBM Z Mainframe user to perform a meaningful Business Application DR test, in-house, without impacting Production services.  Once again, simplifying the Business Application DR test process applies to the largest of IBM Z Mainframe users, and leveraging such an IBM Z VTL appliance would simplify things, without impacting the Grid configuration supporting their Mission Critical workloads.

In conclusion, there has always been commonality in DR processes for the smallest and largest of IBM Z Mainframe users, where the only tangible difference would have been budget related, where the largest IBM Z Mainframe user could and in fact needed to invest in the latest and greatest.  As always, sometimes there are requirements that apply to all, regardless of size and budget.  Seemingly DevOps is such a requirement, and the need to perform on-demand seamless, agile and timely Business Application DR tests is mandatory for all.  From an enterprise wide viewpoint, perhaps a modicum of investment in an affordable IBM Z VTL appliance might be the last time an IBM Z Mainframe user needs to revisit their DR testing processes!

zAPI: System z Deployment Into The API Economy

Having been in the IT industry for 35+ years, I have always fully embraced and learned new technologies, to find strategic solutions for business challenges.  Obviously, starting in 1980, my heritage is IBM Mainframe, supplemented by UNIX, Wintel and Linux along the way.  Each and every platform has its merits, and during this 35+ year period, I have attended many conferences, for all platforms.  What I have noticed during this period is the attendance of many IBM Mainframe CIO, CTO or Chief Architect individuals at non-IBM Mainframe conferences, but very few, if any, equivalent Distributed Systems personnel at IBM Mainframe conferences.

I’m always surprised and disappointed to hear about organizations talking about decommissioning the IBM Mainframe platform, with tenuous reasons, based on Distributed Systems FUD messaging, as opposed to their own business requirements.  Thankfully these scenarios are decreasing over the years.  Presumably if an organization decides to migrate from one Distributed Systems platform to another or perhaps the Cloud, they do at least attend the relevant platform conferences to make an informed decision.

Over the last 25 years or so, IBM themselves have competed with differing divisions and options, whether UNIX (AIX), System z or, in recent years, Linux on z Systems, most notably with the LinuxONE launch at LinuxCon 2015.  One would hope that the world’s key IT decision makers might attend LinuxCon with an open mind and learn more about the System z Mainframe?

A ridiculous notion might be that one server platform technology can satisfy a 21st Century organization’s IT infrastructure for their mission critical services.  Clearly that has not been the case since the advent of Client Server and today’s emerging Digital business requires an infrastructure of multiple layers, where the underlying server technology is somewhat arbitrary, and arguably a commodity resource.  Conversely the underlying data and associated applications differentiate one business from another, delivering business value and competitive edge.

Let’s take some time to consider this IT architecture design, which very quickly dismisses any notion that one server technology delivers all business requirements:

Such an architecture diagram does not impose any technology decisions.  Conversely it explores the “data journey” from access or creation, via Systems of Engagement (SoE) to eventual storage within Systems of Record (SoR) data repositories (I.E. Database).  Some might say it was forever thus, with the exception of the Multi-Channel SDKs & APIs layer, where the savvy organizations will embrace DevOps, Hybrid Cloud and connectivity (I.E. API, SDK) solutions, seamlessly integrating modern agile applications, with that most valuable business asset, Systems of Record (SoR) data.

Today’s Application Developer doesn’t need to concern themselves as to the platform used for their DevOps application processes, the Transaction Server or indeed the Database Server.  Sure, several decades ago, maybe even a decade ago, application code was deeply associated if not confined to a specific CPU server architecture.  Clearly that is no longer the case.  Any organization that still thinks in this legacy manner, is behind the times, and this is unfortunate.  Associating such outdated thinking with the System z Mainframe is arguably careless, and not a reason for dismissing an incumbent System z platform, or not considering a System z platform in the future.

Arguably the greatest strengths of today’s System z IBM Mainframe, currently packaged as the z13 or LinuxONE, are as a Database Server (E.g. DB2), Transaction Server (E.g. CICS, WebSphere Application Server) and Security Server (E.g. ACF2, RACF, Top Secret).  From a LinuxONE viewpoint, it’s just another server, capable of processing all of the latest strategic Open Source and Commercial Off The Shelf (COTS) Cloud, Database and Application solutions, while benefitting from the unparalleled System z Quality of Service (QoS) attributes.

However, for those organizations already deploying a System z Mainframe, its greatest perceived issue is TCO.  Without doubt the convoluted and intricate Workload License Charges (WLC) are unnecessarily complicated and perceived as being very expensive.  Optimizing these costs requires a modicum of expertise, safeguarding that the best contractual conditions are negotiated.  However, I encounter the same complexities with Distributed Systems platforms, where software license costs can spiral out of control for significant CPU capacity deployments.  Whatever platform is deployed, System z Mainframe or Distributed System, unless the business has the requisite skills in place, technical and commercial, to safeguard the lowest cost possible, commercial ISV suppliers will take advantage of such an oversight.

I’m not advocating any server technology, System z Mainframe, Distributed System or Cloud, as each resource has its merits, depending on the business requirement.  However, today’s 21st Century organization must enable new business channels by leveraging, and arguably monetizing, their Systems of Record (SoR) enterprise data.

Today, organizations need to consider an API Economy, where they expose their internal digital business assets or services in the form of Web API services to external 3rd party partners and consumers, with an overall objective of unlocking increased business value via the creation of new assets.  Such an API Economy will require integration of Transaction and Data resources, specifically:

  • Centrally manage the consumption of enterprise wide business logic, for both Systems of Record (SoR) & Systems of Engagement (SoE)
  • Extend business (E.g. Product, Brand) reach from Systems of Record (SoR), incorporating Systems of Engagement (SoE)

Previously I wrote about How to Connect Mobile Workloads to System z, detailing the conceptual steps required to expose existing SoR data assets with SoE transaction services, via z/OS Connect.  For a fully integrated end-to-end solution, we must also consider the Application Programming Interfaces (API), being the digital glue that seamlessly links applications, services and systems together.

IBM API Connect is a solution that manages the API lifecycle for both On-Premises and Cloud environments.  IBM API Connect delivers capabilities to Create, Run, Manage & Secure API resources and Microservices.  It also enables you to rapidly deploy and simplify API administration, across the organization.

API Connect can be deployed On-Premises via Linux on z Systems, in the cloud (E.g. Bluemix), as well as all other popular Distributed Systems.  Once again, the main message is that the chosen server is arbitrary, System z Mainframe, Distributed System or Cloud.  The server should be considered a commodity resource; the key is leveraging existing business logic (I.E. SoE) and data (I.E. SoR), while evolving existing Application Lifecycle Management (E.g. Agile, API Economy, DevOps) practices.

My final observation is the Mainframe Baby Boomer (E.g. Born ~1960) versus the Millennial (E.g. Born ~1995) technical personnel resource.  Without doubt, there are significant differences in their approach to application programming, but only one resource, namely the Baby Boomer, knows the business really well.  I think these folks have the ability to learn another 21st Century programming language, as well as COBOL, but perhaps their best attribute is an analytical role, especially for the integration of SoE and SoR layers.  Working very closely with Millennial technical resources, delivering the new Application (I.E. App, API) resources, the Mainframe Baby Boomer still has something valuable to offer in their final employment years.  For the avoidance of doubt, that means still delivering value from an analytical viewpoint, while transferring their skills and knowledge to their successors, namely the Millennial.

In conclusion, dismissing any server technology for Fear, Uncertainty or Doubt (FUD) reasons, is an unproductive and ridiculous notion.  More importantly, what might your business lose in opportunity, spending several years or more, migrating from one platform to another, while your competitors are embracing the Digital Age with an API Economy approach, delivering more value from their existing business SoE (transactions) and SoR (data) assets?

Are You Ready For z Systems Workload Pricing for Cloud (zWPC) for z/OS?

Recently IBM announced the z Systems Workload Pricing for Cloud (zWPC) for z/OS pricing mechanism, which can minimize the impact of new Public Cloud workload transactions on Sub-Capacity license charges.  Such benefits will be delivered where higher Public Cloud workload transaction volumes may cause a spike in machine utilization.  Of course, if this looks familiar and you have that feeling of déjà vu, this is a very similar mechanism to Mobile Workload Pricing (MWP)…

Put simply, zWPC applies to any organization that has implemented Sub-Capacity pricing via the basic AWLC or AEWLC pricing mechanisms, for the usual MLC software suspects, namely z/OS, CICS, DB2, IMS, MQ and WebSphere Application Server (WAS).  An eligible transaction is one classified as Public Cloud originated, connecting to a z/OS hosted transactional service and/or data source via a REST or SOAP web service.  Public Cloud workloads are defined as transactions processed by named Public Cloud applications, identified as originating from a recognized Public Cloud offering, including but not limited to Amazon Web Services (AWS), Microsoft Azure, IBM Bluemix, et al.

As per MWP, SCRT calculates the R4HA for Public Cloud transaction GP MSU resource usage, subtracting 60% of those values from the traditional Sub-Capacity software eligible MSU metric, with LPAR granularity, for each and every reporting hour.  The software program values for the same hour are aggregated for all Sub-Capacity eligible LPARs, deriving an adjusted Sub-Capacity value for each reporting hour.  Therefore SCRT determines the billable MSU peak for a given MLC software program on a CPC using the adjusted MSU values.  As per MWP, this will only be of benefit, if the Public Cloud originated transactions generate a spike in the current R4HA.
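
The adjustment described above is easier to follow with some simple arithmetic.  The following is a minimal sketch, assuming hypothetical R4HA figures for two LPARs over three reporting hours; the function name, data layout and MSU values are illustrative only and do not represent actual SCRT input or output.

```python
CLOUD_DISCOUNT = 0.60  # 60% of Public Cloud GP MSU is subtracted, as described above


def billable_peaks(hours):
    """Return (unadjusted_peak, adjusted_peak) for one MLC program on a CPC.

    hours: one entry per reporting hour; each entry is a list of
    (lpar_r4ha_msu, cloud_r4ha_msu) tuples, one tuple per LPAR.
    All figures are hypothetical.
    """
    unadjusted = []
    adjusted = []
    for lpars in hours:
        # Traditional Sub-Capacity value: aggregate all eligible LPARs
        unadjusted.append(sum(total for total, _ in lpars))
        # Adjusted value: subtract 60% of the cloud MSU per LPAR, then aggregate
        adjusted.append(sum(total - CLOUD_DISCOUNT * cloud for total, cloud in lpars))
    return max(unadjusted), max(adjusted)


# Hypothetical data: hour 2 contains a Public Cloud driven spike
hours = [
    [(400, 0), (300, 0)],      # hour 1
    [(500, 200), (400, 100)],  # hour 2
    [(450, 0), (350, 0)],      # hour 3
]

unadjusted_peak, adjusted_peak = billable_peaks(hours)
print(unadjusted_peak, adjusted_peak)
```

In this made-up data the unadjusted peak falls in hour 2, where the cloud transactions spike; after the adjustment the peak moves to hour 3 and is lower.  Had the cloud transactions occurred in an hour that was not the R4HA peak, the billable value would not change, which is the point made above.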

One of the major challenges for implementing MWP was identifying those transactions eligible for consideration.  Very quickly IBM identified this challenge and offered a WorkLoad Manager (WLM) based solution, to simplify reporting for all concerned.  This WLM SPE (OA47042), introduced a new transaction level attribute in WLM classification, allowing for identification of mobile transactions and associated processor consumption.  These Reporting Attributes were classified as NONE, MOBILE, CATEGORYA and CATEGORYB.  Obviously IBM made allowances for future workload classifications, hence it would seem Public Cloud will supplement Mobile transactions.

In a previous z/OS Workload Manager (WLM): Balancing Cost & Performance blog post, we considered the merits of WLM for optimizing z/OS software costs, while maintaining optimal performance.  One must draw one’s own conclusions, but there seemed to be a strong case for WLM reporting to be included in the z/OS MLC Cost Manager toolkit.  The introduction of zWPC, being analogous to MWP, where reporting can be simplified with supplied and supported WLM function, indicates that intelligent and proactive WLM reporting makes sense.  Certainly for 3rd party Soft-Capping solutions, the ability to identify MWP and zWPC eligible transactions in real-time, proactively implementing MSU optimization activities seems mandatory.

The Workload X-Ray (WLXR) solution from zIT Consulting delivers this WLM reporting function, seamlessly integrating with their zDynaCap and zPrice Manager MSU optimization solutions.  Of course, there is always the possibility to create your own bespoke reports to extract the relevant information from SMF records and subsystem diagnostic data, for input to the SCRT process.  However, such a home-grown process will only work on a monthly reporting basis and not integrate with any Soft-Capping MSU management, which will ultimately control z/OS MLC costs.

In conclusion, from a big picture viewpoint, in the last 2 years or so, IBM have introduced several new Sub-Capacity pricing mechanisms to help System z Mainframe users optimize z/OS MLC costs, namely Mobile Workload Pricing (MWP), Country Multiplex Pricing (CMP) and now z Systems Workload Pricing for Cloud (zWPC).  In theory, at least one of these new pricing mechanisms should deliver benefit to the committed System z user, deploying this server for strategic and Mission Critical workloads.  With the undoubted strategic importance associated with Analytics, Blockchain, Cloud, DevOps, Mobile, Social, et al, the landscape for System z workloads is rapidly evolving and potentially impacting those sacrosanct legacy Mission Critical workloads.  Seemingly the realm of possibility exists that Cloud and Mobile originated transactions will dominate access to System z Mainframe System Of Record (SOR) data repositories, which generates a requirement to optimize associated MLC costs accordingly.  Of course, for some System z users, such Cloud and Mobile access might not be on today’s to-do list, but inevitably it’s on the horizon, and so why not implement the instrumentation ability ASAP!

Blockchain: A New Application Development Paradigm – What About System z?

Since the inception of Data Processing and the advent of the IBM Mainframe there has been a progressive movement to deliver the de facto “System Of Record (SOR)”, typically classified as a centralised database and related applications.  The key or common denominator for this “Golden Record” is somewhat arbitrary, but more often than not, for most businesses, it will be customer or product identity related.  The benefit of identifying and establishing an SOR is the reuse of this data, for a multitude of different business usage scenarios.

From an application programming viewpoint, historically there was a structured approach when delivering new business function, whether with bespoke programs or Commercial Off the Shelf (COTS) software packages.  More recently data analytics has accelerated this approach, where new business opportunities can be identified from data trends, with near real-time processing, while DevOps frameworks allow for rapid application delivery and implementation.  However, what if there was a new approach with a different type of database and as a consequence, a new approach to application programming?

From a simplistic viewpoint, Blockchain architecture is analogous to traditional database processing, whereas the interaction with said Blockchain database is vastly different, changing from a centralised to decentralised focus.  Therefore for application developers, Blockchain is a paradigm shifting architecture, in how software applications will be architected and coded.  Recognition of this new and rapidly emerging computing paradigm is of vital importance, because it’s the cornerstone for the creation of decentralised applications, a logical and natural evolution from distributed computing architectural constructs.

If we take some time to step back from the Information Technology world and consider the possibilities when comparing a centralised versus decentralised approach, the realm of possibility exists for a truly global interconnectivity approach, which isn’t limited to a specific discrete focus (E.g. Governance, Market, Business Sector, et al).  In theory, decentralised applications might deliver a dynamic and highly collaborative business approach…

A Blockchain is a pseudo linear container space (block) to store data for “controlled public usage”.  In theory, with the right credentials, this data can be accessed by any user!  The Blockchain container is secured with the originator’s key, so only the key holder or authorised program can unlock the container data.  This is the fundamental difference between a database and a Blockchain.  For a Blockchain, the header record can be considered “eligible for Public usage”.

The data stored within a Blockchain might be considered as a “token”, the most obvious implementation being Bitcoin.  Generically, Blockchain might be considered as an alternative and flexible data transfer system that no private or public authority and especially a malicious third party can tamper with, because of the encryption process.  Put really simply, the data header has “Public” visibility, but data access requires “Private” authenticated access.

From a high-level viewpoint, Blockchain can be considered as an architectural approach, connecting a potentially unlimited number of peer computers, collaborating with a generic process for releasing or recording data, based upon cryptographic transactions.

One must draw one’s own conclusions as to whether this Centralised to Distributed to Decentralised data and application programming approach is the way forward for their business.

Decentralised Consensus is the inverse of a centralised approach where one central database was accessed to validate transaction processing.  A decentralised scheme transfers authority and trust to a decentralised virtual network, enabling processing nodes to continuously access or record transactions within a public block, creating a unique chain for modification operations, hence the Blockchain terminology.  Each successive data block contains a unique fingerprint (hash) of the previous block.  The basic premise of cryptographic processing applies, where hash codes are used to secure transaction origination authentication, eliminating the requirement for centralised processing.  Duplicate transaction processing is eliminated because of Blockchain and associated cryptographic processing.
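
The chaining of fingerprints can be illustrated in a few lines of code.  This is a minimal sketch using only the Python standard library; the block layout and the function names (block_hash, append_block, is_valid) are my own assumptions for illustration, not those of any particular Blockchain implementation.

```python
import hashlib
import json


def block_hash(block):
    """Fingerprint a block by hashing its canonical JSON representation."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Add a block that records the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain):
    """Check that every block still points at the true hash of the one before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for transaction in ["A pays B 5", "B pays C 2", "C pays A 1"]:
    append_block(chain, transaction)

print(is_valid(chain))
```

Changing the data in any block other than the last alters its hash, so the following block no longer matches and is_valid returns False.  A real network also has to protect the most recent block, which is where the consensus mechanisms discussed later come in.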

This separation of consensus (data access) from the actual application itself is the fundamental building block for a decentralised application programming approach.

Smart Contracts are the building blocks for decentralised applications.  A smart contract is a small self-contained program that you entrust with a value unit (token) and associated rules.  The simple philosophy of a smart contract is to programmatically facilitate transactional contractual governance between two or more parties via the Blockchain.  This eliminates the requirement of an arbitrary 3rd party authority for governance, when two or more parties can agree exchange between themselves.  Even today, this type of approach is not unusual between organizations, typically based upon a data (file) interchange standard (E.g. Banking).

Put simply, smart contracts eliminate the requirements of 3rd party intermediaries for transaction processing.  Ideally, the collaborating parties define and agree the required policy, embedded inside the business transaction, enabling a self-managed process between nodes (computers) that represent the reciprocal interests of the associated users and owners.
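
To make the idea of “a small self-contained program that you entrust with a value unit and associated rules” more concrete, here is a minimal sketch in plain Python.  It does not run on any real Blockchain platform; the EscrowContract class, its methods and the balances dictionary are hypothetical names used purely for illustration.

```python
class EscrowContract:
    """Holds a buyer's tokens and releases them to the seller on confirmation."""

    def __init__(self, buyer, seller, amount):
        self.buyer = buyer
        self.seller = seller
        self.amount = amount
        self.funded = False
        self.settled = False

    def fund(self, balances):
        """Move the agreed tokens from the buyer into the contract."""
        if self.funded:
            raise ValueError("contract already funded")
        if balances[self.buyer] < self.amount:
            raise ValueError("insufficient buyer balance")
        balances[self.buyer] -= self.amount
        self.funded = True

    def confirm_delivery(self, caller, balances):
        """Rule agreed by both parties: only the buyer can release the tokens."""
        if caller != self.buyer:
            raise PermissionError("only the buyer can confirm delivery")
        if not self.funded or self.settled:
            raise ValueError("contract is not awaiting settlement")
        balances[self.seller] += self.amount
        self.settled = True


balances = {"alice": 10, "bob": 0}
contract = EscrowContract(buyer="alice", seller="bob", amount=4)
contract.fund(balances)
contract.confirm_delivery("alice", balances)
print(balances)
```

The rules live inside the contract itself, so neither party needs a 3rd party intermediary to hold the tokens or decide when they are released.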

Trusted Computing combines the architectural foundations of Blockchain, decentralised consensus and smart contracts, enabling the spread of resources and transactions with a trusted “peer-to-peer” relationship, in theory enabling trust between numerous nodes (computers).

Previously institutions and central organizations were necessary as trusted authorities.  Deploying a Blockchain approach, these historical centralised functions can be simplified via smart contracts, governed by decentralised consensus within a Blockchain.

Proof of Work is an important concept for identifying the unequivocal authenticator of transactions, allowing authorised participants to take part in the Blockchain system.  Proof of work is a fundamental building block because once created, it cannot be modified, being secured by cryptographic hashes that ensure its authenticity.  Usability challenges ensue, because users cannot change Blockchain records without reprocessing the “proof of work”.

It therefore follows, proof of work will be expensive to maintain, with likely future scalability and security issues, depending on the data user (miner) requirements and incentives, which in all likelihood, will reduce over time.  As we all know, most data access is high when data has been recently created, rapidly decreasing to low or even null after a limited period of time.

Proof of Stake is a more elegant and alternative approach, determining which user can update the consensus, while preventing unwanted forking of the underlying Blockchain, being a more cost efficient approach, while being more difficult and expensive to compromise.
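
One common way to picture Proof of Stake is a stake-weighted draw, where the chance of being chosen to update the consensus is proportional to the stake held.  This is a minimal sketch under that assumption; the participant names and stakes are made up, and a seeded pseudo-random generator stands in for the agreed source of randomness a real network would need.

```python
import random


def pick_validator(stakes, seed):
    """Choose one participant with probability proportional to their stake.

    stakes: mapping of participant name to stake held (hypothetical values).
    seed:   stand-in for a random value every node would agree upon.
    """
    rng = random.Random(seed)
    names = sorted(stakes)  # fixed order so every node computes the same result
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 70, "bob": 30, "carol": 0}
picks = [pick_validator(stakes, seed) for seed in range(1000)]
print({name: picks.count(name) for name in sorted(stakes)})
```

No hashing race is required, so the selection itself costs almost nothing, while a participant with no stake is never selected.  Compromising the outcome would mean acquiring a large share of the stake, which is the expense referred to above.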

Once again, if we consider the benefits of Blockchain from a business processing viewpoint, there is a clear and present opportunity to eliminate manual or semi-automated processes, both internal and external to the business.  This could expedite the completion of processes that previously required days or even weeks to complete and reduce the potential for human error.  A simple example might be a car purchase, based upon 3rd party finance.  Such a process typically includes 3rd party data requirements, for vehicle provenance, credit scoring, identity proof, et al.  If the business world looks at the big picture, they can simplify and automate their processes, by collaborating with existing and more likely, yet to be identified partners.  The benefits are patently obvious…

From a System z viewpoint, recent technological developments leverage from existing IBM resources, including the LinuxONE, Bluemix and Watson offerings:

  • LinuxONE: The System z and LinuxONE platforms are best placed to drive Blockchain innovation, arguably via the Open Mainframe and Hyperledger projects.  IBM supports testing and development of the open Blockchain fabric code for developers on their LinuxONE Community Cloud.
  • Bluemix: With the IBM Blockchain services available on Bluemix, developers can access fully integrated DevOps tools for creating, deploying, running and monitoring Blockchain applications on the IBM Cloud.
  • Watson: Leveraging from the Watson IoT Platform, IBM will enable information from devices such as RFID-based locations, barcode-scan events or device-reported data, to be used within the IBM Blockchain. Devices will be able to communicate to Blockchain based ledgers to update or validate smart contracts.

From a business benefits viewpoint, the IBM System z platform is ideally placed for Blockchain deployment, being a highly secure EAL5+ certified platform.  Hardware accelerators deliver high speed secure encryption and hashing, supplemented by tamper-proof Crypto Express security modules for key management.  Numerous memory resident partitions can also be created rapidly to keep ledgers separate and secure.  As per usual, the System z platform has the fastest commercial processor, a highly scalable I/O system to handle massive numbers of transactions, ample memory for Blockchain operations and an optimised secure network for Blockchain peer communications.

Returning full circle to where this article started, the System z Mainframe is arguably the de facto System Of Record platform for the world’s traditional Fortune 500 or Global 2000 businesses.  These well established businesses have in all likelihood spent several decades or more establishing this centralised application programming and database usage model.  The realm of opportunity exists to make this priceless data asset available to numerous businesses, both large and small via Blockchain architectures.  If we consider just one simple example, a highly globalised and significant Banking institution could facilitate the creation of a new specialised and optimised “challenger banking” operation, for a particular location or business sector, leveraging from their own internal System Of Record data and perhaps, vital data from another source.  One could have the hypothetical debate as to whether a well-established bank is best placed for such a new offering, but with intelligent collaboration, delivering a valuable service to a new market, where such a service has not been previously possible, doesn’t everybody win?

Perhaps with Blockchain, truly open and collaborative cooperation is possible, both from a business and technology viewpoint.  For example, why wouldn’t one of the new Fortune 500 companies such as a Social Media company with billions of users, look to a traditional Fortune 500 company deploying an IBM System z Mainframe, to expand their revenue portfolio from being advertising driven, to include service provision, whatever that might be?  Rightly or wrongly, if such a Social Media company is a user’s preferred portal for accessing a plethora of other company resources (E.g. Facebook Login), why wouldn’t this user want to fully process some other business transaction (E.g. Financial) via said platform?  However unlikely, maybe Blockchain can truly simplify and expedite Globalisation, for the benefit of users and businesses alike…