Site Reliability (SRE) and System Availability Training
Unfortunately, software and hardware are often designed in "silos", and many system availability problems arise at the interface between them. Site reliability (SRE) begins with an integrated view of the system and the ways it can become unavailable. It depends on understanding the small issues that lead to cascading failures. An error budget isn't accurate if the complex interactions between software and hardware aren't considered. A single software reboot might cause negligible downtime, but if it happens repeatedly, or on every system, it can roll into a massive site reliability issue. This training class addresses these issues.
Reliable Software 101
A great overview for people who are new to SRE. The first two modules are free on our training portal.
Predict defects and when they will be discovered
This module is a prerequisite for the other system availability and site reliability classes. Predicting the defects that will escape into operation is the core of availability and reliability.
Predict RAM metrics
Once the defect discovery profile is predicted, the other RAM metrics, such as failure rate, availability, reliability, and MTBF, can all be predicted. Learn how to evaluate the RAM predictions for the software.
System Availability, Error Budgets, and System Modeling
In the first part of the training, you’ll learn to determine a feasible availability for a system. This involves understanding the target uptime and reliability for the service or product. You’ll then learn to determine error budgets, which are the allowable failure rates for a system’s components. This helps prevent minor issues from causing major outages. Finally, you’ll learn how to model the hardware and software as a combined system to ensure that their designs are in sync and that the error budgets are aligned.
SRE – Early System Availability Issue Detection and Inter-dependencies
The second part of the course focuses on practical application. You will learn how to find issues with software and hardware early during the system design where they originate, preventing last-minute, costly redesigns. You’ll also learn to identify when hardware changes inadvertently affect the software design and vice versa. A key focus is on understanding how software can cause hardware damage and how a software issue can cause a hardware failure to cascade to an even bigger failure, highlighting the critical inter-dependencies between the two.
Reliable Software 101
Modules 3 and 4
VIRTUAL SELF-GUIDED TRAINING
- Hard facts
- Vocabulary
- Industry guidance available for software reliability
- Overview of models that predict and estimate software reliability
- Mapping software to hardware reliability
Predict defects and when they will be discovered
VIRTUAL SELF-GUIDED TRAINING
- Predict the scope/size
- Predict defect density
- Predict total testing and operational defects
- Predict when the defects will be discovered
- Predict technical debt pile up
- Predict warranty and staffing effort
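The first steps listed above can be illustrated with a minimal sketch. It is not the course's prediction model: it assumes the simplest possible relationship (total defects as predicted size times predicted defect density) and an exponential discovery profile, and every function name and number in it is hypothetical.

```python
import math


def predict_total_defects(size_ksloc: float, defects_per_ksloc: float) -> float:
    """Predicted defects = predicted size x predicted defect density."""
    return size_ksloc * defects_per_ksloc


def cumulative_discovered(total_defects: float, months: float,
                          time_constant_months: float) -> float:
    """Defects discovered by `months`, under an ASSUMED exponential profile."""
    return total_defects * (1.0 - math.exp(-months / time_constant_months))


# Hypothetical inputs, chosen only for illustration.
total = predict_total_defects(size_ksloc=50.0, defects_per_ksloc=0.4)
for month in (3, 6, 12):
    found = cumulative_discovered(total, month, time_constant_months=6.0)
    print(f"month {month:2d}: about {found:.1f} of {total:.0f} defects discovered")
```

The time constant controls how quickly defects surface. A larger value spreads discovery over a longer period, which is one way technical debt could pile up.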
Predict RAM metrics
VIRTUAL SELF-GUIDED TRAINING
- Prerequisite: Predict defects and when they will be discovered
- Predict the failure rate, MTBF, availability and reliability
- Evaluate against typical ranges
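As a rough illustration of how these metrics relate to one another, the sketch below uses the standard textbook relationships under an assumed constant failure rate. The function names and input values are hypothetical, and the MTBF is taken as a given input rather than derived from a defect discovery profile.

```python
import math


def failure_rate(mtbf_hours: float) -> float:
    """Failures per hour, assuming a constant failure rate."""
    return 1.0 / mtbf_hours


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of no failure during the mission (exponential model)."""
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical values, for illustration only.
mtbf, mttr, mission = 2000.0, 0.5, 24.0
print(f"failure rate : {failure_rate(mtbf):.6f} per hour")
print(f"availability : {availability(mtbf, mttr):.6f}")
print(f"reliability  : {reliability(mtbf, mission):.6f} over {mission:.0f} h")
```

Availability and reliability answer different questions: the first reflects how quickly service is restored after a failure, the second how likely a failure-free interval is.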
System Availability Objectives, budgets, integration with hardware
VIRTUAL SELF-GUIDED TRAINING
- Determine a feasible system availability objective
- Determine feasible error budgets
- Model software and hardware interactions
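A minimal sketch of the arithmetic behind these steps follows. It assumes hardware and software act as a series system (both must be up for the service to be up) and expresses the error budget as allowed downtime per year. All names and numbers are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def series_availability(*component_availabilities: float) -> float:
    """Availability of a system that needs every component to be up."""
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result


def downtime_budget_hours(objective: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Allowed downtime in the period for a given availability objective."""
    return (1.0 - objective) * period_hours


def reboot_downtime_hours(reboots_per_year: int, minutes_per_reboot: float) -> float:
    """Total yearly downtime caused by repeated short reboots."""
    return reboots_per_year * minutes_per_reboot / 60.0


# Hypothetical objective and component predictions.
objective = 0.999
budget = downtime_budget_hours(objective)
combined = series_availability(0.9995, 0.9992)  # hardware, software
meets_objective = combined >= objective
reboots = reboot_downtime_hours(reboots_per_year=200, minutes_per_reboot=2.0)

print(f"yearly downtime budget : {budget:.2f} h")
print(f"combined availability  : {combined:.6f} (meets objective: {meets_objective})")
print(f"downtime from reboots  : {reboots:.2f} h")
```

With these illustrative numbers, each component looks healthy on its own, yet the combined system misses the objective, and two-minute reboots alone consume most of the yearly budget.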
Site Reliability (SRE)
VIRTUAL SELF-GUIDED TRAINING
- Learn how to find design issues with software and hardware during systems design
- Learn how changes in hardware affect software
- Identify how software can cause damage to hardware
- Learn how bad software error handling can cause hardware failures to cascade to even bigger failures
Knowledge Base
Mission Ready Software has four decades of experience modeling software and hardware design availability.
Compliant
The methods taught in this class are recommended by IEEE 1633, the IEEE Recommended Practice on Software Reliability.
Flexible
These classes are available as virtual self-guided or virtual instructor-guided training.
Cost effective
Our approach to error budgets and site reliability is simple: look for the little things that lead to big problems with system availability.
TERMS & CONDITIONS
As per the terms and conditions page of this website, software training classes are non-refundable.