System Availability and Site Reliability Training

Site Reliability (SRE) and system availability training

Unfortunately, software and hardware are often designed in “silos”. Many system availability problems arise at the interface between the two. Site reliability (SRE) begins with an integrated view of the system and how it can become unavailable, and it depends on understanding the small issues that lead to cascading failures.

  • Other classes are designed by people with limited real-world experience

    They are teaching concepts that are decades old

  • Other classes show models used very late in development

    These aren't much better than just trending the failures in a spreadsheet

  • Other classes assume Waterfall development

    The software engineers aren't going to respect your models if you don't consider Agile

  • Other classes teach overly complex methods

    Not only are these methods used very late in testing but they require extensive effort to use

  • Our class was designed by the world's leader in software reliability prediction

    Why learn from them when you can learn from the expert?

  • In our class, you learn to use state-of-the-art predictive models

    These models are built from decades of benchmarking defects

  • In our class, you learn how to apply models in an Agile environment

    Easily integrate your predictions with the Agile program increments

  • Easy-to-use models

    Just answer some questions about your software and how it's developed. Our approach to error budgets and site reliability is simple: look for the small things that can lead to significant system availability issues.

Why our software RAM (reliability, availability, maintainability) training is effective

Compliant

Complies with the IEEE 1633 Recommended Practices for Software Reliability

Knowledge Base

Mission-ready software has been available for over 4 decades, encompassing modeling software and hardware design. 

Early prediction

Predict error budgets, software failure rate, site reliability, and availability early in development

Cost Effective

A single software reboot may result in negligible downtime. However, if it occurs repeatedly or on every system, it can escalate into a massive site reliability issue. This training class addresses the little things that lead to big downtime.
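To make the arithmetic behind that claim concrete, here is a minimal sketch. The 30-second reboot time, the reboot counts, and the function names are illustrative assumptions, not figures from the course.

```python
# Illustrative only: reboot duration and frequencies are assumed values.
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_downtime_seconds(reboot_seconds, reboots_per_year):
    """Total downtime per year caused by reboots on one system."""
    return reboot_seconds * reboots_per_year


def availability(downtime_seconds, period_seconds=SECONDS_PER_YEAR):
    """Fraction of the period during which the system is up."""
    return 1.0 - downtime_seconds / period_seconds


# One 30-second reboot per year: negligible.
rare = annual_downtime_seconds(reboot_seconds=30, reboots_per_year=1)

# The same reboot ten times a day: roughly 30 hours of downtime per year.
frequent = annual_downtime_seconds(reboot_seconds=30, reboots_per_year=10 * 365)

# Downtime allowed per year by a 99.9% availability objective.
budget_999 = (1.0 - 0.999) * SECONDS_PER_YEAR

print(f"rare:     {rare} s/year, availability {availability(rare):.6f}")
print(f"frequent: {frequent} s/year, availability {availability(frequent):.6f}")
print(f"99.9% budget: {budget_999:.0f} s/year, exceeded: {frequent > budget_999}")
```

Under these assumptions, the frequent-reboot case alone consumes more than three times the yearly downtime that a 99.9% objective allows.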

Site reliability and system availability training modules

Reliable Software 101

Modules 3 and 4 VIRTUAL SELF-GUIDED TRAINING
$200
  • VIRTUAL SELF-GUIDED
  • Facts and statistics about software failures
  • Vocabulary, industry guidance
  • Key differences between software and hardware failures
  • Modules 1 and 2 are free

Predict defects and when they will be discovered

VIRTUAL SELF-GUIDED TRAINING
$750
  • VIRTUAL SELF-GUIDED
  • Predict defects and defect density before code is written
  • Predict the likely discovery rate
  • Predict technical debt
  • Estimate warranty and staffing effort

Predict RAM metrics

VIRTUAL SELF-GUIDED TRAINING
$500
  • VIRTUAL SELF-GUIDED
  • Prerequisite - Predict defects and when they will be discovered
  • Predict failure rate, MTTF, availability, reliability
  • Sanity check the predictions
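As a rough illustration of how these metrics relate to one another, the sketch below uses the standard constant-failure-rate (exponential) relationships. It is not the course's prediction model; the failure rate, repair time, and function names are assumed for the example.

```python
import math


def mttf_hours(failure_rate_per_hour):
    """Mean time to failure under a constant failure rate."""
    return 1.0 / failure_rate_per_hour


def reliability(failure_rate_per_hour, mission_hours):
    """Probability of no failure during the mission: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * mission_hours)


def steady_state_availability(mttf, mttr):
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


# Assumed inputs: one failure per 1000 operating hours, 30 minutes to restore.
failure_rate = 0.001
mttr = 0.5

mttf = mttf_hours(failure_rate)
print(f"MTTF: {mttf:.0f} hours")
print(f"Reliability over 24 hours: {reliability(failure_rate, 24):.4f}")
print(f"Availability: {steady_state_availability(mttf, mttr):.6f}")
```

A simple sanity check on any prediction of this kind is that reliability and availability fall between 0 and 1, and that a shorter restore time raises availability without changing MTTF.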

System Availability Objectives, budgets, integration with hardware

VIRTUAL SELF-GUIDED TRAINING
$500
  • Prerequisite - Predict RAM metrics
  • Determine a feasible availability objective for the system
  • Determine error budgets
  • Model the software and hardware components as a system
  • Combine the software predictions with the hardware prediction
  • Ensure designs are in sync and error budgets are aligned
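One simple way to picture combining the predictions is a series model, sketched below. It assumes the system is up only when both hardware and software are up and that their failures are independent. The availability figures, the 30-day period, and the function names are illustrative assumptions rather than the course's method.

```python
def series_availability(component_availabilities):
    """Availability of a system that needs every component to be up."""
    result = 1.0
    for a in component_availabilities:
        result *= a
    return result


def error_budget_minutes(objective, period_minutes=30 * 24 * 60):
    """Downtime allowed in the period by the availability objective."""
    return (1.0 - objective) * period_minutes


# Assumed predictions for each side of the system.
hardware_a = 0.9995
software_a = 0.9990

objective = 0.999
system_a = series_availability([hardware_a, software_a])
meets = system_a >= objective

print(f"System availability: {system_a:.7f}")
print(f"Meets {objective} objective: {meets}")
print(f"Error budget over 30 days: {error_budget_minutes(objective):.1f} minutes")
```

With these numbers each component looks acceptable on its own, yet the combined system falls short of the objective, which is why the budgets for hardware and software need to be set together.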