
On-Orbit Fault Recovery

Scope and Description

This topic page covers on-orbit fault recovery for smallsat missions. Responding to a fault detected during operations involves isolation, diagnosis, and response. The goal of on-orbit fault recovery is to use telemetry and in-depth knowledge of the spacecraft system to determine the root cause of any fault and to take actions that both recover from the fault and prevent it from recurring. Smallsats are typically much less reliable than traditional, larger spacecraft. In the interest of low cost and rapid development, it is often useful to shift focus from spacecraft reliability to spacecraft resiliency, and on-orbit fault recovery is an essential component of that resiliency.

Resources under this topic area are primarily articles on specific approaches to spacecraft fault protection and response.

Best Practices and Lessons Learned

  • Automated anomaly detection should be leveraged wherever possible to reduce staffed operations, especially for risk-tolerant systems. This helps keep operations cost commensurate with the value of the space system itself. For small systems with relatively few satellites in a constellation, a single individual can manage operations. However, automation and minimally staffed operations are effective only while the system is behaving nominally.
  • Keep a flatsat available during operations to help debug issues and test candidate solutions.
  • Telemetry trending is a powerful tool for identifying the cause of on-orbit faults. Develop trending tools before you need them so they can expedite fault recovery, and build in automatic checks for off-nominal parameters.
  • Bring in extra expertise for fault recovery support. Experience is incredibly valuable in diagnosing the root cause and determining the best path forward.
  • In low Earth orbit, single event effects are most likely to occur during passes over the poles and over the South Atlantic Anomaly. When tracking down a root cause, check whether faults correlate with these passes.
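The telemetry trending and automatic off-nominal checks recommended above can be sketched in a few lines. This is a minimal illustration, not a flight-ground-system design: it assumes each telemetry channel arrives as a list of `(time_s, value)` pairs, and the `Limits` class, the red/yellow thresholds, the `max_slope` drift threshold, and the battery-temperature example are all hypothetical placeholders. Each sample is classified against static limits, and a least-squares slope over the window flags slow drifts that have not yet crossed a limit.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Limits:
    """Hypothetical red/yellow limits for one telemetry channel."""
    red_low: float
    yellow_low: float
    yellow_high: float
    red_high: float


def check_sample(value: float, lim: Limits) -> str:
    """Classify one sample as RED, YELLOW, or NOMINAL against static limits."""
    if value < lim.red_low or value > lim.red_high:
        return "RED"
    if value < lim.yellow_low or value > lim.yellow_high:
        return "YELLOW"
    return "NOMINAL"


def trend_slope(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope (value units per second) of (time, value) pairs."""
    ts = [t for t, _ in samples]
    vs = [v for _, v in samples]
    t_bar, v_bar = mean(ts), mean(vs)
    num = sum((t - t_bar) * (v - v_bar) for t, v in samples)
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den if den else 0.0


def evaluate_channel(samples: list[tuple[float, float]], lim: Limits,
                     max_slope: float) -> list[tuple]:
    """Return limit violations plus a TREND alert if the drift is too steep."""
    alerts = []
    for t, v in samples:
        status = check_sample(v, lim)
        if status != "NOMINAL":
            alerts.append((t, status, v))
    slope = trend_slope(samples)
    if abs(slope) > max_slope:
        alerts.append((samples[-1][0], "TREND", slope))
    return alerts


if __name__ == "__main__":
    # Hypothetical battery temperature (deg C) sampled once per minute.
    batt_temp = [(0, 20.0), (60, 21.0), (120, 22.0), (180, 23.0), (240, 36.0)]
    limits = Limits(red_low=-10.0, yellow_low=0.0, yellow_high=35.0, red_high=45.0)
    # Expect a YELLOW alert at t=240 and a TREND alert (slope ~0.057 deg C/s).
    print(evaluate_channel(batt_temp, limits, max_slope=0.01))
```

The slope check is what makes this a trending tool rather than a plain limit check: a parameter that is still inside its limits but drifting steadily can be caught before it becomes a fault.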
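The check for whether faults correlate with polar and South Atlantic Anomaly passes can be sketched as an enrichment ratio: the fraction of faults that occurred in a region divided by the fraction of time the spacecraft spent there. This is a minimal sketch under loudly stated assumptions. The sub-satellite latitude/longitude at each fault time and at a uniform cadence over the same period are assumed to come from your own orbit determination or TLE propagation, and the `SAA_LAT`/`SAA_LON` box and the 60 degree polar cutoff are deliberately crude, hypothetical placeholders. The real SAA boundary depends on altitude, particle energy, and epoch, so a real analysis would use a trapped-radiation model or the mission's own dosimeter data.

```python
from __future__ import annotations

# Crude, hypothetical region boundaries in degrees (placeholders only).
SAA_LAT = (-50.0, 0.0)
SAA_LON = (-90.0, 40.0)
POLAR_ABS_LAT = 60.0


def region(lat: float, lon: float) -> str:
    """Tag a sub-satellite point as SAA, POLAR, or OTHER."""
    if SAA_LAT[0] <= lat <= SAA_LAT[1] and SAA_LON[0] <= lon <= SAA_LON[1]:
        return "SAA"
    if abs(lat) >= POLAR_ABS_LAT:
        return "POLAR"
    return "OTHER"


def enrichment(fault_points: list[tuple[float, float]],
               orbit_points: list[tuple[float, float]],
               target: str) -> float:
    """Ratio of the fault fraction in `target` to the time fraction spent there.

    fault_points: (lat, lon) of the sub-satellite point at each fault time.
    orbit_points: (lat, lon) sampled at a uniform cadence over the same period.
    A ratio well above 1 suggests faults cluster in that region.
    """
    if not fault_points or not orbit_points:
        raise ValueError("need both fault and orbit samples")
    fault_frac = sum(region(*p) == target for p in fault_points) / len(fault_points)
    time_frac = sum(region(*p) == target for p in orbit_points) / len(orbit_points)
    return fault_frac / time_frac if time_frac else float("inf")


if __name__ == "__main__":
    # Toy data: 10% of orbit samples fall in the SAA box, but 2 of 4 faults do.
    orbit = [(-30.0, -45.0)] + [(10.0, 0.0)] * 9
    faults = [(-30.0, -45.0), (-20.0, -30.0), (10.0, 100.0), (70.0, 0.0)]
    print("SAA enrichment:", enrichment(faults, orbit, "SAA"))  # about 5.0
```

Comparing against time spent in each region matters because a spacecraft in a high-inclination orbit spends a substantial share of every orbit at high latitude; a raw count of "faults near the poles" can look alarming even when faults are uniformly distributed. With only a handful of faults the ratio is noisy, so treat a high value as a lead to investigate rather than proof of a radiation cause.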


White Paper

This NASA handbook provides detailed information on the processes, requirements, design, and assessment ... of fault management. Chapter 4 specifically discusses the operation and maintenance of a satellite in orbit and the responsibilities of a Fault Management engineer.

Brian K. Muirhead et al.

This presentation provides a high-level overview of the NASA Fault Management (FM) handbook and discusses ... Health Management (HM) and FM for spacecraft. Links to additional resources on FM and HM are listed at the end of the presentation slides.

White Paper
JPL/Caltech

This paper discusses the harsh conditions a spacecraft faces throughout its lifetime and the necessary ... fault management required to prepare for potential subsystem failures. Because so much can go wrong with a spacecraft, the paper presents a range of fault protection methods and examples.

Fabrizio Stesina et al.

This Aerospace journal paper explains that many CubeSat mission failures are not thoroughly investigated, ... leading to a lack of data. Onboard failures are often difficult to analyze because the satellites were not properly tested and have poor or no Failure Detection, Isolation, and Recovery (FDIR) functions. The techniques outlined in this paper can be applied to improve the chances of recovering a spacecraft after a failure.
