Embedded systems are often modified remotely, e.g. to upgrade the firmware or change the configuration. However, such a modification may break the system and render it inaccessible, which is a major problem if the device is hard to reach physically. Unfortunately, no catch-all failsafe solution exists to make sure that the device stays accessible remotely even if a modification goes wrong. Instead, the possible failures have to be anticipated and covered individually. This talk discusses some of the most frequently occurring failures and how they can be detected and handled. These include power failure, kernel crashes, network failure and data corruption. We include examples from concrete use cases. Finally, there is room to discuss alternatives to the proposed solutions, including more generic ones.
This talk is geared towards system architects and developers who want to improve the quality of their product.