How to do backup and disaster recovery in Kubernetes

When it comes to backup and disaster recovery, there are several ways to snapshot applications and application data in Kubernetes. Let’s take a closer look at the pros and cons of each approach.

etcd backup


  • etcd backups capture the full cluster state (every Kubernetes manifest), so application-level backups are not possible. It’s an all-or-nothing backup
  • An etcd backup also does not include application data, so additional tooling is required for persistent volumes
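For reference, an etcd snapshot is taken cluster-wide with etcdctl. This is a minimal sketch; the certificate paths and endpoint below assume a kubeadm-style control plane and may differ in your environment:

```shell
# Take a snapshot of the entire etcd keyspace (all cluster state, all namespaces).
# Paths assume kubeadm defaults; adjust for your control plane.
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable
ETCDCTL_API=3 etcdctl snapshot status /var/backups/etcd-snapshot.db
```

Note that there is no flag to restrict this to a single application: the snapshot is always the whole keyspace.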

Exporting kubernetes manifests manually

  • While this approach addresses the application-level backup issue, it’s manual and requires additional tooling to automate
  • Application data backups are still not possible
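A manual export typically looks something like the following sketch. The namespace and label are hypothetical; note that only the manifests are captured, not the contents of any PersistentVolumes:

```shell
# Export the manifests of one application (hypothetical namespace and label).
# Captures manifests only; persistent volume contents are NOT included.
kubectl get deployment,service,configmap,secret,pvc \
  -n myapp-namespace -l app=myapp -o yaml > myapp-backup.yaml

# Restore later with:
# kubectl apply -f myapp-backup.yaml
```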

Storing all Kubernetes manifests in Git before applying

This is a preferred approach for repeatable deployments and backing up manifests, but it has some edge cases. Let’s discuss them:

  • Application data backups are not possible
  • If a deployment fails just after you store the generated manifests in Git, you need an additional mechanism to capture, revert, or mark the manifests as an unstable release. This leads to special-case logic for each failure scenario
  • Additional tooling/pipelines are required to generate the manifests and store them in Git
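A minimal sketch of such a pipeline step is shown below. The chart path, namespace, and tag conventions are all hypothetical, and the last line illustrates the extra failure-handling logic this approach forces on you:

```shell
# Render manifests and store them in Git before applying (hypothetical names).
helm template myapp ./charts/myapp > manifests/myapp.yaml
git add manifests/myapp.yaml
git commit -m "Release myapp"
git tag "myapp-release-$(date +%Y%m%d%H%M%S)"   # mark the candidate release

kubectl apply -f manifests/myapp.yaml

# If the rollout fails, extra logic is needed to mark this commit as unstable:
kubectl rollout status deployment/myapp -n myapp-namespace --timeout=120s \
  || git tag "myapp-unstable-$(date +%Y%m%d%H%M%S)"
```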

Note: 

  • In all the above cases, the mapping between application data and Kubernetes manifests needs to be tracked separately
  • Some attributes, like the image SHA (the true representation of a Docker image’s immutability), are in most cases only available after the application is deployed. None of the methods above capture such attributes in a backup
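To see why: the resolved image digest only shows up in the pod status after the image has been pulled and the container started. The namespace and label below are hypothetical:

```shell
# The resolved image digest (imageID) appears in pod status only after deployment;
# a manifest exported before deploy would show the tag, not the digest.
kubectl get pods -n myapp-namespace -l app=myapp \
  -o jsonpath='{.items[*].status.containerStatuses[*].imageID}'
# Output has the shape: <registry>/<image>@sha256:<digest>
```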

Export and backup all objects in a namespace

Using tools like Velero to export and backup all objects in a namespace is by far my most preferred way to backup and restore Kubernetes workloads.

Why it works:

  • Velero can back up all namespaces or only specific ones
  • You can also filter applications within a namespace, or across namespaces, for backup
  • Velero can back up the exact state of workloads (e.g. the running Docker image SHA)
  • Automation tools can simply call the Velero API to trigger partial or full backups, along with application data
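The filtering above maps directly onto Velero CLI flags. A short sketch (backup names and selectors are hypothetical):

```shell
# Back up everything in two specific namespaces:
velero backup create ns-backup --include-namespaces staging,production

# Back up only resources matching a label selector, across namespaces:
velero backup create myapp-backup --selector app=myapp

# Schedule a recurring backup (cron syntax):
velero schedule create daily-full --schedule "0 2 * * *"

# Restore from a backup:
velero restore create --from-backup ns-backup
```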

This method does have some drawbacks:

  • Velero is designed for a single cluster; multi-cluster setups are limited to read-only mode
  • There is no UI; everything is driven from the CLI

Using CAPE by Biqmind

It was with all this in mind that we developed the disaster recovery component for CAPE, our tool for advanced K8s multi-cluster application & data management functionality. What does CAPE do then?

  • Designed for Single and Multi-Cluster: CAPE fills the gap and makes Velero work across multiple clusters
  • K8s without the steep learning curve: We know it’s difficult, so we built CAPE with a unified console and a rich UX to make operations life easy in a secure way

I presented a webinar where I dive deeper into using Velero for disaster recovery and data migration – check it out on our YouTube channel.

Chakradhar Jonagam

Biqmind Head Software Architect and CNCF Ambassador

Chak has over 10 years of experience helping customers across the US and APAC unlock the value of cloud and related technologies for their businesses. Previously at Red Hat as a Senior Solutions Architect, Emerging Technologies practice, he worked with enterprise clients across the region on their solutions architectures and technical solution implementation strategies. Chak is also a Cloud Native Compute Foundation(CNCF) ambassador and passionate community advocate. He has presented at Devnation Federal, Red Hat Summit and organized meetups around the world.