Cascade Deletion Flow

Understanding the cascade deletion flow.

When a resource is deleted and the API Server accepts the deletion, it is guaranteed that no blocking references point to it anywhere. However, other resources may still point to the deleted one through references with asynchronous cascade deletion (or unset).

These flows cover only schema references; meta references are already fully covered.

When a Deployment deletes a resource, all Deployments affected by this deletion must take asynchronous action. This means that if Deployment D0-1 from Service S0 imports Services S1 and S2, and S1 and S2 have Deployments D1-1, D1-2, D2-1, and D2-2, then D0-1 must open four real-time watches asking for any deletions that it needs to handle. In some cases, I remember a service importing five others. With 50 regions, that would mean 250 watch instances, but such a deployment would be very large and would have sufficient resources for the goroutines.
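
The watch count above is simply one watch per Deployment of every imported Service. As a minimal sketch of that arithmetic (the `Service` type and `WatchTargets` function are hypothetical names for illustration, not Goten types):

```go
package main

import "fmt"

// Service is a hypothetical, simplified model of a service: a name plus
// the IDs of its regional Deployments. It is not a Goten type.
type Service struct {
	Name        string
	Deployments []string
}

// WatchTargets lists every Deployment of every imported Service. The
// importing Deployment needs one real-time watch per returned entry.
func WatchTargets(imported []Service) []string {
	var targets []string
	for _, svc := range imported {
		targets = append(targets, svc.Deployments...)
	}
	return targets
}

func main() {
	s1 := Service{Name: "S1", Deployments: []string{"D1-1", "D1-2"}}
	s2 := Service{Name: "S2", Deployments: []string{"D2-1", "D2-2"}}
	// D0-1 imports S1 and S2, so it watches all four of their Deployments.
	fmt.Println(WatchTargets([]Service{s1, s2})) // prints [D1-1 D1-2 D2-1 D2-2]
}
```

With five imported Services of 50 Deployments each, the same function returns 250 targets.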

Suppose that D1-1 had some resource RX that was deleted. The following happens:

  • D1-1 must notify all interested deployments that RX is deleted by inspecting back reference sources.
  • Suppose that RX had some back-references in Deployment D0-1; Deployment D1-1 can see that.
  • D1-1, after notifying D0-1, periodically checks if there are still active back-references from D0-1.
  • Deployment D0-1, which points to D1-1 as an importer, is notified about the deleted resource.
  • D0-1 grabs all local resources that need cascade deletion or unset. For unsets, it needs to execute regular updates. For deletions, it needs to delete the resources (or mark them for deletion if other back-references, possibly blocking ones, still point to them).
  • Once D0-1 has dealt with all local resources pointing to RX, it is done and has no more work.
  • At some point, D0-1 will be asked by D1-1 if RX no longer has back refs. If this is the case, then D0-1 will confirm all is clear and D1-1 will finally clean up what remains of RX.
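
The steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: every type and method name here (`Importer`, `Owner`, `HandleDeleted`, `HasBackRefs`, `TryFinalize`) is hypothetical and not part of Goten, the notification and the periodic check are plain method calls rather than watches and requests, and the "mark for deletion" case with remaining blocking back-references is left out.

```go
package main

import "fmt"

// Action is what a referencing resource does when its target is deleted.
type Action int

const (
	Unset         Action = iota // clear the reference with a regular update
	CascadeDelete               // delete the referencing resource itself
)

// LocalResource is a resource in the importing Deployment (D0-1).
type LocalResource struct {
	Ref      string // name of the referenced resource, "" when unset
	OnDelete Action
}

// Importer is a hypothetical stand-in for D0-1.
type Importer struct {
	Resources map[string]*LocalResource
}

// HandleDeleted runs when the owner notifies the importer that target is
// deleted: every local resource pointing to it is unset or deleted.
func (d *Importer) HandleDeleted(target string) {
	for name, r := range d.Resources {
		if r.Ref != target {
			continue
		}
		switch r.OnDelete {
		case Unset:
			r.Ref = ""
		case CascadeDelete:
			delete(d.Resources, name)
		}
	}
}

// HasBackRefs answers the owner's periodic check.
func (d *Importer) HasBackRefs(target string) bool {
	for _, r := range d.Resources {
		if r.Ref == target {
			return true
		}
	}
	return false
}

// Owner is a hypothetical stand-in for D1-1. Pending holds deleted
// resources that still await final cleanup.
type Owner struct {
	Pending map[string]bool
}

// TryFinalize cleans up target only when no importer still references it.
func (o *Owner) TryFinalize(target string, importers ...*Importer) bool {
	for _, imp := range importers {
		if imp.HasBackRefs(target) {
			return false
		}
	}
	delete(o.Pending, target)
	return true
}

func main() {
	d01 := &Importer{Resources: map[string]*LocalResource{
		"A": {Ref: "RX", OnDelete: Unset},
		"B": {Ref: "RX", OnDelete: CascadeDelete},
	}}
	d11 := &Owner{Pending: map[string]bool{"RX": true}}

	fmt.Println(d11.TryFinalize("RX", d01)) // false: D0-1 has not reacted yet
	d01.HandleDeleted("RX")                 // D0-1 unsets A and deletes B
	fmt.Println(d11.TryFinalize("RX", d01)) // true: RX can be cleaned up
}
```

Keeping the check separate from the notification mirrors the flow: D1-1 cannot assume that D0-1 has finished, so it keeps asking until no back-references remain.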

Note that:

  • This deletion spree may be deep for large object deletions, like projects. It may involve multiple levels of Deployments and Services.

  • If there is an error in the schema, some pending deletion may be stuck forever. By error in the schema, we mean situations like:

    • Resource A is deleted, and is back-referenced from B and C (async cascade delete).
    • Normally B and C should be deleted, but a problem arises if C is, say, blocked by D, and D has no relationship with A, so it will never be deleted. In this case, B is deleted, but C is stuck, blocked by D. As of now, Goten does not detect schema errors like this; detecting them may be a good idea, although it is not clear whether it is possible.
    • It will be the service developers’ responsibility to fix schema errors.
  • In the flow, D0-1 imports the Service to which D1-1 belongs. Therefore, we know that D0-1 knows the full schema of D1-1's Service, but not the other way around. We need to consider this in the situation when D1-1 asks D0-1 if RX no longer has back refs.
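
To make the stuck-deletion scenario from the notes concrete, here is a toy model of it. This is purely illustrative and is not something Goten does: the `Ref` type and the `StuckAfterDelete` function are hypothetical, and the sketch works on a handful of named resources rather than on a real schema.

```go
package main

import "fmt"

// Kind is the type of a reference in this toy model.
type Kind int

const (
	AsyncCascade Kind = iota // From is deleted asynchronously when To is deleted
	Blocking                 // To cannot be deleted while From exists
)

// Ref is a reference from resource From to resource To.
type Ref struct {
	From, To string
	Kind     Kind
}

// StuckAfterDelete returns the resources that deleting root would leave
// stuck: they are scheduled for cascade deletion, but are directly blocked
// by a resource that the cascade never reaches.
func StuckAfterDelete(root string, refs []Ref) []string {
	// Collect everything the cascade deletion would try to remove.
	doomed := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, r := range refs {
			if r.To == cur && r.Kind == AsyncCascade && !doomed[r.From] {
				doomed[r.From] = true
				queue = append(queue, r.From)
			}
		}
	}
	// A doomed resource is stuck if something outside the doomed set blocks it.
	var stuck []string
	seen := map[string]bool{}
	for _, r := range refs {
		if r.Kind == Blocking && doomed[r.To] && !doomed[r.From] && !seen[r.To] {
			seen[r.To] = true
			stuck = append(stuck, r.To)
		}
	}
	return stuck
}

func main() {
	refs := []Ref{
		{From: "B", To: "A", Kind: AsyncCascade},
		{From: "C", To: "A", Kind: AsyncCascade},
		{From: "D", To: "C", Kind: Blocking}, // D has no relationship with A
	}
	fmt.Println(StuckAfterDelete("A", refs)) // prints [C]
}
```

If D also pointed to A with an async cascade reference, D would be part of the cascade and C would no longer be reported as stuck.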