Hub & Scope Refactoring

Historically, Sentry's SDKs implemented the Hub and Scope model, a defining aspect of the Unified API. This model exposed a lot of complexity to users and left room for error for users and SDK maintainers alike. We are now looking at adopting OpenTelemetry packages in our SDKs, either replacing or complementing our existing performance packages, and that forces the issue: as long as Sentry's model differs from OpenTelemetry's, we cannot support OpenTelemetry.

A) Why Are We Doing This?

There are two reasons we are doing this:

  1. Being compatible with OpenTelemetry (OTel) performance data.
  2. Making Sentry performance instrumentation easier to understand and get right.

Being Compatible with OpenTelemetry

What Sentry calls a Scope is called a Context in OTel. Contexts in OTel are immutable, so whenever a context is mutated, a new context is created (forked from the currently active context). Whenever a new span is created in OTel, a new context is forked. This leads to a lot of nested contexts, and the Sentry scopes should be able to reproduce this nesting. This is necessary because the Sentry SDKs should be able to capture all spans from OTel with the correct data from the correct context applied to them before sending them to Sentry.
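To make the immutability point concrete, here is a minimal sketch in Python. The `Context` class and its `with_value` method are hypothetical stand-ins written for this page, not the OTel API; they only show how "mutating" an immutable context produces a new, nested context while the parent stays unchanged.

```python
from types import MappingProxyType


class Context:
    """Toy immutable context; a hypothetical stand-in for an OTel Context."""

    def __init__(self, values=None):
        # MappingProxyType is a read-only view, so values cannot be changed in place.
        self._values = MappingProxyType(dict(values or {}))

    def get(self, key, default=None):
        return self._values.get(key, default)

    def with_value(self, key, value):
        # "Mutating" returns a new Context; the receiver stays untouched.
        return Context({**self._values, key: value})


root = Context()
request_ctx = root.with_value("span", "http.server")   # fork for the request span
db_ctx = request_ctx.with_value("span", "db.query")    # fork again for a child span
```

Each new span ends up with its own context, which is the nesting that Sentry scopes need to mirror.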

Make Sentry Performance Instrumentation Easier

Hubs and scopes are a complicated system that can easily be misused (also see here). It requires us to create new hubs, which in turn breaks breadcrumbs and other features. In the future we want one consistent system that is easy to understand and use.


B) What Is the Outcome of the Refactoring?

  1. The SDK's model of Scope forking is aligned with the one from OTel, paving the way for using OTel tooling for performance monitoring while still seeing its data in Sentry (with Sentry error events still connected to the spans captured by OTel).
  2. The new Scopes API is implemented in the SDK and can be used.
  3. The old Hub based API still works as expected for backwards compatibility. After a transition phase the Hub will be removed in a major update of the SDK.

This is why we started this journey with an RFC.


C) The New Concepts

The Hub will be removed and only the Scope, the Client, and the Transport of the Unified API will remain.

Transport

The Transport is not touched. It will not change.

Client

The Client stays the same. The only difference will be that there is always a client available. More on that later.

  • TODO: add more explanation about how there is now always a client.
  • TODO: explain which scope the client should be stored on (and why)

Scope

This is where the most work needs to be done. The Scope will evolve and take over some functionality of the Hub. The scope now comes in three flavors:

  • Global Scope
  • Isolation Scope
  • Current Scope

No matter the flavor of the scope, you can still add data (like tags, breadcrumbs, attachments, user, profile, etc.) to it as before. The scope still holds the propagation context containing tracing information.

The difference is in how the scope data is applied to events.

Global Scope

This is a simple global variable that is the same for the whole execution of the software. Data applied to this scope will end up in all events sent from the software. So the same for all users, threads, async tasks, everything.

This scope is probably only used for process-wide data like the release.

Isolation Scope

This scope holds data only applicable to the current request (on a server), tab (in a browser), or user (on mobile). The top-level APIs that manipulate data (like sentry_sdk.set_tag(), sentry_sdk.set_context(), etc.) write to the isolation scope (at least once the migration phase is over, see "Backwards Compatibility").

The isolation scope is stored in a context variable, thread local, async local, or something similar (depending on the platform). It may also be stored on the OTel Context so we can rely on OTel's Context propagation once SDKs implement POTel.

The isolation scope is forked by our integrations; end users should not need to think about isolation scopes or about forking them (see the diagram below).

SDKs should not fork isolation scope more than necessary as otherwise we'd bring back the problem that isolation scopes are meant to solve.

Current Scope

This scope holds data for the currently active span. Whenever a new span is started the current scope of the parent span is forked (read: duplicated) giving the new span all the data from the parent span and making it possible to add/manipulate data that is just applied to the new span (see also "Copy-on-write").

Changing the original scope after forking does not modify the forked scope.
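A minimal sketch of this forking behavior, assuming a hypothetical `Scope` class that only holds tags (a real scope carries more data and the method names may differ per SDK):

```python
import copy


class Scope:
    """Hypothetical, minimal scope holding only tags."""

    def __init__(self):
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value

    def fork(self):
        # Duplicate the data so parent and child no longer share state.
        forked = Scope()
        forked.tags = copy.deepcopy(self.tags)
        return forked


parent = Scope()
parent.set_tag("route", "/checkout")

child = parent.fork()
child.set_tag("db.table", "orders")  # only applied to the child (the new span)
parent.set_tag("late", "yes")        # set after the fork, so the child does not see it
```

The child starts with everything the parent had at fork time, and the two evolve independently afterwards.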

The current scope is stored in a context variable, thread local, async local, or something similar (depending on the platform). It may also be stored on the OTel Context so we can rely on OTel's Context propagation once SDKs implement POTel.

The current scope can be forked by the end user, either explicitly (e.g. using Sentry.withScope()) or implicitly by starting a new span (see the diagram below).

Copy-on-write

Getting rid of hubs also implies a soft copy-on-write.

In our current system with hubs and scopes, Scope is always mutable and what makes it somewhat immutable is pushing/popping scopes as well as creating new hubs. This however has to be done manually.

If an async operation is started halfway through handling a request, a new hub clone has to be created to achieve copy-on-write. If we don't create a new hub clone, popping a Scope also affects the original execution, leading to lost data or, in the worst case, exceptions.

After merging hubs and scopes, scopes can be forked at any time and a user no longer has to take care of it, as it should automatically do the right thing. With this new system, the async execution can manipulate scope without affecting other execution.

This soft copy-on-write relies on Scope being forked in certain cases:

  • withScope and similar API wraps the callback in a forked Scope that'll be cleaned up by the SDK after the callback is finished
  • we also have to fork when OTel does; however, there's some grey area here at the moment
    • it depends on what API is available in the OTel implementation relevant for each SDK
    • in some cases, we may only be able to intercept Context storage, i.e. when an already forked Context is being stored, e.g. in a ThreadLocal variable.
    • in some cases we can intercept Context forking, i.e. when a mutated copy of a Context is created
    • in some cases it might be necessary to have a mix of both, since not all auto instrumentation and user instrumentation go through a single place in the same way. e.g. in Java SDK we're sometimes handed a Context with some properties already set into our ContextStorage where we can then wrap it and intercept future forks of that Context
    • {open question} do we need to fork every time an OTel Context is copied, or only when a new OTel span is set?

While it's still possible to misuse, this is a sort of soft copy-on-write that should protect most users.


D) Backwards Compatibility

Migrating from Hubs and Scopes to Scopes Only

There has to be a period of time during which users can keep using the old API. We do not want a big-bang change that leaves behind users who can't easily upgrade, as that bears the risk of having to support old versions of the SDK. This means we shouldn't have one major version of an SDK that has hubs, followed directly by a major version that has them merged into scopes and no longer supports hubs. There has to be a migration phase for our users.

For the migration phase we can have the top-level static API (like Sentry.setTag()) write to both the current and the isolation scope. In a later major version we can then change this to write only to the isolation scope. The Hub API should be shimmed and redirect to scopes.
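As a rough sketch of the dual write during the migration phase: the `Scope` class, the module-level scopes, and the `MIGRATION_PHASE` switch below are hypothetical and only illustrate the idea.

```python
class Scope:
    """Hypothetical, minimal scope holding only tags."""

    def __init__(self):
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value


isolation_scope = Scope()
current_scope = Scope()

# Hypothetical switch: True during the migration phase, False after the later major version.
MIGRATION_PHASE = True


def set_tag(key, value):
    """Top-level static API, comparable to Sentry.setTag()."""
    isolation_scope.set_tag(key, value)
    if MIGRATION_PHASE:
        # During migration, code that still reads the current scope sees the tag too.
        current_scope.set_tag(key, value)


set_tag("customer", "acme")
```

Once the switch is turned off in a later major version, only the isolation scope receives the write.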

For most use cases writing to isolation scope should be what we want. For other use cases users should mostly be using withScope() already where they use the Scope passed into the callback instead of static API. We're deprecating configureScope to let users know they should reevaluate what scope they want to write to. Users can then get the scope they need and call e.g. setTag on that Scope. There might be some cases where we change behaviour but the damage should be contained to e.g. an incoming server request.

Once we make that change, the changelog should make it very clear that the top-level API only writes to the isolation scope.


E) Implementation Details

We will merge the functionality of the Hub and the Scope of the Unified API into the Scope and we will remove the Hub. We will add some new APIs that make it easier for the user to do custom instrumentation. We will update our auto instrumentation to fork a scope whenever a new span is created. This aligns us with what OTel does.

TODO: add where the propagation context is stored and applied, add how tracing without performance works, where spans/transactions live and other problems we discovered and solved in implementing this in the first two SDKs.

How Is Scope Data Applied to Events?

Data from the different scope flavors is merged before it is applied to the event.

This is done in the following order:

  1. take the data from the global scope
  2. merge in the data from the isolation scope
  3. merge in the data from the current scope
  4. optional: merge in given additional data
  5. apply the merged scope to the event
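The merge order above can be sketched as follows. This is a simplified illustration that only handles tags stored in plain dicts; the function names are hypothetical, and how pre-existing data on the event interacts with scope data is deliberately left out.

```python
def merge_scope_data(global_scope, isolation_scope, current_scope, additional=None):
    """Merge scope data so that later layers override earlier ones."""
    merged = {}
    for layer in (global_scope, isolation_scope, current_scope, additional or {}):
        merged.update(layer)
    return merged


def apply_to_event(event, merged):
    # Return a new event dict with the merged data attached as tags.
    return {**event, "tags": merged}


merged = merge_scope_data(
    global_scope={"release": "1.0", "env": "prod"},
    isolation_scope={"user": "alice", "env": "staging"},  # overrides the global "env"
    current_scope={"span": "db"},
    additional={"user": "bob"},                           # overrides the isolation "user"
)
event = apply_to_event({"message": "boom"}, merged)
```

Because the current scope is merged after the isolation scope, which is merged after the global scope, the most specific value wins.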

See the RFC for a code example.

See the diagram below for an illustration of how scope data is applied to events.

What Does the New API Look Like?

There are now two new APIs: one for forking the current scope, and one for forking the isolation scope (and, at the same time, the current scope). They should be called withScope(callback) and withIsolationScope(callback), respectively, or something similar.

This image illustrates the behavior of these new APIs and how scope data is applied to events:

[Figure: New Scopes API]

For a zoomable version visit the Miro Board

Here's how the new APIs roughly map to the old-style APIs. In case of configure_scope and push_scope it's not clear-cut whether to use the current or the isolation scope. The choice essentially depends on how long/for which context the changed/forked scope should be active.

| Old API | New API | Note |
| --- | --- | --- |
| hub = Hub(Hub.current) (hub clone) | with isolation_scope() as scope | |
| with configure_scope() as scope | scope = Scope.get_isolation_scope() | Should affect the whole transaction (one request/response cycle, one execution of a task, etc.). |
| with configure_scope() as scope | scope = Scope.get_current_scope() | Should affect the whole span or a part of code isolated via a forked current scope with new_scope. |
| with push_scope() as scope | with isolation_scope() as scope | We're starting a new transaction-worthy unit (request/response cycle, etc.). |
| with push_scope() as scope | with new_scope() as scope | Should affect a localized part of code, e.g. if the user wants to add specific data to a specific event. |

F) Use OTel for Performance Instrumentation (POTel)

For more information, such as the goals (and non-goals), see this GitHub issue:

https://github.com/getsentry/team-sdks/issues/4

Isolation Scope

Without an isolation scope, writing to the current Scope can lead to unexpected results. Writes can go to a scope that's never applied the way a user would expect. While this is already a problem today, it is exacerbated by moving over to OTel, as OTel auto instrumentation can do lots of context forking, leading to deeply nested scopes. Some hooks are automatically wrapped in a new span by OTel, which gives them a separate Scope, so the outer Scope can't be manipulated from inside the hook.

With the isolation scope, we have a place that e.g. hooks can write to when the data should affect the whole request.

When isolation scope is forked, the SDK should also fork current scope at the same time. Otherwise users would have to e.g. call both withScope and withIsolationScope leading to less readable code.
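A minimal sketch of forking both scopes in one call, again using `contextvars` and plain dicts as hypothetical stand-ins for the real storage and Scope objects:

```python
import contextvars
from contextlib import contextmanager

# Hypothetical storage; plain dicts stand in for Scope objects.
_isolation_scope = contextvars.ContextVar("isolation_scope", default={})
_current_scope = contextvars.ContextVar("current_scope", default={})


@contextmanager
def isolation_scope():
    # Fork the isolation scope and, in the same step, the current scope.
    iso_token = _isolation_scope.set(dict(_isolation_scope.get()))
    cur_token = _current_scope.set(dict(_current_scope.get()))
    try:
        yield _isolation_scope.get()
    finally:
        # Restore both scopes in reverse order.
        _current_scope.reset(cur_token)
        _isolation_scope.reset(iso_token)


with isolation_scope() as request_scope:
    request_scope["user"] = "alice"        # should affect the whole request
    _current_scope.get()["step"] = "auth"  # local to the forked current scope
    inside = (dict(_isolation_scope.get()), dict(_current_scope.get()))

outside = (dict(_isolation_scope.get()), dict(_current_scope.get()))
```

The caller needs only one `with` block, and neither the request data nor the local data leaks out once the block ends.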

Implementation Details

OTel Context

OTel stores spans as well as other information in a Context and takes care of propagating this Context through libraries and threads. We want to rely on this propagation so that we don't have to worry about it ourselves: we just store our scopes in the Context and trust OTel to propagate it correctly.

Can't we simply propagate our Scopes ourselves and ignore the OTel Context? If we didn't want to rely on OTel's Context propagation, we'd have to provide a counterpart for every OTel propagation mechanism. We would no longer be able to just rely on OTel for instrumentations but would have to mirror some of what they do. This would also have to be kept up to date: whenever OTel changes something about how they propagate Context, we'd have to adapt as well. Keeping Context and Scopes in sync would be hard, and cases where they drift apart could be hard to detect. Version compatibility between Sentry and OTel could also become difficult.

Examples of where we had to provide our own propagation before POTel:

Hook

In order to fork our scopes when required by OTel, we need some sort of hook. We're not yet completely sure when forking is actually required. In theory it should suffice to fork scopes whenever a new OTel span is created, or maybe even only when the span is saved. Which hooks are available in OTel depends on the language, so this will likely differ from SDK to SDK.

  • Context Forking: Whenever a modified copy of an OTel Context is created using context.with(...).
  • Context Storing: Whenever an OTel Context is stored using context.makeCurrent()
  • Span Creation: Whenever a new OTel span is created, regardless of it being stored in any Context or that Context being stored.

For Context forking / storing we could filter changes to only fork scopes when the span changes in order to have fewer forked scopes.

If no hook is available, we have to fall back to a more complex variant of global storage. See the middle example in this miro board.

A Context fork/storage hook alone unfortunately does not allow us to do everything we need. In SpanProcessor.onEnd and by extension also SpanExporter.export, Context.current() returns the parent Context and not the Context that was active during span execution. This seems to be by design in OTel. This means even if we managed to add our Scopes to Context, we can't access it in SpanProcessor.onEnd().

Storing Scopes which Belong to a Certain Span

If your language allows you to modify the OTel span and add Sentry Scopes to the span, that's probably the easiest way of getting scopes into SpanProcessor.onEnd and by extension SpanExporter.export. However we haven't explored this variant in detail yet.

If modifying the span isn't possible, the only way we've found so far to retrieve the Scopes in SpanExporter.export is to use a global storage. For this global storage, we use OTel spans as weakly referenced keys and Sentry Scopes as values. We use weak references so we don't cause memory leaks, e.g. in case a span is never finished.
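In Python, such a storage could be sketched with `weakref.WeakKeyDictionary`. The `Span` and `Scopes` classes and the helper functions below are hypothetical stand-ins; a real OTel span type would need to be hashable and weak-referenceable for this approach to work.

```python
import gc
import weakref


class Span:
    """Hypothetical stand-in for an OTel span."""

    def __init__(self, name):
        self.name = name


class Scopes:
    """Hypothetical stand-in for the Sentry Scopes attached to a span."""

    def __init__(self, tags=None):
        self.tags = dict(tags or {})


# Global storage: spans are weakly referenced keys, Scopes are the values.
_scopes_by_span = weakref.WeakKeyDictionary()


def store_scopes(span, scopes):
    _scopes_by_span[span] = scopes


def scopes_for(span):
    return _scopes_by_span.get(span)


span = Span("db.query")
store_scopes(span, Scopes({"table": "orders"}))
found = scopes_for(span).tags

# A span that is never finished but becomes unreachable does not leak its entry.
del span
gc.collect()
remaining = len(_scopes_by_span)
```

One caveat of this design: the stored value must not hold a strong reference back to the span, otherwise the key stays alive and the entry is never dropped.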

The hook for OTel span creation / Context forking / storing then needs to look up Sentry Scopes in either the span (if the language supports this variant) or the global storage.

Scopes forked in SpanProcessor.onStart can be found via the OTel span in Context (or global storage if span can't be modified). However, there may already be a Scopes object on the Context. The hook has to figure out which Scopes object to fork. An ancestry check can be used to see if the Scopes from Context is an ancestor of the Scopes of the span (or from global storage). If so, we fork the Scopes from the span (or global storage), since it's the "youngest". If that's not the case, we fork Scopes from Context or fall back to forking root scopes, if there's no Scopes on the Context.
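The selection logic described above can be sketched like this. The `Scopes` class with a `parent` pointer, `ROOT_SCOPES`, and `pick_scopes_to_fork` are hypothetical names; the sketch assumes each forked Scopes object remembers the object it was forked from.

```python
class Scopes:
    """Hypothetical Scopes object that remembers which Scopes it was forked from."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def fork(self, name):
        return Scopes(name, parent=self)

    def is_ancestor_of(self, other):
        # Walk up the parent chain of `other` and look for `self`.
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


ROOT_SCOPES = Scopes("root")


def pick_scopes_to_fork(from_context, from_span):
    """Decide which Scopes object the hook should fork."""
    if (
        from_context is not None
        and from_span is not None
        and from_context.is_ancestor_of(from_span)
    ):
        # The Scopes of the span (or from global storage) is the "youngest".
        return from_span
    if from_context is not None:
        return from_context
    # No Scopes on the Context: fall back to the root scopes.
    return ROOT_SCOPES


request_scopes = ROOT_SCOPES.fork("request")
span_scopes = request_scopes.fork("span")
unrelated_scopes = ROOT_SCOPES.fork("other-request")
```

With these objects, `pick_scopes_to_fork(request_scopes, span_scopes)` selects the span's Scopes, while `pick_scopes_to_fork(unrelated_scopes, span_scopes)` selects the Scopes from the Context because it is not an ancestor.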

We can't rely on Context.current() in SpanProcessor.onEnd as it will either return the parent Context or an unrelated Context. In SpanExporter.export we also can't use Context.current() because that's usually batched and may run on a different thread where Context isn't even propagated to.

Some of the variants for storing Scopes can be seen on this miro board. The variant on the right is preferred if the SDK can modify spans in SpanProcessor.onStart. The variant in the middle is a fallback if there are no hooks available. The variant on the left is what we use when there are hooks but we can't modify spans in SpanProcessor.onStart.

Sentry API That Interacts with OTel

Some of the Sentry API will have to create and manage modified copies of OTel Context. In withScope for example, we create a copy of the current Context and set forked Scopes on it using ctx = Context.current().with(forkedScopes). We then invoke the callback passed to withScope and afterwards restore previous Context, e.g. by calling .close() on a lifecycle management object handed to us by OTel.

When to Fork Isolation Scope

Untested: We can use Propagator.extract to fork isolation scope. We try to read the Scopes from the Context that's passed into Propagator.extract and fork that or fall back to forking root scopes if there's no Scopes in the Context.

Propagator.extract should be called by auto instrumentation for server and consumer use cases.

[If forking in Propagator.extract doesn't work out, we can try and check if a span has a parent in the current process (span.isRemote) and create an isolation scope if not.]

Tracing Without Performance

Untested: In Propagator.extract we can create a PropagationContext from incoming headers (or similar metadata) or fall back to creating a new PropagationContext with random IDs. We then store this PropagationContext on the isolation scope. In Propagator.inject and when sending events to Sentry, we can use that PropagationContext from the isolation scope and generate headers (or similar).

  • tbd: how does freezing DSC/baggage work? Can we simply freeze whenever the first request (or similar) goes out?
  • tbd: should Sentry.continueTrace write to isolation scope? Would it then also need to always fork an isolation scope at the same time? Should it create a new span (in case performance is enabled)?
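A rough sketch of the extract/inject flow described above, with the open questions left aside. The `extract` and `inject` functions, the dict-based isolation scope, and the dict layout of the propagation context are assumptions for illustration; they are not the OTel Propagator interface. The sketch relies on the `sentry-trace` header carrying a 32-character hex trace ID and a 16-character hex span ID, optionally followed by a sampled flag.

```python
import re
import uuid

# "<32 hex trace id>-<16 hex span id>" with an optional "-<sampled flag>".
_SENTRY_TRACE = re.compile(r"^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$")


def extract(headers, isolation_scope):
    """Build a propagation context from incoming headers, or create a fresh one."""
    match = _SENTRY_TRACE.match(headers.get("sentry-trace", "").strip())
    if match:
        trace_id, parent_span_id = match.group(1), match.group(2)
    else:
        # No usable incoming trace: start a new one with random IDs.
        trace_id, parent_span_id = uuid.uuid4().hex, None
    isolation_scope["propagation_context"] = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "parent_span_id": parent_span_id,
    }


def inject(isolation_scope, headers):
    """Write outgoing tracing headers from the stored propagation context."""
    ctx = isolation_scope["propagation_context"]
    headers["sentry-trace"] = f"{ctx['trace_id']}-{ctx['span_id']}"


incoming = {"sentry-trace": "0123456789abcdef0123456789abcdef-0123456789abcdef-1"}
scope = {}
extract(incoming, scope)

outgoing = {}
inject(scope, outgoing)
```

The outgoing header continues the incoming trace ID with a new span ID, so events and downstream requests stay connected even when no spans are recorded. Baggage and DSC handling are omitted because they are still marked as open above.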

Where to Store Sentry Span

Until we can completely remove Sentry Span and solely rely on OTel spans, we can store Sentry spans on the current scope. Storing it on the isolation scope would allow users to hide the current span by setting a span on the current scope, thereby breaking instrumentation. We'd have to modify isolation scope a lot to maintain which span is currently active - this would imply that the current span is leaked into e.g. async execution where there could be a separate span.

What to Move Along When Execution Moves e.g. to Another Thread

When execution moves e.g. to another thread, we should bring along the isolation scope and the current scope. It may also make sense to have the current scope forked in this case. If we're able to rely on OTel's Context propagation, this should automatically be taken care of. See the right side of this miro board for examples.
