Troubleshooting Kafka

Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}

This happens when Kafka and the consumers get out of sync. Possible reasons are:

Running out of disk space or memory
A sustained event spike that causes very long processing times, so Kafka drops messages once they pass the retention time
Date/time getting out of sync after a restart or suspend/resume cycle

You can visualize the Kafka consumers and their offsets by adding an extra container, such as Kafka UI or Redpanda Console, to your Docker Compose file.

Kafka UI:

kafka-ui:
  image: provectuslabs/kafka-ui:latest
  restart: on-failure
  environment:
    KAFKA_CLUSTERS_0_NAME: "local"
    KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: "kafka:9092"
    DYNAMIC_CONFIG_ENABLED: "true"
  ports:
    - "8080:8080"
  depends_on:
    - kafka

Or, you can use Redpanda Console:

redpanda-console:
  image: docker.redpanda.com/redpandadata/console:latest
  restart: on-failure
  entrypoint: /bin/sh
  command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"
  environment:
    CONFIG_FILEPATH: "/tmp/config.yml"
    CONSOLE_CONFIG_FILE: |
      kafka:
        brokers: ["kafka:9092"]
        sasl:
          enabled: false
      schemaRegistry:
        enabled: false
      kafkaConnect:
        enabled: false
  ports:
    - "8080:8080"
  depends_on:
    - kafka

Ideally, every consumer group has zero lag. If a consumer group has a lot of lag, investigate whether it is caused by a disconnected consumer (e.g., a Sentry/Snuba container that's disconnected from Kafka) or by a consumer that's stuck processing a certain message. For a disconnected consumer, you can either restart the container or reset the Kafka offset to "earliest". Otherwise, you can reset the Kafka offset to "latest".
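If you don't have a web UI at hand, one rough way to check lag is to sum the LAG column printed by kafka-consumer-groups --describe. The sketch below assumes the output has a header row containing TOPIC and LAG columns; the function name lag_by_topic is made up for illustration and is not part of Sentry or Kafka.

```shell
#!/bin/sh
# lag_by_topic (illustrative name): read `kafka-consumer-groups --describe`
# output on stdin and print "<topic> <total lag>" for each topic.
lag_by_topic() {
  awk '
    {
      # A row containing the literal field LAG is treated as the header row;
      # remember which columns hold TOPIC and LAG.
      found = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") ti = i
        if ($i == "LAG") { li = i; found = 1 }
      }
      if (found) next
      # Data rows: count only numeric lag values; "-" is skipped.
      if (ti && li && $li ~ /^[0-9]+$/) lag[$ti] += $li
    }
    END { for (topic in lag) printf "%s %d\n", topic, lag[topic] }
  ' | sort
}

# Usage (group name taken from the example in this guide):
#   docker compose run --rm kafka kafka-consumer-groups \
#     --bootstrap-server kafka:9092 --group snuba-consumers --describe | lag_by_topic
```

A topic that stays at a high total while its consumer container is running is a candidate for the "stuck consumer" case described above.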

Warning

Resetting consumer offsets with these solutions may result in data loss, up to the duration of your Kafka event retention (defaults to 24 hours).

The proper solution is as follows (reported by @rmisyurev). This example uses the snuba-consumers consumer group with the events topic. Your consumer group name and topic name may be different.

1. Shut down the Sentry/Snuba containers that use the consumer group (you can find them by inspecting the docker-compose.yml file):
   docker compose stop snuba-errors-consumer snuba-outcomes-consumer snuba-outcomes-billing-consumer
2. List the consumer groups:
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --list
3. Get the group info:
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --describe
4. Preview what will happen to the offset with a dry run (optional):
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --dry-run
5. Set the offset to latest and execute:
   docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --group snuba-consumers --topic events --reset-offsets --to-latest --execute
6. Start the previously stopped Sentry/Snuba containers:
   docker compose start snuba-errors-consumer snuba-outcomes-consumer snuba-outcomes-billing-consumer
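The stop, reset, and start steps above can be wrapped in one small function so the containers are always started again after the reset. This is only a sketch: the function name reset_group_offset and the DOCKER override variable are hypothetical helpers, not part of Sentry. Setting DOCKER=echo prints the commands instead of running them.

```shell
#!/bin/sh
# DOCKER is overridable (e.g. DOCKER=echo) to preview the commands without running them.
DOCKER="${DOCKER:-docker}"

# reset_group_offset GROUP TOPIC latest|earliest CONTAINER...
# (illustrative helper) stops the listed containers, resets the offset,
# then starts the containers again even if the reset failed.
reset_group_offset() {
  if [ "$#" -lt 4 ]; then
    echo "usage: reset_group_offset GROUP TOPIC latest|earliest CONTAINER..." >&2
    return 2
  fi
  group=$1 topic=$2 target=$3
  shift 3   # the remaining arguments are the containers to stop and start
  case "$target" in
    latest|earliest) ;;
    *) echo "target must be 'latest' or 'earliest'" >&2; return 2 ;;
  esac

  "$DOCKER" compose stop "$@" || return 1
  "$DOCKER" compose run --rm kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 --group "$group" --topic "$topic" \
    --reset-offsets "--to-$target" --execute
  status=$?
  "$DOCKER" compose start "$@" || return 1
  return "$status"
}

# Usage, with the names from the example above:
#   reset_group_offset snuba-consumers events latest \
#     snuba-errors-consumer snuba-outcomes-consumer snuba-outcomes-billing-consumer
```

Running it once with DOCKER=echo before the real run serves a similar purpose to the optional dry-run step: you see exactly which group, topic, and containers will be touched.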

Tips

You can replace snuba-consumers with other consumer groups or events with other topics when needed.
You can reset the offset to "earliest" instead of "latest" if you want to start from the beginning.
If you have Kafka UI or Redpanda Console, you can reset the offsets through the web UI instead of the CLI.

Another option is as follows (reported by @gabn88):

Set offset to latest and execute:
docker compose run --rm kafka kafka-consumer-groups --bootstrap-server kafka:9092 --all-groups --all-topics --reset-offsets --to-latest --execute

Unlike the proper solution, this involves resetting the offsets of all consumer groups and all topics.
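Because this command touches every consumer group and topic, it can be worth previewing it first with the same --dry-run flag used in the proper solution. The following is a minimal sketch; the function name reset_all_offsets and the DOCKER override variable are hypothetical helpers for illustration only.

```shell
#!/bin/sh
# DOCKER is overridable (e.g. DOCKER=echo) to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

# reset_all_offsets [dry-run|execute]
# (illustrative helper) defaults to a dry run; pass "execute" to apply the reset.
reset_all_offsets() {
  mode="${1:-dry-run}"
  case "$mode" in
    dry-run|execute) ;;
    *) echo "usage: reset_all_offsets [dry-run|execute]" >&2; return 2 ;;
  esac
  "$DOCKER" compose run --rm kafka kafka-consumer-groups \
    --bootstrap-server kafka:9092 \
    --all-groups --all-topics --reset-offsets --to-latest "--$mode"
}

# Usage:
#   reset_all_offsets            # preview only
#   reset_all_offsets execute    # apply the reset to all groups and topics
```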

The nuclear option is to remove all Kafka-related volumes and recreate them, which will cause data loss. Any data that was still pending in Kafka will be gone once these volumes are deleted.

1. Stop the instance:
   docker compose down --volumes
2. Remove the Kafka volume:
   docker volume rm sentry-kafka
3. Run the install script again:
   ./install.sh
4. Start the instance:
   docker compose up --wait
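Since these steps are destructive, one way to guard against running them by accident is to put them behind an explicit confirmation flag. The sketch below chains the four steps and stops at the first failure. The function name recreate_kafka_volume, the --yes-delete-kafka-data flag, and the DOCKER and INSTALL override variables are all made up for illustration.

```shell
#!/bin/sh
# DOCKER and INSTALL are overridable (e.g. DOCKER=echo INSTALL=true) to preview the sequence.
DOCKER="${DOCKER:-docker}"
INSTALL="${INSTALL:-./install.sh}"

# recreate_kafka_volume --yes-delete-kafka-data
# (illustrative helper) runs the nuclear-option steps in order and
# stops at the first step that fails.
recreate_kafka_volume() {
  # Refuse to run unless the caller explicitly accepts the data loss.
  if [ "${1:-}" != "--yes-delete-kafka-data" ]; then
    echo "refusing to run: pass --yes-delete-kafka-data to confirm the data loss" >&2
    return 2
  fi
  "$DOCKER" compose down --volumes &&
    "$DOCKER" volume rm sentry-kafka &&
    "$INSTALL" &&
    "$DOCKER" compose up --wait
}

# Usage (run from the self-hosted checkout so that ./install.sh resolves):
#   recreate_kafka_volume --yes-delete-kafka-data
```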

If you want to reduce the disk space used by Kafka, you'll need to carefully calculate how much data you are ingesting and how much data loss you can tolerate, and then follow the recommendations in this awesome Stack Overflow post or this post on our community forum.

You could, however, add these to the Kafka container's environment variables (by @csvan):

services:
  kafka:
    # ...
    environment:
      KAFKA_LOG_RETENTION_HOURS: 24
      KAFKA_LOG_CLEANER_ENABLE: "true"
      KAFKA_LOG_CLEANUP_POLICY: delete
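
To get a feel for how the retention window translates into disk usage, a back-of-the-envelope estimate is ingest rate times average event size times retention time. The sketch below does that arithmetic. The function name estimate_retained_mib and the example figures are hypothetical, and the result ignores compression, per-message overhead, and the fact that Sentry writes to several topics, so treat it as a rough starting point only.

```shell
#!/bin/sh
# estimate_retained_mib EVENTS_PER_SEC AVG_EVENT_BYTES RETENTION_HOURS
# (illustrative helper) prints a rough size, in MiB, of the data retained
# for one topic at a steady ingest rate.
estimate_retained_mib() {
  events_per_sec=$1 avg_event_bytes=$2 retention_hours=$3
  bytes=$(( events_per_sec * avg_event_bytes * 3600 * retention_hours ))
  echo $(( bytes / 1048576 ))   # integer division, rounds down
}

# Hypothetical figures: 10 events/s, 1 KiB per event, 24 hours of retention.
estimate_retained_mib 10 1024 24   # prints 843
```

Halving KAFKA_LOG_RETENTION_HOURS halves this estimate, at the cost of a shorter window for consumers to catch up before messages are dropped.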
