Circuit Breaking and Degradation in Microservices

Basic Concept Interview Questions

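1. What is a circuit breaker? Describe its three states and the conditions for moving between them.

Focus: grasp of the basic concept, state machine principles

Key points: Circuit breaking is a fault-handling mechanism that stops a failure from spreading and cascading through dependent services. A circuit breaker has three states:

  • Closed: the normal state. Calls pass through and failures are counted; when the failure count or failure rate reaches a preset threshold, the breaker switches to Open.
  • Open: the tripped state. Calls fail immediately without reaching the downstream service, and a timeout timer starts; when it expires, the breaker switches to Half-Open.
  • Half-Open: the probing state. A small number of requests are let through; if they succeed the breaker returns to Closed, and if one fails it goes back to Open.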
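
The three states and their transitions can be written down as a small pure function, which is often a useful first step before adding counters and timers. This is a minimal sketch: the `Event` type and its four values are names invented for illustration, not part of any library.

```go
package main

import "fmt"

// State and Event are hypothetical names used only in this sketch.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

type Event int

const (
	FailureThresholdReached Event = iota // failures hit the configured threshold
	TimeoutElapsed                       // the open-state timer expired
	ProbeSucceeded                       // enough trial requests succeeded
	ProbeFailed                          // a trial request failed
)

// Next returns the state that follows s when event e occurs.
// Events that do not apply to the current state leave it unchanged.
func Next(s State, e Event) State {
	switch {
	case s == Closed && e == FailureThresholdReached:
		return Open
	case s == Open && e == TimeoutElapsed:
		return HalfOpen
	case s == HalfOpen && e == ProbeSucceeded:
		return Closed
	case s == HalfOpen && e == ProbeFailed:
		return Open
	}
	return s
}

func (s State) String() string {
	return [...]string{"Closed", "Open", "HalfOpen"}[s]
}

func main() {
	s := Closed
	events := []Event{FailureThresholdReached, TimeoutElapsed, ProbeFailed, TimeoutElapsed, ProbeSucceeded}
	for _, e := range events {
		s = Next(s, e)
		fmt.Println(s) // Open, HalfOpen, Open, HalfOpen, Closed
	}
}
```

Keeping the transition table separate from the counting logic makes it easy to unit-test every edge of the state machine on its own.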

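2. What is the difference between degradation and circuit breaking?

Focus: telling the two concepts apart, understanding when each applies

Key points:

  • Circuit breaking: an active protection mechanism that prevents a fault from spreading. When the downstream service is detected to be unhealthy, the caller cuts off calls to it on its own initiative.
  • Degradation: a passive coping strategy. When a service is unavailable, it supplies a fallback feature or fallback data so that core functionality stays available.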

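Implementation Principle Interview Questions

3. Design a simple circuit breaker and implement it in Go.

Focus: code design ability, concurrency safety, state management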

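package main

import (
	"errors"
	"sync"
	"time"
)

// State is the current mode of the circuit breaker.
type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

// CircuitBreaker guards a function call with a three-state breaker.
type CircuitBreaker struct {
	mutex        sync.RWMutex
	state        State
	failureCount int           // failures counted toward maxFailures
	successCount int           // successes seen while half-open
	requestCount int           // requests admitted while half-open
	maxFailures  int           // failure threshold that opens the breaker
	timeout      time.Duration // how long the breaker stays open
	resetTime    time.Time     // when an open breaker may go half-open
	halfOpenMax  int           // trial requests allowed while half-open
}

func NewCircuitBreaker(maxFailures int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		maxFailures: maxFailures,
		timeout:     timeout,
		state:       StateClosed,
		halfOpenMax: 3,
	}
}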

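func (cb *CircuitBreaker) Call(fn func() error) error {
	// Hold the lock only while inspecting and updating state, not while fn
	// runs; otherwise one slow downstream call would serialize every caller.
	cb.mutex.Lock()

	// Check whether a state transition is due.
	cb.checkState()

	switch cb.state {
	case StateOpen:
		cb.mutex.Unlock()
		return errors.New("circuit breaker is open")
	case StateHalfOpen:
		if cb.requestCount >= cb.halfOpenMax {
			cb.mutex.Unlock()
			return errors.New("too many requests in half-open state")
		}
		cb.requestCount++
	case StateClosed:
		// Normal execution.
	}
	cb.mutex.Unlock()

	err := fn()

	// Record the outcome under the lock.
	cb.mutex.Lock()
	defer cb.mutex.Unlock()
	if err != nil {
		cb.onFailure()
	} else {
		cb.onSuccess()
	}

	return err
}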

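// onFailure records a failed call. The caller must hold cb.mutex.
func (cb *CircuitBreaker) onFailure() {
	cb.failureCount++
	// A single failure while half-open reopens the breaker; while closed,
	// the breaker opens once the failure threshold is reached.
	if cb.state == StateHalfOpen || cb.failureCount >= cb.maxFailures {
		cb.state = StateOpen
		cb.resetTime = time.Now().Add(cb.timeout)
	}
}

// onSuccess records a successful call. The caller must hold cb.mutex.
func (cb *CircuitBreaker) onSuccess() {
	switch cb.state {
	case StateHalfOpen:
		cb.successCount++
		if cb.successCount >= cb.halfOpenMax {
			cb.reset()
		}
	case StateClosed:
		// Count consecutive failures only. Without this reset, failures
		// spread over a long period would eventually trip the breaker.
		cb.failureCount = 0
	}
}

// checkState moves an open breaker to half-open once its timeout has passed.
func (cb *CircuitBreaker) checkState() {
	if cb.state == StateOpen && time.Now().After(cb.resetTime) {
		cb.state = StateHalfOpen
		cb.requestCount = 0
		cb.successCount = 0
	}
}

// reset returns the breaker to the closed state and clears all counters.
func (cb *CircuitBreaker) reset() {
	cb.state = StateClosed
	cb.failureCount = 0
	cb.successCount = 0
	cb.requestCount = 0
}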

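4. How do you implement graceful service degradation in a microservice architecture?

Focus: architecture design, degradation strategy, system availability

[Figure: sequence diagram example]

Degradation strategy implementation: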

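// FallbackStrategy is one way of producing a result when the primary call fails.
type FallbackStrategy interface {
	Execute() (interface{}, error)
}

// Cache is an assumed interface; the original does not define it.
// Any cache client with a Get method of this shape would fit.
type Cache interface {
	Get(key string) (interface{}, error)
}

// CacheFallback serves the last cached value for key.
type CacheFallback struct {
	cache Cache
	key   string
}

func (c *CacheFallback) Execute() (interface{}, error) {
	return c.cache.Get(c.key)
}

// DefaultValueFallback always succeeds with a fixed value.
type DefaultValueFallback struct {
	defaultValue interface{}
}

func (d *DefaultValueFallback) Execute() (interface{}, error) {
	return d.defaultValue, nil
}

// ServiceCaller holds fallbacks in priority order.
type ServiceCaller struct {
	fallbacks []FallbackStrategy
}

func (s *ServiceCaller) CallWithFallback(fn func() (interface{}, error)) (interface{}, error) {
	result, err := fn()
	if err == nil {
		return result, nil
	}

	// Try each degradation strategy in turn; the first success wins.
	for _, fallback := range s.fallbacks {
		if result, err := fallback.Execute(); err == nil {
			return result, nil
		}
	}

	return nil, errors.New("all fallback strategies failed")
}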

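Framework Integration Interview Questions

5. How do you integrate circuit breaking into gRPC? Write the interceptor.

Focus: gRPC interceptors, middleware design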

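import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// CircuitBreakerUnaryInterceptor routes every unary client call through cb.
// Any non-nil error counts as a failure here, including application-level
// errors such as NotFound; filter by status code if that is too strict.
func CircuitBreakerUnaryInterceptor(cb *CircuitBreaker) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {

		return cb.Call(func() error {
			return invoker(ctx, method, req, reply, cc, opts...)
		})
	}
}

// Usage example.
func createGRPCClient() *grpc.ClientConn {
	cb := NewCircuitBreaker(5, 30*time.Second)

	conn, err := grpc.Dial("localhost:50051",
		// Replaces the deprecated grpc.WithInsecure().
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithUnaryInterceptor(CircuitBreakerUnaryInterceptor(cb)),
	)
	if err != nil {
		panic(err)
	}

	return conn
}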

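6. What is the complete flow for implementing circuit breaking and degradation with Hystrix-go?

Focus: use of third-party libraries, understanding of configuration parameters

Code implementation: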

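import "github.com/afex/hystrix-go/hystrix"

func setupHystrix() {
	hystrix.ConfigureCommand("user_service", hystrix.CommandConfig{
		Timeout:                1000, // command timeout: 1 second (milliseconds)
		MaxConcurrentRequests:  100,  // maximum concurrent requests
		RequestVolumeThreshold: 20,   // minimum requests before the breaker is evaluated
		ErrorPercentThreshold:  50,   // error-rate threshold: 50%
		SleepWindow:            5000, // wait 5 seconds after tripping before probing (milliseconds)
	})
}

func callUserService(userID string) (*User, error) {
	var user *User

	// Use the error returned by hystrix.Do. It is nil when either the
	// primary call or the fallback succeeded, so a successful fallback is
	// not reported to the caller as a failure.
	err := hystrix.Do("user_service", func() error {
		u, err := userServiceClient.GetUser(userID)
		if err != nil {
			return err
		}
		user = u
		return nil
	}, func(err error) error {
		// Degradation logic: return a default user or read from a cache.
		// On a timeout the primary function may still be running, so
		// production code should not share this variable without care.
		user = getDefaultUser()
		return nil
	})
	if err != nil {
		return nil, err
	}

	return user, nil
}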

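Advanced Application Interview Questions

7. How do you design a multi-level degradation strategy for high-concurrency scenarios?

Focus: system design, performance optimization, fault tolerance

Multi-level degradation implementation: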

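// ContextFallbackStrategy is assumed here: the FallbackStrategy interface
// shown earlier has no context-aware method, so levels that must honor a
// timeout implement this one instead.
type ContextFallbackStrategy interface {
	ExecuteWithContext(ctx context.Context) (interface{}, error)
}

type MultiLevelFallback struct {
	levels []FallbackLevel
}

type FallbackLevel struct {
	Name     string
	Strategy ContextFallbackStrategy
	Timeout  time.Duration // time budget for this level only
}

func (m *MultiLevelFallback) Execute(ctx context.Context) (interface{}, error) {
	for i, level := range m.levels {
		levelCtx, cancel := context.WithTimeout(ctx, level.Timeout)
		result, err := level.Strategy.ExecuteWithContext(levelCtx)
		// Release the timer right away; a defer inside the loop would keep
		// every level's context alive until Execute returns.
		cancel()

		if err == nil {
			log.Printf("Fallback succeeded at level %d: %s", i, level.Name)
			return result, nil
		}

		log.Printf("Fallback failed at level %d: %s, error: %v", i, level.Name, err)
	}

	return nil, errors.New("all fallback levels failed")
}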

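8. How do you avoid the avalanche effect in a distributed system?

Focus: understanding of distributed systems, fault isolation

[Figure: avalanche effect flow]

Anti-avalanche strategies:

  1. Resource isolation
  2. Rate limiting
  3. Circuit breaking
  4. Degradation
  5. Timeout control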

9. How do you monitor and measure the effectiveness of a circuit breaker?

Focus: monitoring systems, metric design

Key metrics:

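type CircuitBreakerMetrics struct {
	TotalRequests   int64         // total requests
	SuccessRequests int64         // successful requests
	FailedRequests  int64         // failed requests
	TimeoutRequests int64         // timed-out requests
	CircuitOpenTime time.Time     // when the breaker last opened
	State           string        // current state
	ErrorRate       float64       // error rate
	ResponseTime    time.Duration // average response time
}

// String makes State printable; GetMetrics below relies on it.
func (s State) String() string {
	switch s {
	case StateClosed:
		return "closed"
	case StateOpen:
		return "open"
	case StateHalfOpen:
		return "half-open"
	}
	return "unknown"
}

// GetMetrics reports the breaker's current counters. These are the same
// counters that drive state changes, so they are cleared on reset; lifetime
// totals, timeouts, and latency would need separate counters.
func (cb *CircuitBreaker) GetMetrics() CircuitBreakerMetrics {
	cb.mutex.RLock()
	defer cb.mutex.RUnlock()

	total := cb.successCount + cb.failureCount
	errorRate := 0.0
	if total > 0 {
		errorRate = float64(cb.failureCount) / float64(total)
	}

	return CircuitBreakerMetrics{
		TotalRequests:   int64(total),
		SuccessRequests: int64(cb.successCount),
		FailedRequests:  int64(cb.failureCount),
		State:           cb.state.String(),
		ErrorRate:       errorRate,
	}
}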
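
An error rate is usually most useful when computed over recent calls rather than over everything the breaker has ever seen. The following is a minimal sketch of a count-based rolling window that could feed the `ErrorRate` field; `RollingWindow` and its methods are hypothetical names, and a time-bucketed window is an equally common alternative.

```go
package main

import (
	"fmt"
	"sync"
)

// RollingWindow remembers the outcome of the most recent calls in a ring
// buffer. The name and API are assumptions made for this sketch.
type RollingWindow struct {
	mu       sync.Mutex
	outcomes []bool // true means the call failed
	next     int    // index of the slot to overwrite next
	filled   int    // number of slots holding real data
	failures int    // failures currently inside the window
}

// NewRollingWindow creates a window over the last size calls; size must be > 0.
func NewRollingWindow(size int) *RollingWindow {
	return &RollingWindow{outcomes: make([]bool, size)}
}

// Record stores one outcome, evicting the oldest once the window is full.
func (w *RollingWindow) Record(failed bool) {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.filled == len(w.outcomes) {
		if w.outcomes[w.next] {
			w.failures-- // the evicted outcome was a failure
		}
	} else {
		w.filled++
	}
	w.outcomes[w.next] = failed
	if failed {
		w.failures++
	}
	w.next = (w.next + 1) % len(w.outcomes)
}

// ErrorRate returns failures divided by calls within the window, or 0 if empty.
func (w *RollingWindow) ErrorRate() float64 {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.filled == 0 {
		return 0
	}
	return float64(w.failures) / float64(w.filled)
}

func main() {
	w := NewRollingWindow(4)
	for _, failed := range []bool{true, true, false, false} {
		w.Record(failed)
	}
	fmt.Println(w.ErrorRate()) // 0.5

	// Two more successes push the two old failures out of the window.
	w.Record(false)
	w.Record(false)
	fmt.Println(w.ErrorRate()) // 0
}
```

Exporting this rate alongside the breaker state makes it possible to see both why a breaker opened and whether the downstream service is actually recovering.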

10. How do you implement circuit breaking and degradation in a service mesh?

Focus: understanding of service meshes, knowledge of Istio/Envoy

Istio configuration example:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: user-service-circuit-breaker
spec:
  host: user-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3            # eject after 3 consecutive 5xx errors (replaces the deprecated consecutiveErrors)
      interval: 30s                      # detection interval
      baseEjectionTime: 30s              # base ejection time
      maxEjectionPercent: 50             # maximum percentage of hosts ejected
    connectionPool:
      tcp:
        maxConnections: 100              # maximum connections
      http:
        http1MaxPendingRequests: 50      # maximum pending requests
        maxRequestsPerConnection: 10     # maximum requests per connection

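[Figure: traffic management sequence diagram]

Real-World Scenario Interview Questions

11. How should the order service in an e-commerce system implement its degradation strategy?

Focus: business understanding, practical application

[Figure: business degradation flow]

12. How do you handle the traffic surge when a circuit breaker recovers?

Focus: system stability, traffic control

Gradual recovery strategy: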

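import (
	"math/rand"
	"sync"
)

type GradualRecovery struct {
	maxRecoveryRequests int // not used by the methods shown here
	recoveryStep        int // percentage points added per success
	currentLimit        int // percentage of requests admitted, 0 to 100
	mutex               sync.RWMutex
}

// NewGradualRecovery is an assumed constructor. The initial percentage must
// be above zero: with currentLimit at 0 no request is ever admitted, so
// OnSuccess never runs and recovery cannot begin.
func NewGradualRecovery(initialPercent, step int) *GradualRecovery {
	return &GradualRecovery{currentLimit: initialPercent, recoveryStep: step}
}

func (gr *GradualRecovery) AllowRequest() bool {
	gr.mutex.RLock()
	defer gr.mutex.RUnlock()

	// Admit roughly currentLimit percent of requests; the share grows
	// step by step as successes are reported.
	return rand.Intn(100) < gr.currentLimit
}

func (gr *GradualRecovery) OnSuccess() {
	gr.mutex.Lock()
	defer gr.mutex.Unlock()

	if gr.currentLimit < 100 {
		gr.currentLimit += gr.recoveryStep
		if gr.currentLimit > 100 {
			gr.currentLimit = 100
		}
	}
}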