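Performance Monitoring

Vafast ships with a built-in monitoring system that has zero external dependencies and helps you track request performance and identify bottlenecks.

Quick Start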

typescript
import { Server, defineRoute, defineRoutes, serve } from 'vafast'
import { withMonitoring } from 'vafast/monitoring'

const routes = defineRoutes([
  defineRoute({
    method: 'GET',
    path: '/',
    handler: () => 'Hello Vafast!'
  })
])

const server = new Server(routes)

// Wrap the server with monitoring
const monitored = withMonitoring(server)

serve({ fetch: monitored.fetch, port: 3000 })

After startup, the console prints:

✅ Monitoring enabled
Config: { slowThreshold: '1000ms', maxRecords: 1000, samplingRate: 1, excludePaths: [] }

Every request is logged:

✅ GET / - 200 (0.52ms)
✅ GET /users - 200 (12.34ms)
❌ GET /not-found - 404 (0.31ms)
⚠️ POST /slow - 200 (🐌 1523.45ms)  // requests over the threshold are flagged as slow
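
The log format above can be approximated with a small formatter. This is a minimal sketch, not Vafast's actual implementation: the name `formatLogLine` is hypothetical, and the icon rules (❌ for status codes of 400 and above, ⚠️ with 🐌 when the duration exceeds the threshold) are inferred from the sample output.

```typescript
// Hypothetical helper; the rules are inferred from the sample output above.
function formatLogLine(
  method: string,
  path: string,
  status: number,
  durationMs: number,
  slowThresholdMs = 1000
): string {
  const duration = `${durationMs.toFixed(2)}ms`
  if (status >= 400) {
    return `❌ ${method} ${path} - ${status} (${duration})`
  }
  if (durationMs > slowThresholdMs) {
    return `⚠️ ${method} ${path} - ${status} (🐌 ${duration})`
  }
  return `✅ ${method} ${path} - ${status} (${duration})`
}

console.log(formatLogLine('GET', '/', 200, 0.52))
console.log(formatLogLine('POST', '/slow', 200, 1523.45))
```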

Configuration Options

typescript
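const monitored = withMonitoring(server, {
  // Whether monitoring is enabled; defaults to true
  enabled: true,
  
  // Whether to log to the console; defaults to true
  console: true,
  
  // Slow-request threshold in milliseconds; slower requests are marked with 🐌. Defaults to 1000
  slowThreshold: 500,
  
  // Maximum number of records kept (ring buffer); defaults to 1000
  maxRecords: 5000,
  
  // Sampling rate from 0 to 1; defaults to 1 (record everything)
  // Under high traffic, 0.1 records only 10% of requests
  samplingRate: 1,
  
  // Paths to exclude (never recorded)
  excludePaths: ['/health', '/metrics', '/favicon.ico'],
  
  // Custom tags
  tags: { service: 'api', env: 'production' },
  
  // Called when a request completes
  onRequest: (metrics) => {
    // Forward to an external monitoring system
    // (sendToPrometheus is a placeholder for your own function)
    sendToPrometheus(metrics)
  },
  
  // Called when a request exceeds slowThreshold
  onSlowRequest: (metrics) => {
    console.warn(`⚠️ Slow request: ${metrics.path} (${metrics.totalTime}ms)`)
    // alertSlack is a placeholder for your own function
    alertSlack(`Slow request warning: ${metrics.path}`)
  }
})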
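
The `maxRecords` option refers to a ring buffer, which is what keeps memory bounded. The following is a minimal sketch of that data structure, not Vafast's internal code; the class name `RingBuffer` and its methods are assumptions for illustration.

```typescript
// A fixed-capacity ring buffer: once full, each push overwrites the oldest entry.
class RingBuffer<T> {
  private readonly items: T[] = []
  private readonly capacity: number
  private next = 0 // index the next push will write to

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError('capacity must be a positive integer')
    }
    this.capacity = capacity
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item)
    } else {
      this.items[this.next] = item
    }
    this.next = (this.next + 1) % this.capacity
  }

  // Returns the stored items ordered from oldest to newest.
  toArray(): T[] {
    if (this.items.length < this.capacity) return [...this.items]
    return [...this.items.slice(this.next), ...this.items.slice(0, this.next)]
  }
}

const buffer = new RingBuffer<number>(3)
for (const n of [1, 2, 3, 4, 5]) buffer.push(n)
console.log(buffer.toArray()) // the two oldest entries have been overwritten
```

Memory use is proportional to the capacity rather than to total traffic, so a larger `maxRecords` trades memory for a longer history.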

Reading Monitoring Status

Full Status

typescript
const status = monitored.getMonitoringStatus()

console.log(status)
// {
//   enabled: true,
//   uptime: 3600000,              // service uptime in milliseconds
//   totalRequests: 15000,
//   successfulRequests: 14500,
//   failedRequests: 500,
//   errorRate: 0.0333,
//   avgResponseTime: 12.45,       // average response time in milliseconds
//   p50: 8.2,                     // 50% of requests complete within this time
//   p95: 45.6,                    // 95% of requests complete within this time
//   p99: 120.3,                   // 99% of requests complete within this time
//   minTime: 0.5,
//   maxTime: 2500.8,
//   rps: 15.2,                    // current requests per second
//   statusCodes: {
//     '2xx': 14000,
//     '3xx': 200,
//     '4xx': 300,
//     '5xx': 500,
//     detail: { 200: 13500, 201: 500, 404: 250, 500: 500 }
//   },
//   timeWindows: {
//     last1min: { requests: 150, successful: 145, failed: 5, errorRate: 0.033, avgTime: 10.2, rps: 2.5 },
//     last5min: { requests: 750, successful: 720, failed: 30, errorRate: 0.04, avgTime: 11.5, rps: 2.5 },
//     last1hour: { requests: 9000, successful: 8700, failed: 300, errorRate: 0.033, avgTime: 12.1, rps: 2.5 }
//   },
//   byPath: {
//     '/': { count: 5000, avgTime: 5.2, minTime: 0.5, maxTime: 50.3, errorCount: 0 },
//     '/users': { count: 3000, avgTime: 15.8, minTime: 2.1, maxTime: 200.5, errorCount: 100 },
//     '/posts': { count: 2000, avgTime: 25.3, minTime: 5.2, maxTime: 500.8, errorCount: 50 }
//   },
//   memoryUsage: { heapUsed: '45.23MB', heapTotal: '100.50MB' },
//   recentRequests: [ ... ]       // the 5 most recent requests
// }
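
A status snapshot like this can feed a simple health check. The sketch below is illustrative only: `evaluateHealth` and `StatusSnapshot` are hypothetical names, and the thresholds are assumptions borrowed from the color rules in the dashboard example later on this page, not Vafast defaults.

```typescript
// Only the fields this sketch reads; the real MonitoringStatus has more.
interface StatusSnapshot {
  errorRate: number // fraction between 0 and 1
  p99: number // milliseconds
}

type Health = 'healthy' | 'degraded' | 'unhealthy'

// Thresholds are illustrative assumptions, not Vafast defaults.
function evaluateHealth(status: StatusSnapshot): Health {
  if (status.errorRate > 0.05 || status.p99 > 1000) return 'unhealthy'
  if (status.errorRate > 0.01 || status.p99 > 500) return 'degraded'
  return 'healthy'
}

// Using the sample values shown above (errorRate 0.0333, p99 120.3):
console.log(evaluateHealth({ errorRate: 0.0333, p99: 120.3 }))
```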

Time Window Statistics

typescript
// Preset time windows
const { last1min, last5min, last1hour } = status.timeWindows

console.log(`Last 1 minute: ${last1min.requests} requests, error rate ${(last1min.errorRate * 100).toFixed(1)}%`)
console.log(`Last 5 minutes: ${last5min.requests} requests, average ${last5min.avgTime}ms`)
console.log(`Last 1 hour: ${last1hour.requests} requests, RPS ${last1hour.rps}`)

// Custom time windows
const last30sec = monitored.getTimeWindowStats(30000)  // last 30 seconds
const last10min = monitored.getTimeWindowStats(600000) // last 10 minutes

console.log(`Last 30 seconds: ${last30sec.requests} requests`)
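
To make the window fields concrete, here is a minimal sketch of how such statistics could be derived from timestamped records. It is not Vafast's implementation: `computeWindowStats` and `RequestRecord` are hypothetical, and treating status codes of 400 and above as failures is an assumption.

```typescript
interface RequestRecord {
  timestamp: number // epoch milliseconds when the request finished
  statusCode: number
  totalTime: number // duration in milliseconds
}

interface WindowStats {
  requests: number
  successful: number
  failed: number
  errorRate: number
  avgTime: number
  rps: number
}

// Assumption: a request counts as failed when its status code is 400 or higher.
function computeWindowStats(
  records: RequestRecord[],
  windowMs: number,
  now: number = Date.now()
): WindowStats {
  const inWindow = records.filter((r) => now - r.timestamp <= windowMs)
  const requests = inWindow.length
  const failed = inWindow.filter((r) => r.statusCode >= 400).length
  const totalTime = inWindow.reduce((sum, r) => sum + r.totalTime, 0)
  return {
    requests,
    successful: requests - failed,
    failed,
    errorRate: requests === 0 ? 0 : failed / requests,
    avgTime: requests === 0 ? 0 : totalTime / requests,
    rps: requests / (windowMs / 1000)
  }
}
```

Because the records live in a bounded buffer, a window longer than the buffer's history can only report on the requests that are still stored.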

RPS (Requests Per Second)

typescript
// Current RPS (based on the last 10 seconds)
const rps = monitored.getRPS()
console.log(`Current RPS: ${rps}`)

// Also available on the status object
console.log(`Current RPS: ${status.rps}`)
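
One way to compute an RPS figure over a 10-second window is to count requests per second and average the recent buckets. The sketch below illustrates that idea under stated assumptions; `RpsCounter` is a hypothetical class, and Vafast may compute the value differently.

```typescript
// Counts requests per whole second and averages over the last `windowSeconds`.
class RpsCounter {
  private readonly perSecond = new Map<number, number>()
  private readonly windowSeconds: number

  constructor(windowSeconds = 10) {
    this.windowSeconds = windowSeconds
  }

  // Record one request at the given time (epoch milliseconds).
  record(nowMs: number = Date.now()): void {
    const second = Math.floor(nowMs / 1000)
    this.perSecond.set(second, (this.perSecond.get(second) ?? 0) + 1)
    // Drop buckets that have fallen out of the window.
    this.perSecond.forEach((_count, s) => {
      if (s <= second - this.windowSeconds) this.perSecond.delete(s)
    })
  }

  // Average requests per second over the window ending at `nowMs`.
  rps(nowMs: number = Date.now()): number {
    const current = Math.floor(nowMs / 1000)
    let total = 0
    this.perSecond.forEach((count, s) => {
      if (s > current - this.windowSeconds && s <= current) total += count
    })
    return total / this.windowSeconds
  }
}
```

The counter stores at most one entry per second in the window, so its memory use does not grow with traffic.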

Status Code Distribution

typescript
const dist = monitored.getStatusCodeDistribution()

console.log(`Success (2xx): ${dist['2xx']}`)
console.log(`Redirects (3xx): ${dist['3xx']}`)
console.log(`Client errors (4xx): ${dist['4xx']}`)
console.log(`Server errors (5xx): ${dist['5xx']}`)

// Per-code breakdown
console.log(`200 OK: ${dist.detail[200]}`)
console.log(`201 Created: ${dist.detail[201]}`)
console.log(`404 Not Found: ${dist.detail[404]}`)
console.log(`500 Internal Error: ${dist.detail[500]}`)
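
The shape of this result can be reproduced by bucketing codes by their first digit. This is a sketch for illustration only; `buildDistribution` is a hypothetical function, and here codes outside 2xx to 5xx appear only in `detail`, which is an assumption.

```typescript
interface Distribution {
  '2xx': number
  '3xx': number
  '4xx': number
  '5xx': number
  detail: Record<number, number>
}

function buildDistribution(statusCodes: number[]): Distribution {
  const dist: Distribution = { '2xx': 0, '3xx': 0, '4xx': 0, '5xx': 0, detail: {} }
  for (const code of statusCodes) {
    // Exact per-code count
    dist.detail[code] = (dist.detail[code] ?? 0) + 1
    // Class count, keyed by the hundreds digit
    const cls: string = `${Math.floor(code / 100)}xx`
    if (cls === '2xx' || cls === '3xx' || cls === '4xx' || cls === '5xx') {
      dist[cls] += 1
    }
  }
  return dist
}

console.log(buildDistribution([200, 200, 201, 301, 404, 500, 500]))
```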

Per-Path Statistics

typescript
// Statistics for a single path
const userStats = monitored.getPathStats('/users')

if (userStats) {
  console.log(`/users path:`)
  console.log(`  Requests: ${userStats.count}`)
  console.log(`  Average time: ${userStats.avgTime.toFixed(2)}ms`)
  console.log(`  Min time: ${userStats.minTime.toFixed(2)}ms`)
  console.log(`  Max time: ${userStats.maxTime.toFixed(2)}ms`)
  console.log(`  Errors: ${userStats.errorCount}`)
}

// Statistics for all paths
const { byPath } = status
Object.entries(byPath).forEach(([path, stats]) => {
  console.log(`${path}: ${stats.count} requests, average ${stats.avgTime}ms`)
})
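
Per-path data is most useful when ranked. As a small sketch, the hypothetical helper `slowestPaths` below sorts a `byPath`-shaped object by average time; the `PathStatsLike` interface only mirrors the fields shown in the sample output and is not imported from Vafast.

```typescript
interface PathStatsLike {
  count: number
  avgTime: number
  minTime: number
  maxTime: number
  errorCount: number
}

// Returns the `limit` paths with the highest average time,
// ignoring paths with fewer than `minCount` requests.
function slowestPaths(
  byPath: Record<string, PathStatsLike>,
  limit = 3,
  minCount = 1
): Array<[string, PathStatsLike]> {
  return Object.entries(byPath)
    .filter(([, stats]) => stats.count >= minCount)
    .sort(([, a], [, b]) => b.avgTime - a.avgTime)
    .slice(0, limit)
}

const samplePaths: Record<string, PathStatsLike> = {
  '/': { count: 5000, avgTime: 5.2, minTime: 0.5, maxTime: 50.3, errorCount: 0 },
  '/users': { count: 3000, avgTime: 15.8, minTime: 2.1, maxTime: 200.5, errorCount: 100 },
  '/posts': { count: 2000, avgTime: 25.3, minTime: 5.2, maxTime: 500.8, errorCount: 50 }
}

for (const [path, stats] of slowestPaths(samplePaths, 2)) {
  console.log(`${path}: average ${stats.avgTime}ms over ${stats.count} requests`)
}
```

The `minCount` filter keeps a path that was hit once and happened to be slow from dominating the ranking.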

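Percentiles

typescript
const { p50, p95, p99 } = status

console.log(`P50: ${p50}ms`)  // 50% of requests complete within this time
console.log(`P95: ${p95}ms`)  // 95% of requests complete within this time
console.log(`P99: ${p99}ms`)  // 99% of requests complete within this time

// P99 is a key indicator of service quality:
// if P99 exceeds your threshold, 1% of requests are slower than that threshold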
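
For readers unfamiliar with percentiles, the sketch below computes them with the nearest-rank method. This is one common definition, chosen here for simplicity; the source does not state which method Vafast uses, and `percentile` is a hypothetical function.

```typescript
// Nearest-rank percentile: the smallest sample such that
// at least p% of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]')
  const sorted = [...samples].sort((a, b) => a - b)
  // 1-based rank; multiply before dividing to avoid floating-point drift
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[rank - 1]
}

const durations = [12, 8, 45, 9, 120, 7, 10, 11, 9, 300]
console.log(`P50: ${percentile(durations, 50)}ms`)
console.log(`P95: ${percentile(durations, 95)}ms`)
```

With few samples, high percentiles collapse onto the maximum, so P99 is only meaningful once enough requests have been recorded.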

Exposing Monitoring Endpoints

typescript
import { Server, defineRoute, defineRoutes, serve, err } from 'vafast'
import { withMonitoring, type MonitoredServer } from 'vafast/monitoring'

// Build the routes for the monitoring endpoints
function createMetricsRoutes(getServer: () => MonitoredServer) {
  return defineRoutes([
    defineRoute({
      method: 'GET',
      path: '/metrics',
      handler: () => getServer().getMonitoringStatus()
    }),
    defineRoute({
      method: 'GET',
      path: '/metrics/rps',
      handler: () => ({ rps: getServer().getRPS() })
    }),
    defineRoute({
      method: 'GET',
      path: '/metrics/status-codes',
      handler: () => getServer().getStatusCodeDistribution()
    }),
    defineRoute({
      method: 'GET',
      path: '/metrics/path/:path',
      handler: ({ params }) => {
        const stats = getServer().getPathStats(`/${params.path}`)
        if (!stats) {
          throw err.notFound('Path not found')
        }
        return stats
      }
    }),
    defineRoute({
      method: 'POST',
      path: '/metrics/reset',
      handler: () => {
        getServer().resetMonitoring()
        return { message: 'Monitoring data has been reset' }
      }
    })
  ])
}

// Main application routes
const appRoutes = defineRoutes([
  defineRoute({
    method: 'GET',
    path: '/',
    handler: () => 'Hello Vafast!'
  }),
  defineRoute({
    method: 'GET',
    path: '/users',
    handler: () => [{ id: 1, name: 'Alice' }]
  })
])

// Assigned below; the route handlers read it lazily through a getter
let monitoredServer: MonitoredServer

const allRoutes = [
  ...appRoutes,
  ...createMetricsRoutes(() => monitoredServer)
]

const server = new Server(allRoutes)
monitoredServer = withMonitoring(server, {
  excludePaths: ['/metrics', '/health']  // exclude the monitoring endpoints themselves
})

serve({ fetch: monitoredServer.fetch, port: 3000 })

Available endpoints:

  • GET /metrics - full monitoring status
  • GET /metrics/rps - current RPS
  • GET /metrics/status-codes - status code distribution
  • GET /metrics/path/users - statistics for the /users path
  • POST /metrics/reset - reset monitoring data

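Advanced Usage

Sampling Rate Control

Under high traffic, recording every request can affect performance. Set a sampling rate to record only a fraction of requests:

typescript
const monitored = withMonitoring(server, {
  // Record only 10% of requests
  samplingRate: 0.1
})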
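
Sampling boils down to a random draw per request. The sketch below shows that decision and how a sampled count can be scaled back to an estimate of real traffic. Both helper names are hypothetical, and whether Vafast's own counters include unsampled requests is not stated in the source, so treat the scaling step as an assumption.

```typescript
// Decide whether to record a request. `random` is injectable so the logic can be tested.
function shouldSample(samplingRate: number, random: () => number = Math.random): boolean {
  if (samplingRate >= 1) return true
  if (samplingRate <= 0) return false
  return random() < samplingRate
}

// With sampling, recorded counts underestimate traffic;
// dividing by the rate gives an estimate of the true count.
function estimateTotal(recordedCount: number, samplingRate: number): number {
  if (samplingRate <= 0 || samplingRate > 1) {
    throw new RangeError('samplingRate must be in (0, 1]')
  }
  return Math.round(recordedCount / samplingRate)
}

console.log(estimateTotal(150, 0.1)) // estimated real request count
```

Averages and percentiles computed from a uniform random sample remain representative, but rare events such as a single slow request may be missed at low rates.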

Slow Request Alerts

typescript
const monitored = withMonitoring(server, {
  slowThreshold: 500,  // requests slower than 500ms count as slow
  
  onSlowRequest: async (metrics) => {
    // Log the slow request
    console.error(`[SLOW] ${metrics.method} ${metrics.path} - ${metrics.totalTime.toFixed(2)}ms`)
    
    // Send an alert
    await fetch('https://hooks.slack.com/services/xxx', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `⚠️ Slow request alert: ${metrics.path} (${metrics.totalTime.toFixed(0)}ms)`
      })
    })
  }
})
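
If an endpoint is slow on every request, the callback above would send one alert per request. A per-path cooldown is a common remedy; the sketch below is an optional addition, and `createAlertThrottle` is a hypothetical helper, not part of Vafast.

```typescript
// Returns a function that allows at most one alert per key within `cooldownMs`.
function createAlertThrottle(cooldownMs: number) {
  const lastSent = new Map<string, number>()
  return function shouldAlert(key: string, now: number = Date.now()): boolean {
    const previous = lastSent.get(key)
    if (previous !== undefined && now - previous < cooldownMs) return false
    lastSent.set(key, now)
    return true
  }
}

// At most one alert per path per minute
const shouldAlert = createAlertThrottle(60_000)
console.log(shouldAlert('/slow')) // first alert for this path is allowed
console.log(shouldAlert('/slow')) // an immediate repeat is suppressed
```

Inside `onSlowRequest`, a guard such as `if (!shouldAlert(metrics.path)) return` placed before the `fetch` call would apply the cooldown.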

Sending to External Monitoring

typescript
const monitored = withMonitoring(server, {
  onRequest: (metrics) => {
    // Push to a Prometheus Pushgateway (the body must end with a line feed)
    fetch('http://prometheus:9091/metrics/job/vafast', {
      method: 'POST',
      body: `http_request_duration_ms{method="${metrics.method}",path="${metrics.path}",status="${metrics.statusCode}"} ${metrics.totalTime}\n`
    }).catch((error) => console.error('Pushgateway push failed:', error))
    
    // Alternatively, write to InfluxDB using the line protocol
    fetch('http://influxdb:8086/write?db=metrics', {
      method: 'POST',
      body: `requests,method=${metrics.method},path=${metrics.path},status=${metrics.statusCode} duration=${metrics.totalTime}`
    }).catch((error) => console.error('InfluxDB write failed:', error))
  }
})
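
The payloads above interpolate `metrics.path` directly, which can produce malformed lines if a path contains quotes, spaces, or commas. The following sketch shows the escaping each format expects; the helper names are hypothetical and would need to be wired into the callback by hand.

```typescript
// Prometheus text format: escape backslash, double quote, and line feed in label values.
function escapePrometheusLabel(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n')
}

// InfluxDB line protocol: escape commas, equals signs, and spaces in tag values.
function escapeInfluxTag(value: string): string {
  return value.replace(/([,= ])/g, '\\$1')
}

// Builds one exposition line, terminated by the required line feed.
function toPrometheusLine(method: string, path: string, status: number, ms: number): string {
  const labels =
    `method="${escapePrometheusLabel(method)}",` +
    `path="${escapePrometheusLabel(path)}",` +
    `status="${status}"`
  return `http_request_duration_ms{${labels}} ${ms}\n`
}

console.log(toPrometheusLine('GET', '/users', 200, 12.5))
console.log(escapeInfluxTag('/search results,page=2'))
```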

Convenience Factory Function

typescript
import { Server, serve } from 'vafast'
import { createMonitoredServer } from 'vafast/monitoring'

// Create a monitored Server in one step
// (`routes` is a route list built with defineRoutes, as in Quick Start)
const monitored = createMonitoredServer(Server, routes, {
  slowThreshold: 500,
  excludePaths: ['/health']
})

serve({ fetch: monitored.fetch, port: 3000 })

Monitoring Dashboard Example

typescript
import { Server, defineRoute, defineRoutes, serve, html } from 'vafast'
import { withMonitoring, type MonitoredServer } from 'vafast/monitoring'

let monitoredServer: MonitoredServer

const routes = defineRoutes([
  defineRoute({
    method: 'GET',
    path: '/',
    handler: () => 'Hello Vafast!'
  }),
  defineRoute({
    method: 'GET',
    path: '/dashboard',
    handler: () => {
      const status = monitoredServer.getMonitoringStatus()
      
      return html(`
        <!DOCTYPE html>
        <html>
          <head>
          <title>Vafast Monitoring Dashboard</title>
            <style>
            * { box-sizing: border-box; }
            body { font-family: system-ui; margin: 0; padding: 20px; background: #0f172a; color: #e2e8f0; }
            h1 { color: #38bdf8; }
            .grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 16px; }
            .card { background: #1e293b; padding: 20px; border-radius: 12px; }
            .card h3 { margin: 0 0 8px 0; color: #94a3b8; font-size: 14px; }
            .card .value { font-size: 32px; font-weight: bold; color: #f8fafc; }
            .card .unit { font-size: 14px; color: #64748b; }
            .success { color: #22c55e; }
            .warning { color: #eab308; }
            .error { color: #ef4444; }
            table { width: 100%; border-collapse: collapse; margin-top: 20px; }
            th, td { padding: 12px; text-align: left; border-bottom: 1px solid #334155; }
            th { color: #94a3b8; }
            </style>
          <meta http-equiv="refresh" content="5">
          </head>
          <body>
          <h1>Vafast Monitoring Dashboard</h1>
          
          <div class="grid">
            <div class="card">
              <h3>Total Requests</h3>
              <div class="value">${status.totalRequests.toLocaleString()}</div>
            </div>
            <div class="card">
              <h3>Current RPS</h3>
              <div class="value">${status.rps} <span class="unit">req/s</span></div>
            </div>
            <div class="card">
              <h3>Error Rate</h3>
              <div class="value ${status.errorRate > 0.05 ? 'error' : status.errorRate > 0.01 ? 'warning' : 'success'}">
                ${(status.errorRate * 100).toFixed(2)}%
              </div>
            </div>
            <div class="card">
              <h3>Average Response Time</h3>
              <div class="value">${status.avgResponseTime} <span class="unit">ms</span></div>
            </div>
            <div class="card">
              <h3>P95</h3>
              <div class="value ${status.p95 > 500 ? 'warning' : ''}">${status.p95} <span class="unit">ms</span></div>
            </div>
            <div class="card">
              <h3>P99</h3>
              <div class="value ${status.p99 > 1000 ? 'error' : status.p99 > 500 ? 'warning' : ''}">${status.p99} <span class="unit">ms</span></div>
            </div>
          </div>
          
          <h2>Time Window Statistics</h2>
          <table>
            <tr>
              <th>Time Window</th>
              <th>Requests</th>
              <th>Successful</th>
              <th>Failed</th>
              <th>Error Rate</th>
              <th>Average Time</th>
              <th>RPS</th>
            </tr>
            <tr>
              <td>Last 1 minute</td>
              <td>${status.timeWindows.last1min.requests}</td>
              <td class="success">${status.timeWindows.last1min.successful}</td>
              <td class="error">${status.timeWindows.last1min.failed}</td>
              <td>${(status.timeWindows.last1min.errorRate * 100).toFixed(2)}%</td>
              <td>${status.timeWindows.last1min.avgTime}ms</td>
              <td>${status.timeWindows.last1min.rps}</td>
            </tr>
            <tr>
              <td>Last 5 minutes</td>
              <td>${status.timeWindows.last5min.requests}</td>
              <td class="success">${status.timeWindows.last5min.successful}</td>
              <td class="error">${status.timeWindows.last5min.failed}</td>
              <td>${(status.timeWindows.last5min.errorRate * 100).toFixed(2)}%</td>
              <td>${status.timeWindows.last5min.avgTime}ms</td>
              <td>${status.timeWindows.last5min.rps}</td>
            </tr>
            <tr>
              <td>Last 1 hour</td>
              <td>${status.timeWindows.last1hour.requests}</td>
              <td class="success">${status.timeWindows.last1hour.successful}</td>
              <td class="error">${status.timeWindows.last1hour.failed}</td>
              <td>${(status.timeWindows.last1hour.errorRate * 100).toFixed(2)}%</td>
              <td>${status.timeWindows.last1hour.avgTime}ms</td>
              <td>${status.timeWindows.last1hour.rps}</td>
            </tr>
          </table>
          
          <h2>Status Code Distribution</h2>
          <div class="grid">
            <div class="card">
              <h3>2xx Success</h3>
              <div class="value success">${status.statusCodes['2xx']}</div>
            </div>
            <div class="card">
              <h3>3xx Redirect</h3>
              <div class="value">${status.statusCodes['3xx']}</div>
            </div>
            <div class="card">
              <h3>4xx Client Error</h3>
              <div class="value warning">${status.statusCodes['4xx']}</div>
            </div>
            <div class="card">
              <h3>5xx Server Error</h3>
              <div class="value error">${status.statusCodes['5xx']}</div>
            </div>
          </div>
          
          <h2>Path Statistics (Top 10)</h2>
          <table>
            <tr>
              <th>Path</th>
              <th>Requests</th>
              <th>Average Time</th>
              <th>Min</th>
              <th>Max</th>
              <th>Errors</th>
            </tr>
            ${Object.entries(status.byPath)
              .sort(([, a], [, b]) => b.count - a.count)
              .slice(0, 10)
              .map(([path, stats]) => `
                <tr>
                  <td>${path}</td>
                  <td>${stats.count}</td>
                  <td>${stats.avgTime.toFixed(2)}ms</td>
                  <td>${stats.minTime.toFixed(2)}ms</td>
                  <td>${stats.maxTime.toFixed(2)}ms</td>
                  <td class="${stats.errorCount > 0 ? 'error' : ''}">${stats.errorCount}</td>
                </tr>
              `).join('')}
          </table>
          
          <h2>Memory Usage</h2>
          <div class="grid">
            <div class="card">
              <h3>Heap Used</h3>
              <div class="value">${status.memoryUsage.heapUsed}</div>
            </div>
            <div class="card">
              <h3>Heap Total</h3>
              <div class="value">${status.memoryUsage.heapTotal}</div>
            </div>
            <div class="card">
              <h3>Uptime</h3>
              <div class="value">${Math.floor(status.uptime / 1000 / 60)} <span class="unit">min</span></div>
            </div>
          </div>
          
          <p style="color: #64748b; margin-top: 20px;">This page refreshes automatically every 5 seconds</p>
          </body>
        </html>
      `)
    }
  }),
  defineRoute({
    method: 'GET',
    path: '/api/metrics',
    handler: () => monitoredServer.getMonitoringStatus()
  })
])

const server = new Server(routes)
monitoredServer = withMonitoring(server, {
  excludePaths: ['/dashboard', '/api/metrics']
})

serve({ fetch: monitoredServer.fetch, port: 3000 }, () => {
  console.log('Server running on http://localhost:3000')
  console.log('Dashboard: http://localhost:3000/dashboard')
})
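
The dashboard template interpolates each `path` straight into HTML. The source does not say whether `byPath` keys are route patterns or raw request paths; if they can contain client-supplied text, escaping them before rendering is a sensible precaution. The helper below is a hypothetical sketch of that step.

```typescript
// Escapes the five characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

console.log(escapeHtml('/search?q=<b>&page=2'))
```

In the template, this would mean writing `<td>${escapeHtml(path)}</td>` in the path table row.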

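API Reference

MonitoringConfig

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Whether monitoring is enabled |
| console | boolean | true | Whether to log to the console |
| slowThreshold | number | 1000 | Slow-request threshold (milliseconds) |
| maxRecords | number | 1000 | Maximum number of records kept |
| samplingRate | number | 1 | Sampling rate, 0 to 1 |
| excludePaths | string[] | [] | Paths to exclude |
| tags | Record<string, string> | {} | Custom tags |
| onRequest | (metrics) => void | - | Called when a request completes |
| onSlowRequest | (metrics) => void | - | Called when a request is slow |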

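MonitoredServer Methods

| Method | Returns | Description |
| --- | --- | --- |
| getMonitoringStatus() | MonitoringStatus | Full monitoring status |
| getMonitoringMetrics() | MonitoringMetrics[] | Raw metrics records |
| getPathStats(path) | PathStats \| undefined | Statistics for a single path |
| getTimeWindowStats(ms) | TimeWindowStats | Statistics for a custom time window |
| getRPS() | number | Current requests per second |
| getStatusCodeDistribution() | StatusCodeDistribution | Status code distribution |
| resetMonitoring() | void | Reset all monitoring data |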

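MonitoringStatus Fields

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Whether monitoring is enabled |
| uptime | number | Service uptime (milliseconds) |
| totalRequests | number | Total number of requests |
| successfulRequests | number | Number of successful requests |
| failedRequests | number | Number of failed requests |
| errorRate | number | Error rate |
| avgResponseTime | number | Average response time |
| p50 | number | P50 response time |
| p95 | number | P95 response time |
| p99 | number | P99 response time |
| minTime | number | Minimum response time |
| maxTime | number | Maximum response time |
| rps | number | Current RPS |
| statusCodes | StatusCodeDistribution | Status code distribution |
| timeWindows | { last1min, last5min, last1hour } | Time window statistics |
| byPath | Record<string, PathStats> | Per-path statistics |
| memoryUsage | { heapUsed, heapTotal } | Memory usage |
| recentRequests | MonitoringMetrics[] | Most recent requests |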

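Summary

Vafast's built-in monitoring provides:

  • Zero external dependencies - no Prometheus or OpenTelemetry required
  • Works out of the box - enabled with a single line of code
  • Percentile statistics - P50/P95/P99
  • Time window statistics - 1 minute / 5 minutes / 1 hour
  • RPS calculation - real-time requests per second
  • Status code distribution - 2xx/3xx/4xx/5xx
  • Per-path statistics - identify hot and slow endpoints
  • Memory friendly - a ring buffer caps memory use
  • Sampling rate control - tuned for high-traffic scenarios
  • Custom callbacks - integrate with external monitoring systems

Next Steps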