In a microservice architecture with dozens or even hundreds of service instances, tailing logs with `tail -f` is a thing of the past. This article uses Spring Boot 3.x microservices + the ELK Stack (Elasticsearch 8.x + Logstash 8.x + Kibana 8.x) + Filebeat as the foundation to build a production-grade, end-to-end logging platform from scratch, covering structured log output, log collection, index management, visual analysis, and alert integration with Prometheus. Together with the earlier article [Prometheus + Grafana Full-Stack Monitoring], it completes the three pillars of microservice observability (Metrics + Logs + Traces).
1. Why Microservices Need ELK
| Pain point | Traditional approach | ELK approach |
|---|---|---|
| Logs scattered across containers | `kubectl logs` one by one | Centralized aggregation, one-stop search |
| Tracing cross-service issues | Cross-referencing multiple windows | One-click full-chain lookup by TraceId |
| Unbounded log storage | Forced deletion when disks fill | Automatic rollover and archiving via ILM |
| Real-time alerting | None | Minute-level triggers with ElastAlert2 |
| Inconsistent log formats | No uniform analysis possible | Structured via Logstash Grok/JSON |
The three pillars of observability: Metrics (Prometheus + Grafana), Logs (ELK), and Traces (SkyWalking/Jaeger). All three are indispensable; this article fills in the Logs pillar.
2. Overall Architecture
┌─────────────────────────────────────────────────────────────────┐
│              Microservice cluster (Docker / K8s)                │
│                                                                 │
│  [order-service]    [user-service]    [gateway]  ...            │
│        │                  │               │                     │
│   log/*.log          log/*.log       log/*.log                  │
│        └────────────────┬────────────────┘                      │
│                         │ Filebeat (lightweight shipper)        │
└─────────────────────────┼───────────────────────────────────────┘
                          │
              ┌───────────▼────────────┐
              │ Logstash (filter/parse)│
              └───────────┬────────────┘
                          │
              ┌───────────▼────────────┐
              │   Elasticsearch 8.x    │
              │ (storage/index/search) │
              └───────────┬────────────┘
                          │
              ┌───────────▼────────────┐
              │ Kibana (visualization, │
              │       alerting)        │
              └────────────────────────┘
Why add Filebeat? Logstash is a heavyweight JVM process; running it inside every container would waste resources. Filebeat is a lightweight shipper written in Go: it tails log files and forwards them to Logstash, which handles parsing and filtering centrally — a clean separation of responsibilities.
3. Structured Logging in Spring Boot 3.x
3.1 Dependencies
<!-- pom.xml -->
<dependencies>
    <!-- Spring Boot starter (includes Logback) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- logstash-logback-encoder: JSON-formatted log output -->
    <dependency>
        <groupId>net.logstash.logback</groupId>
        <artifactId>logstash-logback-encoder</artifactId>
        <version>7.4</version>
    </dependency>
    <!-- Automatic MDC traceId injection (with Spring Cloud Sleuth or Micrometer Tracing) -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-brave</artifactId>
    </dependency>
</dependencies>
3.2 logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Read application name, log path, and active profile from application.yml -->
    <springProperty scope="context" name="APP_NAME" source="spring.application.name" defaultValue="app"/>
    <springProperty scope="context" name="LOG_PATH" source="logging.file.path" defaultValue="./logs"/>
    <!-- spring.profiles.active is a Spring property, not a Logback variable, so expose it via springProperty -->
    <springProperty scope="context" name="ENV" source="spring.profiles.active" defaultValue="dev"/>

    <!-- Console: human-readable format for development -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} [%X{traceId}] - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- File: JSON format, consumed by Filebeat -->
    <appender name="FILE_JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_PATH}/${APP_NAME}.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!-- Roll daily, max 200MB per file, keep 7 days, cap total at 2GB -->
            <fileNamePattern>${LOG_PATH}/${APP_NAME}-%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>200MB</maxFileSize>
            <maxHistory>7</maxHistory>
            <totalSizeCap>2GB</totalSizeCap>
        </rollingPolicy>
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <!-- Custom static fields -->
            <customFields>{"app":"${APP_NAME}","env":"${ENV}"}</customFields>
            <!-- Include MDC keys (traceId, spanId, etc.) -->
            <includeMdcKeyName>traceId</includeMdcKeyName>
            <includeMdcKeyName>spanId</includeMdcKeyName>
            <includeMdcKeyName>userId</includeMdcKeyName>
        </encoder>
    </appender>

    <!-- Async wrapper for higher throughput -->
    <appender name="ASYNC_FILE" class="ch.qos.logback.classic.AsyncAppender">
        <discardingThreshold>0</discardingThreshold>
        <queueSize>512</queueSize>
        <appender-ref ref="FILE_JSON"/>
    </appender>

    <root level="INFO">
        <appender-ref ref="CONSOLE"/>
        <appender-ref ref="ASYNC_FILE"/>
    </root>

    <!-- Business logs routed separately for easier alerting -->
    <logger name="biz" level="INFO" additivity="false">
        <appender-ref ref="ASYNC_FILE"/>
    </logger>
</configuration>
3.3 MDC Full-Chain Tracing Filter
/**
 * Injects a traceId into the MDC for every request so that all log lines
 * carry it automatically, enabling full-chain lookup across services.
 */
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class TraceIdFilter implements Filter {

    private static final String TRACE_ID = "traceId";
    private static final String USER_ID = "userId";

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) request;
        try {
            // Prefer the header propagated by the gateway; otherwise generate a new one
            String traceId = Optional.ofNullable(req.getHeader("X-Trace-Id"))
                    .filter(s -> !s.isBlank())
                    .orElse(UUID.randomUUID().toString().replace("-", "").substring(0, 16));
            MDC.put(TRACE_ID, traceId);
            // Read the user ID from the SecurityContext populated after JWT parsing
            // (may be absent here, since this filter runs before the security chain)
            Optional.ofNullable(SecurityContextHolder.getContext().getAuthentication())
                    .map(auth -> String.valueOf(auth.getPrincipal()))
                    .ifPresent(uid -> MDC.put(USER_ID, uid));
            // Echo the traceId in the response header so the frontend can report it
            ((HttpServletResponse) response).setHeader("X-Trace-Id", traceId);
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // prevent leakage across pooled threads
        }
    }
}
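The fallback branch above builds a 16-character lowercase-hex traceId from a UUID. As a standalone sanity check of that one expression (the class name is ours, not part of the filter):

```java
import java.util.UUID;

public class TraceIdDemo {
    // Same expression the filter uses when no X-Trace-Id header is present
    static String newTraceId() {
        return UUID.randomUUID().toString().replace("-", "").substring(0, 16);
    }

    public static void main(String[] args) {
        String id = newTraceId();
        System.out.println(id);                         // e.g. "a3f5c8d2e1b40967"
        System.out.println(id.matches("[0-9a-f]{16}")); // true
    }
}
```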
Example of the JSON log output:
{
  "@timestamp": "2026-03-29T08:15:32.456+08:00",
  "level": "ERROR",
  "logger_name": "com.example.order.service.OrderService",
  "message": "Failed to create order: insufficient stock",
  "app": "order-service",
  "env": "prod",
  "traceId": "a3f5c8d2e1b40967",
  "spanId": "b1c2d3e4",
  "userId": "10086",
  "stack_trace": "java.lang.RuntimeException: insufficient stock\n\tat com.example..."
}
4. Deploying the ELK Stack with Docker Compose
4.1 Directory Layout
elk/
├── docker-compose.yml
├── elasticsearch/
│   └── config/
│       └── elasticsearch.yml
├── logstash/
│   ├── config/
│   │   └── logstash.yml
│   └── pipeline/
│       └── springboot.conf   # core pipeline
├── kibana/
│   └── config/
│       └── kibana.yml
└── filebeat/
    └── filebeat.yml
4.2 docker-compose.yml
version: "3.9"

networks:
  elk:
    driver: bridge

volumes:
  esdata:

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.0
    container_name: elasticsearch
    environment:
      - node.name=es01
      - cluster.name=elk-cluster
      - discovery.type=single-node
      - xpack.security.enabled=false   # auth disabled for development; always enable in production
      - xpack.security.http.ssl.enabled=false
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - bootstrap.memory_lock=true
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - esdata:/usr/share/elasticsearch/data
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
    networks:
      - elk
    healthcheck:
      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health | grep -q '\"status\":\"green\\|yellow\"'"]
      interval: 20s
      timeout: 10s
      retries: 5

  logstash:
    image: docker.elastic.co/logstash/logstash:8.13.0
    container_name: logstash
    volumes:
      - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "5044:5044"   # Beats input (Filebeat pushes here)
      - "9600:9600"   # Logstash monitoring API
    environment:
      - LS_JAVA_OPTS=-Xms512m -Xmx512m
    networks:
      - elk
    depends_on:
      elasticsearch:
        condition: service_healthy

  kibana:
    image: docker.elastic.co/kibana/kibana:8.13.0
    container_name: kibana
    volumes:
      - ./kibana/config/kibana.yml:/usr/share/kibana/config/kibana.yml
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.13.0
    container_name: filebeat
    user: root
    volumes:
      - ./filebeat/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      # Host log directory (all microservices write their log files here)
      - /var/log/microservices:/var/log/microservices:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - elk
    depends_on:
      - logstash
4.3 Logstash Pipeline (core)
# logstash/pipeline/springboot.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # Parse JSON events (emitted by logstash-logback-encoder)
  if [message] =~ /^\{/ {
    json {
      source => "message"
      target => "log"
    }
    # Use the application's own timestamp, before the parsed object is dropped below
    date {
      match => ["[log][@timestamp]", "ISO8601"]
      target => "@timestamp"
      timezone => "Asia/Shanghai"
    }
    # Promote key fields to the top level
    mutate {
      rename => {
        "[log][app]"         => "app"
        "[log][env]"         => "env"
        "[log][traceId]"     => "traceId"
        "[log][spanId]"      => "spanId"
        "[log][userId]"      => "userId"
        "[log][level]"       => "level"
        "[log][logger_name]" => "logger"
        "[log][message]"     => "msg"
        "[log][stack_trace]" => "stackTrace"
      }
      remove_field => ["message", "log", "ecs", "agent", "input"]
    }
  } else {
    # Non-JSON logs (e.g. Nginx access logs) are parsed with Grok
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }

  # Tag slow requests (response time > 2000ms)
  if [msg] =~ /duration/ {
    grok {
      match => { "msg" => "duration=(?<duration_ms>\d+)" }
    }
    # Grok captures strings; convert before the numeric comparison
    mutate {
      convert => { "duration_ms" => "integer" }
    }
    if [duration_ms] and [duration_ms] > 2000 {
      mutate {
        add_tag => ["slow_request"]
      }
    }
  }

  # Drop noise (health checks, static resources)
  if [msg] =~ /actuator\/health|\.css|\.js|\.ico/ {
    drop {}
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    # One index per app per month, keeping ILM management simple
    index => "springboot-%{app}-%{+yyyy.MM}"
    action => "create"
  }
  # Enable for debugging only; keep disabled in production
  # stdout { codec => rubydebug }
}
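The `duration=(?<duration_ms>\d+)` grok pattern above is ordinary regex, so the slow-request rule is easy to verify outside Logstash. A small Java check on a made-up sample message (Java named groups disallow underscores, hence `durationMs`):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlowRequestCheck {
    // Mirrors the pipeline's named-group capture
    private static final Pattern DURATION = Pattern.compile("duration=(?<durationMs>\\d+)");

    // Returns the captured duration in ms, or -1 when the message has none
    static long extractDurationMs(String msg) {
        Matcher m = DURATION.matcher(msg);
        return m.find() ? Long.parseLong(m.group("durationMs")) : -1;
    }

    public static void main(String[] args) {
        long d = extractDurationMs("GET /api/orders completed, duration=2350, status=200");
        System.out.println(d);          // 2350
        System.out.println(d > 2000);   // true -> the event would be tagged slow_request
    }
}
```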
4.4 Filebeat Configuration
# filebeat/filebeat.yml
filebeat.inputs:
  # NOTE: the `log` input is deprecated in 8.x in favor of `filestream`;
  # kept here for brevity
  - type: log
    enabled: true
    paths:
      - /var/log/microservices/**/*.log
    # Merge multi-line logs (exception stack traces span multiple lines)
    multiline.type: pattern
    multiline.pattern: '^\{'
    multiline.negate: true
    multiline.match: after
    # Extra metadata
    fields:
      source: "file"
    fields_under_root: true
    # Close file handles after 5 minutes of inactivity
    close_inactive: 5m

  # Also collect Docker container logs
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"

output.logstash:
  hosts: ["logstash:5044"]
  worker: 4
  bulk_max_size: 2048
  compression_level: 3

# Filebeat self-monitoring (optional)
monitoring.enabled: false
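The multiline settings above say: any line that does not start with `{` (`negate: true`) is appended after the previous event (`match: after`). A tiny Java sketch of that grouping rule, run on fabricated sample lines:

```java
import java.util.ArrayList;
import java.util.List;

public class MultilineDemo {
    // Group raw lines into events: a line not starting with '{' attaches
    // to the previous event (same rule Filebeat's multiline config applies)
    static List<String> group(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("{") || events.isEmpty()) {
                events.add(line);
            } else {
                events.set(events.size() - 1, events.get(events.size() - 1) + "\n" + line);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "{\"level\":\"ERROR\",\"message\":\"boom\"}",
            "java.lang.RuntimeException: boom",
            "\tat com.example.Demo.main(Demo.java:10)",
            "{\"level\":\"INFO\",\"message\":\"ok\"}"
        );
        // 2 events: the stack trace lines merge into the first JSON event
        System.out.println(group(lines).size()); // 2
    }
}
```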
5. Elasticsearch ILM Index Lifecycle Management
Production log volume is enormous, so ILM (Index Lifecycle Management) must be configured to roll over and delete indices automatically, or the disks will eventually fill up. One caveat: hot-phase rollover only fires when writes go through a write alias on indices named like `springboot-logs-000001`; with the date-based index names used by the Logstash output above, the age-based warm/cold/delete phases still apply, but rollover does not.
# Create the ILM policy
# (the `freeze` action is a no-op in ES 8.x, so the cold phase uses `readonly` instead)
curl -X PUT "http://localhost:9200/_ilm/policy/springboot-logs-policy" \
  -H "Content-Type: application/json" \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "20gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": { "max_num_segments": 1 },
          "shrink": { "number_of_shards": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "readonly": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}'
# Create an index template bound to the ILM policy
curl -X PUT "http://localhost:9200/_index_template/springboot-logs-template" \
  -H "Content-Type: application/json" \
  -d '{
  "index_patterns": ["springboot-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "index.lifecycle.name": "springboot-logs-policy",
      "index.lifecycle.rollover_alias": "springboot-logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "app":        { "type": "keyword" },
        "env":        { "type": "keyword" },
        "level":      { "type": "keyword" },
        "traceId":    { "type": "keyword" },
        "spanId":     { "type": "keyword" },
        "userId":     { "type": "keyword" },
        "logger":     { "type": "keyword" },
        "msg":        { "type": "text", "analyzer": "ik_max_word" },
        "stackTrace": { "type": "text" }
      }
    }
  }
}'
⚠️ Note: for Chinese full-text search on the `msg` field in production, the `analysis-ik` plugin must be installed:

docker exec -it elasticsearch \
  bin/elasticsearch-plugin install \
  https://get.infini.cloud/elasticsearch/analysis-ik/8.13.0
6. Kibana Setup and Common Analysis Scenarios
6.1 Create a Data View
Open http://localhost:5601 → Stack Management → Data Views → Create data view:
- Index pattern: springboot-*
- Timestamp field: @timestamp
6.2 Common KQL Queries
# All ERROR-level logs
level: "ERROR"
# Full-chain trace: aggregate logs from every microservice by traceId
traceId: "a3f5c8d2e1b40967"
# All actions by a given user
userId: "10086" and level: "INFO"
# Exceptions in the order service (set the last hour in Kibana's time picker)
app: "order-service" and level: "ERROR"
# Slow-request analysis
tags: "slow_request"
# Logs containing a specific exception
stackTrace: "OutOfMemoryError"
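The same lookups can be issued outside Kibana through the search API. A hedged sketch of the traceId query as Query DSL, sent to `GET springboot-*/_search` (the time range mirrors what Kibana's time picker would add):

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "traceId": "a3f5c8d2e1b40967" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "sort": [{ "@timestamp": "asc" }],
  "size": 200
}
```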
6.3 Building a Dashboard
Recommended panels:
| Panel | Type | Purpose |
|---|---|---|
| Log level distribution | Pie Chart | Track the ERROR/WARN ratio in real time |
| Logs per minute per service | Line Chart | Spot traffic spikes and misbehaving services |
| Top 10 error types | Bar Chart | Quickly locate high-frequency problems |
| Top 20 slow requests | Data Table | Guide performance optimization |
| User activity trail | Data Table | Security auditing |
| Full-chain TraceId search | Search | Rapid incident triage |
7. Log Alerting with ElastAlert2 (Complementing Prometheus Alerts)
When ERRORs spike in the logs, an alert must go out immediately. ElastAlert2 watches Elasticsearch for matching documents and triggers alerts.
7.1 Deploying ElastAlert2 with Docker
# Append to docker-compose.yml
  elastalert:
    image: jertel/elastalert2:2.17.0
    container_name: elastalert2
    volumes:
      - ./elastalert/config.yaml:/opt/elastalert/config.yaml
      - ./elastalert/rules:/opt/elastalert/rules
    networks:
      - elk
    depends_on:
      - elasticsearch
7.2 Alert Rule
# elastalert/rules/high-error-rate.yaml
name: SpringBoot ERROR alert
type: frequency        # fires when more than num_events matches occur within timeframe
index: springboot-*
# Alert when ERROR count exceeds 50 within 5 minutes
num_events: 50
timeframe:
  minutes: 5
filter:
  - term:
      level: "ERROR"
# Alert channel: WeCom (WeChat Work) webhook
alert:
  - "post"
alert_text: |
  🚨 Microservice ERROR alert
  Service: {0}
  ERRORs in the last 5 minutes: {1}
  Latest error: {2}
  Time: {3}
alert_text_args:
  - app
  - num_hits
  - msg
  - "@timestamp"
http_post_url: "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_KEY"
http_post_payload:
  msgtype: "text"
  text:
    content: "{{alert_text}}"
8. Production Best Practices
8.1 Logging Conventions
// ✅ Recommended: structured logging with placeholders and context
log.error("Failed to create order, orderId={}, userId={}, reason={}",
        orderId, userId, e.getMessage(), e);
// ❌ Avoid: string concatenation (poor performance)
log.error("Failed to create order, orderId=" + orderId + ", error: " + e.getMessage());
// ✅ Log key business milestones at INFO
log.info("Order paid, orderId={}, amount={}, payType={}", orderId, amount, payType);
// ✅ Guard high-frequency DEBUG logs with isDebugEnabled
if (log.isDebugEnabled()) {
    log.debug("Stock cache hit, skuId={}, stock={}", skuId, stock);
}
8.2 Masking Sensitive Data
@Component
public class SensitiveDataMaskingConverter extends ClassicConverter {

    private static final Pattern PHONE = Pattern.compile("1[3-9]\\d{9}");
    // Try the 18-digit pattern first so it is not truncated by the 15-digit alternative
    private static final Pattern ID_CARD = Pattern.compile("\\d{18}|\\d{15}");

    @Override
    public String convert(ILoggingEvent event) {
        String msg = event.getFormattedMessage();
        msg = PHONE.matcher(msg).replaceAll(m -> m.group().substring(0, 3) + "****" + m.group().substring(7));
        msg = ID_CARD.matcher(msg).replaceAll(m -> m.group().substring(0, 4) + "**********" + m.group().substring(14));
        return msg;
    }
}
Register it in logback-spring.xml (the `%mask` conversion word then replaces `%msg` in pattern-based encoders):
<conversionRule conversionWord="mask"
converterClass="com.example.common.log.SensitiveDataMaskingConverter"/>
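A quick stdlib check of the phone-masking rule above, outside Logback (the sample number is fabricated):

```java
import java.util.regex.Pattern;

public class MaskDemo {
    private static final Pattern PHONE = Pattern.compile("1[3-9]\\d{9}");

    // Same replacement logic as SensitiveDataMaskingConverter:
    // keep the first 3 and last 4 digits, mask the middle
    static String maskPhone(String msg) {
        return PHONE.matcher(msg)
                .replaceAll(m -> m.group().substring(0, 3) + "****" + m.group().substring(7));
    }

    public static void main(String[] args) {
        System.out.println(maskPhone("user phone=13812345678 placed an order"));
        // user phone=138****5678 placed an order
    }
}
```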
8.3 Elasticsearch 性能调优
# elasticsearch.yml 生产配置
cluster.name: elk-prod
node.name: es-node-01
# 内存锁定,防止 swap 导致性能下降
bootstrap.memory_lock: true
# 关闭 swap
# echo "vm.swappiness=1" >> /etc/sysctl.conf
# 索引刷新频率(默认 1s,写入密集时可调大)
index.refresh_interval: 5s
# 副本数(单节点设 0,生产集群设 1+)
index.number_of_replicas: 1
# 写入缓冲区大小
indices.memory.index_buffer_size: 20%
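Since `index.*` settings cannot live in `elasticsearch.yml` on recent versions, they are applied per index (or in the index template); a sketch of the request body for `PUT springboot-*/_settings`:

```json
{
  "index": {
    "refresh_interval": "5s",
    "number_of_replicas": 1
  }
}
```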
9. End-to-End Verification
# 1. Start the ELK Stack
cd elk && docker-compose up -d
# 2. Check service health
docker-compose ps
curl http://localhost:9200/_cluster/health?pretty
curl http://localhost:9600   # Logstash API
# 3. Start the Spring Boot services and fire a few requests to generate logs
curl http://localhost:8080/api/orders
# 4. Verify Filebeat is shipping logs
docker logs filebeat --tail=50
# 5. Check that Elasticsearch has received data
curl "http://localhost:9200/springboot-order-service-*/_count"
# 6. Open Kibana and check the dashboard
open http://localhost:5601
10. Summary
| Component | Responsibility |
|---|---|
| Filebeat | Lightweight shipping of file and container logs, low resource footprint |
| Logstash | Structured parsing (Grok/JSON), field extraction, filtering out noise |
| Elasticsearch | Distributed storage and full-text search with millisecond latency |
| Kibana | Visual analysis, dashboards, DevTools queries |
| ElastAlert2 | Rule-based alerting on log data, integrating with WeCom/DingTalk/email |
Together with the earlier articles [Prometheus + Grafana Monitoring], [Spring Cloud Gateway], and [Seata Distributed Transactions], this blog's microservice observability story now comes full circle:
Three pillars of observability
├── Metrics → Prometheus + Grafana (metric monitoring and alerting)
├── Logs    → ELK Stack (log collection and analysis)  ← this article
└── Traces  → SkyWalking / Jaeger (distributed tracing)
Up next: SkyWalking 9.x with Spring Boot 3.x: Distributed Tracing in Practice, completing the three-pillar loop.
Author: hshloveyy | Blog: https://92yangyi.top | Questions welcome in the QQ group