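Docker Compose works well for development, but many teams run into trouble when they take it to production: insecure configuration, performance bottlenecks, no monitoring, and no graceful restarts. This article walks through how to turn Docker Compose from a development tool into a dependable production deployment setup.
Before debating whether to use it at all, be clear about where it fits:
If your project meets the conditions above, Docker Compose plus sensible operations scripts is fully capable of supporting production.
Start with the structure of a typical production docker-compose.yml: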
version: "3.9"  # obsolete in Compose v2, which ignores it; kept for older tooling

services:
  # Reverse proxy
  nginx:
    image: nginx:1.27-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx_logs:/var/log/nginx
    depends_on:
      app:
        condition: service_healthy
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 256M
    networks:
      - frontend
      - backend

  # Application service
  app:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      # No password in the URL; the app is expected to read /run/secrets/db_password
      - DATABASE_URL=postgresql://${DB_USER}@db:5432/myapp
      # Redis runs with --requirepass, so the client URL must carry the password
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
      - SECRET_KEY=${SECRET_KEY}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1G
    networks:
      - backend
    secrets:
      - db_password

  # Database
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: myapp
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 2G
    networks:
      - backend
    secrets:
      - db_password  # required so that POSTGRES_PASSWORD_FILE exists in this container

  # Redis cache
  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 256mb --maxmemory-policy allkeys-lru
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
    networks:
      - backend

volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local
  nginx_logs:
    driver: local

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # internal network with no route to the outside world

secrets:
  db_password:
    file: ./secrets/db_password.txt
Never hardcode passwords or keys in docker-compose.yml:
# Use a .env file (never committed to git)
# .env
SECRET_KEY=your-super-secret-key
DB_USER=appuser
REDIS_PASSWORD=redis-secret-123
# Use Docker secrets (more secure)
# secrets/db_password.txt
your-db-password-here
Add these entries to .gitignore:
.env
.env.*
secrets/
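Compose substitutes an empty string (with only a warning) when a variable referenced as ${NAME} is not set, so a missing .env entry can silently produce an empty password. A minimal pre-flight sketch, assuming the simple NAME=value format shown above; the function name require_env_vars is illustrative, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch: fail early when required variables are missing or empty in a .env file.
# Assumes plain NAME=value lines without quoting or "export" prefixes.

require_env_vars() {
  local env_file="$1"; shift
  local name missing=0
  for name in "$@"; do
    # A variable counts as set only when a line NAME=<non-empty value> exists.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty in ${env_file}: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run before any deploy step):
#   require_env_vars .env SECRET_KEY DB_USER REDIS_PASSWORD || exit 1
```

A missing file is reported the same way as a missing variable, because grep fails on it and every name is then flagged.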
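In the configuration above, the backend network sets internal: true. Containers attached only to that network (app, db, and redis) have no route to the outside world, and because db and redis publish no ports, they can be reached only by services on the same network, such as app. Keep in mind that the restriction applies to app as well, so it cannot call external APIs unless you attach it to another network. This is the principle of least privilege in practice.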
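It can be worth confirming that the running network really is internal, since a renamed network or an edited file can quietly remove the flag. The sketch below reads the Internal field from docker network inspect. The network name myapp_backend is an assumption: Compose prefixes network names with the project name, so yours will differ.

```shell
#!/usr/bin/env bash
# Sketch: check that a Docker network was created with internal: true.
# "myapp_backend" in the usage line is a hypothetical <project>_<network> name.

is_internal_network() {
  local network="$1" internal
  # Exit status 2 means the network could not be inspected at all.
  internal="$(docker network inspect --format '{{.Internal}}' "$network")" || return 2
  [ "$internal" = "true" ]
}

# Usage:
#   is_internal_network myapp_backend || echo "backend network is NOT internal" >&2
```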
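# Add security settings to every service
app:
  read_only: true              # read-only root filesystem (pair it with tmpfs)
  tmpfs:
    - /tmp
  security_opt:
    - no-new-privileges:true   # block privilege escalation
  cap_drop:
    - ALL                      # drop every Linux capability
  cap_add:
    - NET_BIND_SERVICE         # add back only what is needed (this one allows binding ports below 1024)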
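Health checks are essential in production. Without one, Docker only knows that a container's process is running, not whether the service can actually handle requests.
# Database health check
db:
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s  # give the database time to start
# Application health check
app:
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 40s
Pay attention to start_period: failed probes during this window do not count toward retries, so a value that is too short lets a slow-starting database be marked unhealthy before it has finished initializing.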
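Scripts often need to block until a container reports healthy. The sketch below polls the State.Health.Status field from docker inspect, which takes the values starting, healthy, or unhealthy once a health check is defined. The function name and the default timings are illustrative choices:

```shell
#!/usr/bin/env bash
# Sketch: wait until a container's health check reports "healthy".
# Arguments: container name or ID, timeout in seconds, poll interval in seconds.

wait_healthy() {
  local container="$1" timeout="${2:-60}" interval="${3:-5}" status
  while [ "$timeout" -gt 0 ]; do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "container ${container} is unhealthy" >&2; return 1 ;;
    esac
    # Any other value (for example "starting") means: keep waiting.
    sleep "$interval"
    timeout=$((timeout - interval))
  done
  echo "timed out waiting for ${container}" >&2
  return 1
}

# Usage with Compose, resolving the service to its container ID first:
#   wait_healthy "$(docker compose ps -q app)" 120 5
```

Choose a timeout longer than start_period plus a couple of intervals, otherwise the wait can give up before the first probe has had a chance to succeed.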
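By default Docker's json-file logs grow without limit and will eventually fill the disk.
# Logging configuration (set per service)
services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "5"
        compress: "true"  # gzip-compress rotated files; this option is a boolean
  db:
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "3"
Alternatively, ship logs to a central collector with syslog or fluentd:
services:
  app:
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "localhost:24224"
        tag: "app.production"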
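To size the disk, it helps to work out the upper bound these json-file settings imply: each container keeps at most max-file files of at most max-size each. A small sketch of that arithmetic follows; the function name log_budget is made up, and the result is reported in the same unit as the input to avoid guessing how the unit suffix is interpreted:

```shell
#!/usr/bin/env bash
# Sketch: worst-case log footprint per container = max-size x max-file.
# The unit suffix (k, m, or g) is passed through unchanged.

log_budget() {
  local max_size="$1" max_file="$2"
  local number="${max_size%[kmgKMG]}"   # strip one trailing unit letter, if any
  local unit="${max_size#"$number"}"    # whatever was stripped
  echo "$((number * max_file))${unit}"
}

log_budget 50m 5    # app service: prints 250m
log_budget 100m 3   # db service:  prints 300m
```

With compression enabled, rotated files take less space, so treat these figures as a ceiling rather than an estimate.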
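A key production requirement is keeping service interruptions to a minimum. The script below rebuilds and replaces the app container, waits for its health check, and rolls back to the previous commit if the check never passes. Note that docker compose up recreates the container in place, so there is a brief gap while the new one starts; strictly zero-downtime deploys need a second instance behind the proxy.
#!/bin/bash
# deploy.sh - low-downtime deployment script with rollback
set -euo pipefail

COMPOSE_FILE="docker-compose.yml"
SERVICE_NAME="app"

# Remember the current commit so that a failed deploy can be rolled back
PREVIOUS_COMMIT="$(git rev-parse HEAD)"

echo "Pulling latest code..."
git pull origin main

echo "Building new image..."
docker compose -f "$COMPOSE_FILE" build "$SERVICE_NAME"

echo "Replacing the container..."
docker compose -f "$COMPOSE_FILE" up -d --no-deps "$SERVICE_NAME"

echo "Waiting for the health check to pass..."
timeout=60
healthy=false
container_id="$(docker compose -f "$COMPOSE_FILE" ps -q "$SERVICE_NAME")"
while [ "$timeout" -gt 0 ]; do
  # Read the status field directly; grepping for "healthy" would also match "unhealthy"
  status="$(docker inspect --format '{{.State.Health.Status}}' "$container_id" 2>/dev/null || true)"
  if [ "$status" = "healthy" ]; then
    healthy=true
    echo "New container passed its health check"
    break
  fi
  echo "Waiting for health check... ${timeout}s remaining"
  sleep 5
  timeout=$((timeout - 5))
done

if [ "$healthy" != true ]; then
  echo "Health check timed out, rolling back..."
  # docker compose has no rollback subcommand: rebuild from the previous commit instead
  git reset --hard "$PREVIOUS_COMMIT"
  docker compose -f "$COMPOSE_FILE" up -d --no-deps --build "$SERVICE_NAME"
  exit 1
fi

echo "Pruning old images..."
docker image prune -f
echo "Deployment complete!"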
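Before replacing a running container, it is cheap to check that the Compose file still parses and that its variables resolve. The sketch below wraps docker compose config --quiet, which validates the configuration without printing it. The wrapper name validate_compose is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: validate a Compose file before deploying it.

validate_compose() {
  local compose_file="${1:-docker-compose.yml}"
  # "config --quiet" exits non-zero on syntax or schema errors and prints nothing on success.
  if ! docker compose -f "$compose_file" config --quiet; then
    echo "invalid Compose configuration: ${compose_file}" >&2
    return 1
  fi
}

# Usage, as the first step of a deploy script:
#   validate_compose docker-compose.yml || exit 1
```

Running this right after pulling new code means a broken file stops the deploy while the old container is still serving traffic.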
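Backups are the lifeline of a production system:
#!/bin/bash
# backup.sh
set -euo pipefail  # pipefail makes a failed pg_dump abort the script even though gzip succeeds

# Load DB_USER and REDIS_PASSWORD from the .env file Compose uses (assumes plain NAME=value lines)
set -a; source .env; set +a
DB_NAME="${DB_NAME:-myapp}"  # matches POSTGRES_DB in docker-compose.yml

BACKUP_DIR="/backups/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# PostgreSQL backup
docker compose exec -T db pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$BACKUP_DIR/db_backup.sql.gz"

# Redis backup
docker compose exec -T redis redis-cli -a "$REDIS_PASSWORD" --rdb - > "$BACKUP_DIR/redis_backup.rdb"

# Upload to remote storage
aws s3 sync "$BACKUP_DIR" "s3://myapp-backups/$(date +%Y-%m-%d)/"

# Delete backups older than 30 days (only the dated subdirectories, never /backups itself)
find /backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

echo "Backup complete: $BACKUP_DIR"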
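A backup that was never checked is only a hope. As a minimal sanity check, the sketch below confirms that a dump file is non-empty and that its gzip stream is intact. It does not prove the dump restores cleanly, which only a periodic test restore can show. The function name verify_backup is illustrative:

```shell
#!/usr/bin/env bash
# Sketch: basic integrity check for a gzip-compressed dump.

verify_backup() {
  local dump="$1"
  # -s is true only for an existing file larger than zero bytes.
  [ -s "$dump" ] || { echo "empty or missing: ${dump}" >&2; return 1; }
  # gzip -t reads the whole stream and verifies its checksum without extracting it.
  gzip -t "$dump" 2>/dev/null || { echo "corrupt gzip stream: ${dump}" >&2; return 1; }
}

# Usage, right after the pg_dump step and before uploading:
#   verify_backup "$BACKUP_DIR/db_backup.sql.gz" || exit 1
```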
Running production without monitoring is flying blind. Prometheus + Grafana is a solid choice:
services:
  prometheus:
    image: prom/prometheus:v2.54.1  # pinned to a full patch release
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
    restart: unless-stopped
    networks:
      - monitoring
  grafana:
    image: grafana/grafana:11
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped
    networks:
      - frontend
      - monitoring

# Declarations this fragment needs; frontend is the network defined in the main file
volumes:
  prometheus_data:
  grafana_data:
networks:
  monitoring:
    driver: bridge
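# Multi-stage build example
FROM python:3.12-alpine AS builder
WORKDIR /app
COPY requirements.txt .
# Install into a separate prefix so libraries and console scripts (such as gunicorn) are copied together
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-alpine
WORKDIR /app
# The Compose health check calls curl, which the alpine image does not include
RUN apk add --no-cache curl
COPY --from=builder /install /usr/local
COPY . .
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:8000", "app:app"]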
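Docker Compose is fully capable of handling production deployments, provided you:
Once this work is done, you will find a Docker Compose production setup easier to maintain and debug than many more complex orchestration systems. The key point: do not carry development configuration straight into production; take the time to get the fundamentals above right.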