# Log Analysis Case Study

This article walks through a practical log analysis case, showing how to use ClickHouse to collect, store, and analyze large volumes of log data. We use web server access logs as the example and cover the complete implementation.

## 1. Log Data Model Design

### 1.1 Table Schema

```sql
CREATE TABLE web_logs (
    `timestamp` DateTime,
    `request_id` String,
    `remote_addr` String,
    `request_method` String,
    `request_uri` String,
    `status` UInt16,
    `bytes_sent` UInt64,
    `http_referer` String,
    `http_user_agent` String,
    `request_time` Float32,
    `upstream_response_time` Float32
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, request_id)
SETTINGS index_granularity = 8192
```

### 1.2 Index Optimization

```sql
-- Add skip indexes for frequently filtered columns.
-- tokenbf_v1 arguments: bloom filter size in bytes, number of hash functions, seed.
ALTER TABLE web_logs ADD INDEX status_idx status TYPE minmax GRANULARITY 4;
ALTER TABLE web_logs ADD INDEX uri_idx request_uri TYPE tokenbf_v1(512, 3, 0) GRANULARITY 4;
```

## 2. Data Collection and Import

### 2.1 Log Format Processing

Use Vector to parse Nginx access logs and ship them to ClickHouse:

```toml
[sources.nginx_logs]
type = "file"
include = ["/var/log/nginx/access.log"]

[transforms.parse_nginx]
type = "remap"
inputs = ["nginx_logs"]
# VRL program: parse the combined log format into structured fields
source = '''
. = parse_nginx_log!(.message, format: "combined")
'''

[sinks.clickhouse]
type = "clickhouse"
inputs = ["parse_nginx"]
endpoint = "http://localhost:8123"
database = "logs"
table = "web_logs"

[sinks.clickhouse.batch]
max_events = 10000
```

### 2.2 Bulk Import of Historical Data

```sql
-- Import from a local file (run in clickhouse-client)
INSERT INTO web_logs FROM INFILE 'logs.csv' FORMAT CSV;

-- Import from a remote file via the url table function
INSERT INTO web_logs SELECT * FROM url('http://example.com/logs.csv', 'CSV');
```

## 3. Common Analysis Queries

### 3.1 Traffic Analysis

```sql
-- Page views per hour
SELECT
    toStartOfHour(timestamp) AS hour,
    count() AS pv
FROM web_logs
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;

-- Top client IPs
SELECT
    remote_addr,
    count() AS visits
FROM web_logs
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY remote_addr
ORDER BY visits DESC
LIMIT 10;
```

### 3.2 Performance Analysis

```sql
-- Slow request analysis
SELECT
    request_uri,
    count() AS count,
    avg(request_time) AS avg_time,
    max(request_time) AS max_time
FROM web_logs
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY request_uri
HAVING avg_time > 1
ORDER BY avg_time DESC;

-- Error rate per hour
SELECT
    toStartOfHour(timestamp) AS hour,
    countIf(status >= 500) * 100 / count() AS error_rate
FROM web_logs
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;
```

## 4. Data Visualization

### 4.1 Grafana Setup

1. Add a ClickHouse data source:

   ```ini
   Host: http://localhost:8123
   Database: logs
   Username: default
   ```

2. Create a dashboard:

   ```sql
   -- Real-time PV/UV monitoring
   -- $__timeFilter expands to the dashboard's time range
   -- (the exact macro name depends on the ClickHouse plugin in use)
   SELECT
       toStartOfMinute(timestamp) AS minute,
       count() AS pv,
       count(DISTINCT remote_addr) AS uv
   FROM web_logs
   WHERE $__timeFilter(timestamp)
   GROUP BY minute
   ORDER BY minute
   ```

### 4.2 Alerting

```sql
-- Alert when the 5xx error rate over the last 5 minutes exceeds 5%
SELECT
    now() AS time,
    countIf(status >= 500) * 100 / count() AS error_rate
FROM web_logs
WHERE timestamp >= now() - INTERVAL 5 MINUTE
HAVING error_rate > 5
```

## 5. Performance Optimization

### 5.1 Query Optimization

```sql
-- Pre-aggregate hourly statistics into a materialized view.
-- SummingMergeTree sums columns when merging parts, so an avg() column
-- would drift; store sum(request_time) and derive the average at read time.
CREATE MATERIALIZED VIEW web_logs_hourly
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, status)
AS SELECT
    toStartOfHour(timestamp) AS hour,
    status,
    count() AS count,
    sum(bytes_sent) AS bytes_total,
    sum(request_time) AS request_time_total
FROM web_logs
GROUP BY hour, status;

-- Read side: re-aggregate and compute the average
SELECT
    hour,
    sum(count) AS requests,
    sum(request_time_total) / sum(count) AS avg_request_time
FROM web_logs_hourly
GROUP BY hour
ORDER BY hour;
```

### 5.2 Storage Optimization

```sql
-- Configure TTL: move data to the cold volume after 1 month, delete after 3
ALTER TABLE web_logs MODIFY TTL
    timestamp + INTERVAL 3 MONTH DELETE,
    timestamp + INTERVAL 1 MONTH TO VOLUME 'cold';

-- Tune part storage thresholds (wide vs. compact parts)
ALTER TABLE web_logs MODIFY SETTING
    min_bytes_for_wide_part = 10485760,
    min_rows_for_wide_part = 512000;
```
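Once TTL and part settings are in place, it can be worth checking how well each column actually compresses on disk. Here is a minimal sketch against `system.columns`, assuming the `logs.web_logs` table defined above:

```sql
-- Per-column compression ratio for the web_logs table
SELECT
    name,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE database = 'logs' AND table = 'web_logs'
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC;
```

Columns with a low ratio (for example, high-cardinality strings such as `http_user_agent`) are the usual candidates for a different codec or a `LowCardinality` wrapper.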
## 6. Operations Management

### 6.1 Monitoring Metrics

```sql
-- Monitor table size on disk
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS disk_size,
    sum(rows) AS total_rows,
    max(modification_time) AS last_modified
FROM system.parts
WHERE active AND database = 'logs'
GROUP BY table;

-- Monitor query performance
SELECT
    type,
    query,
    read_rows,
    read_bytes,
    total_rows_approx,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query NOT LIKE '%system%'
ORDER BY query_duration_ms DESC
LIMIT 10;
```

### 6.2 Backup Strategy

```bash
# Create a backup (using the clickhouse-backup tool)
clickhouse-backup create backup_name

# Upload it to remote storage such as S3
clickhouse-backup upload backup_name

# Download and restore from remote storage
clickhouse-backup download backup_name
clickhouse-backup restore backup_name
```

## Summary

This case study showed how to build a complete log analysis system on ClickHouse, covering every stage: data modeling, collection and import, analytical queries, visualization, performance tuning, and day-to-day operations. In practice, adjust these configurations to your specific workload to get the best results.