Published: 2025-04-08 10:30
# User Behavior Analysis

This article explains in detail how to use ClickHouse for user behavior analytics, covering the complete workflow of data modeling, collection, analysis, and visualization. Practical examples demonstrate how ClickHouse is applied in the user behavior analysis domain.

## 1. Data Modeling

### 1.1 Event Table Design

```sql
CREATE TABLE user_events
(
    `event_time` DateTime,
    `user_id` UInt32,
    `session_id` String,
    `event_type` String,
    `page_url` String,
    `referrer` String,
    `device_type` String,
    `os` String,
    `browser` String,
    `country` String,
    `city` String,
    `properties` String CODEC(ZSTD(3))
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
SETTINGS index_granularity = 8192
```

### 1.2 User Profile Table

```sql
CREATE TABLE user_profiles
(
    `user_id` UInt32,
    `register_time` DateTime,
    `gender` String,
    `age` UInt8,
    `user_type` String,
    `last_active` DateTime,
    `is_subscribed` UInt8
)
ENGINE = ReplacingMergeTree(last_active)
ORDER BY user_id
```

## 2. Data Collection

### 2.1 Front-End Tracking

```javascript
// Example tracking SDK
class TrackSDK {
    constructor() {
        // generateSessionId, getDeviceType, getOS, and getBrowser are
        // environment-detection helpers, omitted here for brevity
        this.baseData = {
            session_id: this.generateSessionId(),
            device_type: this.getDeviceType(),
            os: this.getOS(),
            browser: this.getBrowser()
        };
    }

    track(eventType, properties = {}) {
        const eventData = {
            ...this.baseData,
            event_time: new Date().toISOString(),
            event_type: eventType,
            page_url: window.location.href,
            referrer: document.referrer,
            properties: JSON.stringify(properties)
        };

        // Send the event to the collection server
        fetch('https://collect.example.com/track', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(eventData)
        });
    }
}
```

### 2.2 Data Ingestion

Use Kafka as a buffer between the collection service and ClickHouse:

```sql
-- Kafka engine table that consumes the raw event topic
CREATE TABLE user_events_queue
(
    `event_time` DateTime,
    `user_id` UInt32,
    `session_id` String,
    `event_type` String,
    `page_url` String,
    `referrer` String,
    `device_type` String,
    `os` String,
    `browser` String,
    `country` String,
    `city` String,
    `properties` String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka1:9092,kafka2:9092',
         kafka_topic_list = 'user_events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';

-- Materialized view that streams consumed rows into the storage table
CREATE MATERIALIZED VIEW user_events_mv TO user_events
AS SELECT * FROM user_events_queue;
```

## 3. Behavior Analysis

### 3.1 User Path Analysis

```sql
-- Page-to-page transitions within sessions over the last day
WITH sequence AS
(
    SELECT
        user_id,
        session_id,
        event_type,
        event_time,
        page_url,
        row_number() OVER (PARTITION BY session_id ORDER BY event_time) AS step
    FROM user_events
    WHERE event_time >= now() - INTERVAL 1 DAY
)
SELECT
    t1.page_url AS from_page,
    t2.page_url AS to_page,
    count() AS transitions
FROM sequence AS t1
JOIN sequence AS t2
    ON t1.session_id = t2.session_id
    AND t1.step + 1 = t2.step
GROUP BY from_page, to_page
ORDER BY transitions DESC
LIMIT 20;
```

### 3.2 Funnel Analysis

ClickHouse's built-in `windowFunnel` aggregate function is the natural fit here: it matches the funnel steps in the declared order, whereas grouping by `event_type` would order the steps alphabetically rather than by funnel position.

```sql
-- Browse-to-purchase conversion funnel over the last 7 days.
-- windowFunnel(86400) returns, per user, the deepest step reached
-- within a 24-hour window; a user at level N completed steps 1..N.
SELECT
    level,
    count() AS users
FROM
(
    SELECT
        user_id,
        windowFunnel(86400)(
            event_time,
            event_type = 'view_product',
            event_type = 'add_to_cart',
            event_type = 'start_checkout',
            event_type = 'purchase'
        ) AS level
    FROM user_events
    WHERE event_time >= now() - INTERVAL 7 DAY
      AND event_type IN ('view_product', 'add_to_cart', 'start_checkout', 'purchase')
    GROUP BY user_id
)
WHERE level > 0
GROUP BY level
ORDER BY level;
```

The number of users who reached step N is the sum of the counts for all levels >= N.

### 3.3 User Segmentation

```sql
-- Recency/frequency segmentation (the R and F of RFM)
WITH recency AS
(
    SELECT
        user_id,
        dateDiff('day', max(event_time), now()) AS days_since_last_visit
    FROM user_events
    WHERE event_type = 'purchase'
    GROUP BY user_id
),
frequency AS
(
    SELECT
        user_id,
        count() AS purchase_count
    FROM user_events
    WHERE event_type = 'purchase'
    GROUP BY user_id
)
SELECT
    multiIf(
        r.days_since_last_visit <= 30, 'High',
        r.days_since_last_visit <= 90, 'Medium',
        'Low'
    ) AS recency_segment,
    multiIf(
        f.purchase_count >= 10, 'High',
        f.purchase_count >= 5, 'Medium',
        'Low'
    ) AS frequency_segment,
    count() AS user_count
FROM recency AS r
JOIN frequency AS f ON r.user_id = f.user_id
GROUP BY recency_segment, frequency_segment
ORDER BY recency_segment, frequency_segment;
```

## 4. Real-Time Analysis

### 4.1 Real-Time Monitoring

```sql
-- Active users per minute over the last 15 minutes
SELECT
    toStartOfMinute(event_time) AS minute,
    count(DISTINCT user_id) AS active_users,
    count(DISTINCT session_id) AS active_sessions
FROM user_events
WHERE event_time >= now() - INTERVAL 15 MINUTE
GROUP BY minute
ORDER BY minute;

-- Hourly conversion rate over the last day
SELECT
    toStartOfHour(event_time) AS hour,
    countIf(event_type = 'purchase') * 100.0 /
        countIf(event_type = 'view_product') AS conversion_rate
FROM user_events
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY hour
ORDER BY hour;
```

### 4.2 Anomaly Detection

```sql
-- Flag users whose activity in the last hour exceeds the mean by more
-- than 3 standard deviations. The thresholds are computed in a separate
-- CTE because window functions cannot be used in a WHERE clause.
WITH user_stats AS
(
    SELECT
        user_id,
        count() AS event_count,
        uniq(session_id) AS session_count,
        count(DISTINCT event_type) AS event_type_count
    FROM user_events
    WHERE event_time >= now() - INTERVAL 1 HOUR
    GROUP BY user_id
),
thresholds AS
(
    SELECT
        avg(event_count) + 3 * stddevPop(event_count) AS event_limit,
        avg(session_count) + 3 * stddevPop(session_count) AS session_limit
    FROM user_stats
)
SELECT u.*
FROM user_stats AS u
CROSS JOIN thresholds AS t
WHERE u.event_count > t.event_limit
   OR u.session_count > t.session_limit;
```

## 5. Data Visualization

### 5.1 Superset Configuration

1. Add a ClickHouse data source:

```ini
SQLAlchemy URI: clickhouse+native://default:@localhost:9000/default
```

2. Create dashboard charts:

```sql
-- User growth trend
SELECT
    toStartOfDay(register_time) AS day,
    count() AS new_users,
    sum(count()) OVER (ORDER BY day) AS total_users
FROM user_profiles
GROUP BY day
ORDER BY day;
```

### 5.2 Automated Reporting

```python
# Generate a weekly report with Python
from clickhouse_driver import Client
import pandas as pd
import matplotlib.pyplot as plt

def generate_weekly_report():
    client = Client('localhost')

    # Daily active users over the past week
    query = '''
    SELECT
        toStartOfDay(event_time) AS day,
        count(DISTINCT user_id) AS dau
    FROM user_events
    WHERE event_time >= now() - INTERVAL 7 DAY
    GROUP BY day
    ORDER BY day
    '''

    # client.execute returns a list of tuples, so name the columns explicitly
    df = pd.DataFrame(client.execute(query), columns=['day', 'dau'])

    # Plot and save the chart
    plt.figure(figsize=(10, 6))
    plt.plot(df['day'], df['dau'])
    plt.title('Daily Active Users')
    plt.savefig('weekly_report.png')
```

## 6. Performance Optimization

### 6.1 Pre-Aggregation

```sql
-- Hourly rollup. AggregatingMergeTree is used rather than SummingMergeTree
-- because uniq() results cannot be summed correctly across merges; aggregate
-- states are stored instead and merged at query time.
CREATE MATERIALIZED VIEW user_events_hourly
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type)
AS SELECT
    toStartOfHour(event_time) AS hour,
    event_type,
    countState() AS event_count,
    uniqState(user_id) AS user_count,
    uniqState(session_id) AS session_count
FROM user_events
GROUP BY hour, event_type;

-- Query the rollup by merging the stored states
SELECT
    hour,
    event_type,
    countMerge(event_count) AS events,
    uniqMerge(user_count) AS users
FROM user_events_hourly
GROUP BY hour, event_type
ORDER BY hour;
```

### 6.2 Query Optimization

```sql
-- A materialized view only sees newly inserted blocks, so it cannot look up
-- a user's historical first visit. Store each user's first visit as an
-- aggregate state instead and merge it at query time.
CREATE MATERIALIZED VIEW user_first_visit
ENGINE = AggregatingMergeTree()
ORDER BY user_id
AS SELECT
    user_id,
    minState(toDate(event_time)) AS first_visit_state
FROM user_events
GROUP BY user_id;

-- Retention: bucket each visit by its day offset from the user's first visit
SELECT
    first_visit,
    dateDiff('day', first_visit, toDate(event_time)) AS days_between,
    uniq(user_id) AS user_count
FROM user_events
INNER JOIN
(
    SELECT user_id, minMerge(first_visit_state) AS first_visit
    FROM user_first_visit
    GROUP BY user_id
) AS f USING (user_id)
GROUP BY first_visit, days_between
ORDER BY first_visit, days_between;
```

## Summary

This article showed how to build a complete user behavior analysis system with ClickHouse. Starting from data modeling, a sound table design and collection pipeline enable real-time ingestion of user events. On the analysis side, it covered path analysis, funnel analysis, and user segmentation, and used pre-aggregation and materialized views to keep queries fast. Combined with a visualization tool, these pieces support intuitive dashboards that help business users understand how users behave.
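To complement the ingestion pipeline in Section 2.2, here is a minimal, hypothetical Python sketch of server-side validation for incoming tracking events before they are produced to the `user_events` Kafka topic. The field list mirrors the `user_events_queue` schema above; the function names are illustrative and not part of any real SDK or library.

```python
import json
from datetime import datetime

# Fields expected by the user_events_queue Kafka engine table (JSONEachRow)
REQUIRED_FIELDS = [
    'event_time', 'user_id', 'session_id', 'event_type', 'page_url',
    'referrer', 'device_type', 'os', 'browser', 'country', 'city', 'properties',
]

def validate_event(raw):
    """Check required fields and normalize types before producing to Kafka."""
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    if missing:
        raise ValueError(f'missing fields: {missing}')
    event = dict(raw)
    # ClickHouse DateTime parses 'YYYY-MM-DD hh:mm:ss'; normalize ISO 8601 input
    event['event_time'] = (
        datetime.fromisoformat(raw['event_time'].replace('Z', '+00:00'))
        .strftime('%Y-%m-%d %H:%M:%S')
    )
    event['user_id'] = int(raw['user_id'])  # UInt32 column
    return event

def to_json_each_row(event):
    """Serialize one validated event as a JSONEachRow line for the topic."""
    return (json.dumps(event, ensure_ascii=False) + '\n').encode('utf-8')
```

The validated bytes can then be handed to any Kafka producer; the Kafka engine table consumes them without further transformation.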
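The `windowFunnel` output in Section 3.2 reports only the deepest step each user reached; turning those level counts into per-step totals and conversion rates is a small post-processing step. The sketch below is an illustrative helper (the numbers in the usage note are made up):

```python
def funnel_steps(level_counts, steps):
    """Convert windowFunnel 'deepest level reached' counts into per-step totals.

    level_counts maps level -> number of users whose deepest step was that
    level. A user at level N completed steps 1..N, so the total for step N
    is the sum of counts for all levels >= N.
    """
    results = []
    prev = None
    for i, name in enumerate(steps, start=1):
        reached = sum(c for lvl, c in level_counts.items() if lvl >= i)
        if prev is None:
            rate = 100.0           # first step is the baseline
        elif prev == 0:
            rate = 0.0
        else:
            rate = round(100.0 * reached / prev, 2)
        results.append({'step': name, 'users': reached, 'conversion_rate': rate})
        prev = reached
    return results
```

For example, level counts of `{1: 500, 2: 300, 3: 150, 4: 50}` yield 1000 users at `view_product` and a 25% checkout-to-purchase conversion.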
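The segment boundaries in the segmentation query of Section 3.3 are fixed thresholds, so the same bucketing is easy to reuse in application code. A minimal sketch mirroring those `multiIf` thresholds (an illustrative helper, not part of any library):

```python
def rf_segment(days_since_last_purchase, purchase_count):
    """Assign recency/frequency segments using the same thresholds as the
    multiIf() expressions in the segmentation query."""
    if days_since_last_purchase <= 30:
        recency = 'High'
    elif days_since_last_purchase <= 90:
        recency = 'Medium'
    else:
        recency = 'Low'

    if purchase_count >= 10:
        frequency = 'High'
    elif purchase_count >= 5:
        frequency = 'Medium'
    else:
        frequency = 'Low'

    return recency, frequency
```

Keeping the thresholds in one place (for example, a shared config) avoids the SQL and application code drifting apart.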
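The anomaly query in Section 4.2 is a simple mean-plus-k-sigma rule, which can be sanity-checked offline before tuning the multiplier. The helper below is illustrative (not a ClickHouse API); as the usage note shows, a 3-sigma threshold can be surprisingly conservative on small samples because a single extreme value inflates the standard deviation.

```python
from statistics import mean, pstdev

def anomalous_users(event_counts, sigma=3.0):
    """Return user ids whose event count exceeds mean + sigma * population
    stddev, mirroring the threshold logic of the SQL anomaly query."""
    counts = list(event_counts.values())
    limit = mean(counts) + sigma * pstdev(counts)
    return sorted(uid for uid, c in event_counts.items() if c > limit)
```

With counts `{1: 10, 2: 12, 3: 9, 4: 11, 5: 1000}`, the outlier is only flagged once sigma drops to around 1.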