PipelineDB快速上手

PipelineDB是PostgreSQL的一个扩展插件,提供流式数据处理的相关功能。

PipelineDB安装与配置

PipelineDB可以直接通过官方rpm包安装。

加载PipelineDB需要添加动态链接库,在postgresql.conf中修改配置项并重启:

shared_preload_libraries = 'pipelinedb' max_worker_processes = 128

注意如果不修改max_worker_processes会报错。其他配置都参照标准的PostgreSQL

PipelineDB使用样例 —— 维基PV数据

-- 创建Stream CREATE FOREIGN TABLE wiki_stream ( hour timestamp, project text, title text, view_count bigint, size bigint) SERVER pipelinedb; -- 在Stream上进行聚合 CREATE VIEW wiki_stats WITH (action=materialize) AS SELECT hour, project, count(*) AS total_pages, sum(view_count) AS total_views, min(view_count) AS min_views, max(view_count) AS max_views, avg(view_count) AS avg_views, percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views, sum(size) AS total_bytes_served FROM wiki_stream GROUP BY hour, project;

然后,向Stream中插入数据:

curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip | \ psql -c " COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"

基本概念

PipelineDB中的基本抽象被称之为:连续视图(Continuous View)

最后修改 2025-02-16: init commit (35df8f3)