๐Ÿ“ฆ polyzos / pulsar-flink-stateful-streams

โ˜… 17 stars โ‘‚ 13 forks ๐Ÿ‘ 17 watching
๐Ÿ“ฅ Clone https://github.com/polyzos/pulsar-flink-stateful-streams.git
HTTPS git clone https://github.com/polyzos/pulsar-flink-stateful-streams.git
SSH git clone git@github.com:polyzos/pulsar-flink-stateful-streams.git
CLI gh repo clone polyzos/pulsar-flink-stateful-streams
polyzos polyzos orders stream start from latest cursor 3520d86 3 years ago ๐Ÿ“ History
๐Ÿ“‚ main View all commits โ†’
๐Ÿ“ data
๐Ÿ“ images
๐Ÿ“ src
๐Ÿ“„ .gitignore
๐Ÿ“„ deploy.sh
๐Ÿ“„ pom.xml
๐Ÿ“„ Readme.md
๐Ÿ“„ setup.sh
๐Ÿ“„ README.md

WIP:

  • Optimize to backpressure, buffers, checkpoint intervals and wm intervals for larger state
  • User RocksDB API to demonstrate what gets written and how
  • Use time based joins for session windows and add time constraints

Use Case 1

Data Enrichment with Topic Lookups

Use Case 2

Data Aggregation with Time Constraints on Time Windows

Setup a Pulsar Cluster

docker run -rm -it --name pulsar \
-p 6650:6650  -p 8080:8080 \
--mount source=pulsardata,target=/pulsar/data \
--mount source=pulsarconf,target=/pulsar/conf \
apachepulsar/pulsar:2.9.1 \
bin/pulsar standalone

Setup Pulsar Logical Components

Go into your container
docker exec -it pulsar bash

and run the following commands

  • Create topics
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/orders
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/users
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/items

bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/view_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/purchase_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/cart_events

bin/pulsar-admin topics list public/default

  • Set infinite Retention
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/users
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/items

bin/pulsar-admin topics get-retention persistent://public/default/users
bin/pulsar-admin topics get-retention persistent://public/default/items

bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/view_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/purchase_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/cart_events

Start a Flink Cluster

start-cluster

Deploy the Flink Job

./deploy.sh

Monitor Flink logs

Tail the logs
tail -f log/flink-*-taskexecutor-*

The original Datasets can be found on the following links:

  • https://www.kaggle.com/datasets/alaasedeeq/dsc1069?select=dsv1069_events.csv
  • https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-cosmetics-shop