sglkc / data-warehouse-advanced


Data Warehouse Monorepo

This project is a mock data warehouse for a hardware company's sales data. The final output will be a data warehouse with a layered, medallion-like architecture, presented in a human-friendly way.

TODO

The complete architecture will be documented inside the docs directory. Stay tuned!

Quick Intro

The project is bootstrapped from Zero One Group's monorepo, which is used as a standardized boilerplate for company projects.

Most of the work is done inside the data-pipeline directory, where the ETL process of the data warehouse happens.

Prerequisites

Containerization

The project uses Docker for development. The services used are:

  • PostgreSQL, as the database for the data warehouse
  • Metabase, as the front-end for the data warehouse
  • pgweb, as a web-based database browser

To run these services inside Docker, run:

pnpm compose:up

To stop the services, run:

pnpm compose:down

To clean up everything, including container data, run:

pnpm compose:cleanup

Once everything is running, you may use each of the services below.

Database

To log in to the PostgreSQL data warehouse inside Docker, use the following credentials:

  • Host: localhost
  • Port: 5432
  • User: postgres
  • Password: securedb
  • Database: myorg
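
As a minimal sketch, the credentials above can be assembled into a standard `postgresql://` connection URL, which most PostgreSQL clients accept. The `DB` dictionary and the `build_dsn` helper are hypothetical names used only for illustration and are not part of this repository. Only the Python standard library is used.

```python
from urllib.parse import quote, urlunsplit

# Values copied from the credentials list above.
DB = {
    "host": "localhost",
    "port": 5432,
    "user": "postgres",
    "password": "securedb",
    "database": "myorg",
}


def build_dsn(cfg: dict) -> str:
    """Return a postgresql:// connection URL for the given settings.

    Hypothetical helper for illustration; not part of the project code.
    """
    # Percent-encode user and password so special characters stay valid in a URL.
    user = quote(cfg["user"], safe="")
    password = quote(cfg["password"], safe="")
    netloc = f"{user}:{password}@{cfg['host']}:{cfg['port']}"
    path = "/" + quote(cfg["database"], safe="")
    return urlunsplit(("postgresql", netloc, path, "", ""))


dsn = build_dsn(DB)
print(dsn)
```

The resulting URL can then be passed to a client of your choice, for example as the argument to `psql`, assuming the containers are running and port 5432 is reachable from the host.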

Front-end

The project uses Metabase for data presentation. Metabase is an open-source, web-based business intelligence and analytics tool for querying and visualizing data in a straightforward manner.

Metabase uses its own database, separate from the data warehouse. This way, the front-end can be synced by export and import.

After running pnpm compose:up, you must complete a first-time setup for Metabase. Please read SETUP-METABASE.md for a walkthrough.

Data Pipeline

The data pipeline uses dlt and sqlmesh for the ETL process.

For technical details, please head to the data-pipeline directory and open its README.md.