# Data Warehouse Monorepo

This is a mock data warehouse project built around a hardware company's sales data. The final output
will be a data warehouse with a layered, medallion-like architecture, presented in a human-friendly way.

## TODO

The complete architecture will be documented inside the `docs` directory. Stay tuned!

## Quick Intro

The project is bootstrapped from [Zero One Group's monorepo](https://github.com/zero-one-group/monorepo/)
which is used as a standardized boilerplate for company projects.

Most of the work is done inside the `data-pipeline` directory, where the ETL process of the data
warehouse happens.

## Prerequisites

- [pnpm](https://pnpm.io/installation) to run commands
- [Docker](https://docs.docker.com/engine/install/) for containerization
- [uv](https://docs.astral.sh/uv/getting-started/installation/) for Python
- [Moonrepo](https://moonrepo.dev/docs/getting-started/installation) for monorepo management
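
Before going further, it can help to confirm that these tools are on your `PATH`. The snippet below is a minimal sketch, not part of the repository: `check_tools` is a hypothetical helper, and `moon` is assumed to be the executable that the Moonrepo installer provides.

```sh
# Hypothetical helper: report which of the given CLIs are available on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Non-zero exit status when at least one tool was not found.
  return "$missing"
}

# `moon` is assumed to be the Moonrepo executable name.
check_tools pnpm docker uv moon || echo "Install the missing tools before continuing."
```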

## Containerization

The project uses Docker for development. The services used are:

- PostgreSQL, as the database for the data warehouse
- Metabase, as the front-end for the data warehouse
- pgweb, as a web-based database browser

To run these services inside Docker, run:

```sh
pnpm compose:up
```

To stop the services, run:

```sh
pnpm compose:down
```

To clean up everything, including container data, run:

```sh
pnpm compose:cleanup
```

Once everything is running, you can use each of the services described below.

## Database

To log in to the PostgreSQL data warehouse inside Docker, use the following credentials:

- Host: localhost
- Port: 5432
- User: postgres
- Password: securedb
- Database: myorg
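
For a quick check from a terminal, the credentials above can be combined into a connection URI. This is a minimal sketch that assumes the `psql` client is installed on the host, which this README does not list as a prerequisite. The `DB_*` variable names are illustrative only.

```sh
# Connection details copied from the list above (variable names are illustrative).
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=securedb
DB_NAME=myorg

DATABASE_URL="postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$DATABASE_URL"

# Assumption: psql is installed locally; the query is skipped when it is not.
if command -v psql >/dev/null 2>&1; then
  psql "$DATABASE_URL" -c 'SELECT version();' \
    || echo "Could not connect; check that the services are running."
fi
```

The same URI can be pasted into most PostgreSQL clients that accept a connection string.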

## Front-end

The project uses [Metabase](https://www.metabase.com/) for data presentation. Metabase is an
open-source, web-based business intelligence and analytics tool for querying and visualizing data in a
straightforward manner.

Metabase uses its own database, separate from the data warehouse. This way, the front-end can be
synced through export and import.

After running `pnpm compose:up`, you must complete the initial Metabase setup. Please read
[SETUP-METABASE.md](./docs/SETUP-METABASE.md) for a walkthrough.

## Data Pipeline

The data pipeline uses `dlt` and `sqlmesh` for the ETL process.

For technical details, head to the `data-pipeline` directory and open its [README.md](./data-pipeline/README.md).