# Bricks

Fine-tuning Gemini for brick-building

Welcome to the "Bricks" project! This repository showcases an end-to-end pipeline for fine-tuning large language models to understand and generate highly structured data, specifically LDraw (MPD) instructions for brick models. Designed as a workshop example, it demonstrates a practical application of LLM fine-tuning for a specialized domain task.

---

### What this project does

The "Bricks" project demonstrates an end-to-end pipeline for fine-tuning Google's Gemini 1.5 Pro model to understand natural language descriptions and generate precise LDraw (MPD) brick-building instructions. It tackles the challenge of automating the creation of detailed 3D brick model files from simple user queries, eliminating the need for tedious, manual MPD file creation.

### Key Features & Capabilities

*   **Automated Training Data Generation:** Uses multimodal LLMs (such as GPT-4o) to synthetically generate high-quality, structured text descriptions and diverse user queries directly from visual inputs (rendered brick models); a minimal sketch of this step follows the list.
*   **Vertex AI Supervised Fine-Tuning:** Prepares and orchestrates the fine-tuning of Gemini 1.5 Pro on Google Cloud's Vertex AI platform using instruction-tuned datasets.
*   **LDraw MPD Generation:** Enables the fine-tuned Gemini model to translate natural language prompts into accurate, machine-readable LDraw MPD (Multi-Part Document) files.
*   **Inference Integration:** Demonstrates how to query the custom-tuned Gemini model for real-time MPD generation using LangChain.
*   **Comprehensive Project Documentation:** Includes detailed development notes and an interactive, code-driven presentation built with Slidev, making the complex process transparent and understandable.
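
To make the data-generation step concrete, here is a minimal sketch of how a multimodal model can be asked for a description plus candidate user queries for one rendered PNG. This is not the project's `bricks/bin/describe.py`; it assumes the OpenAI Python SDK's structured-output helper, and the `ModelDescription` schema and field names are illustrative.

```python
# Hypothetical sketch of structured synthetic-data generation (not the actual
# bricks/bin/describe.py): a multimodal model is shown a rendered PNG and asked
# to return a description plus example user queries, validated with Pydantic.
import base64

from openai import OpenAI
from pydantic import BaseModel


class ModelDescription(BaseModel):
    """Structured output requested from the LLM (names are illustrative)."""
    description: str          # prose description of the rendered brick model
    user_queries: list[str]   # natural-language prompts a user might type


def describe_render(png_path: str) -> ModelDescription:
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    with open(png_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Structured-output helper in the OpenAI Python SDK; the model fills in
    # the fields of ModelDescription directly.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Describe the brick model and suggest user queries."},
            {"role": "user",
             "content": [{"type": "image_url",
                          "image_url": {"url": f"data:image/png;base64,{image_b64}"}}]},
        ],
        response_format=ModelDescription,
    )
    return completion.choices[0].message.parsed
```

Validating the response against a Pydantic schema is what keeps the downstream `.jsonl` preparation simple: malformed generations fail loudly instead of silently polluting the training set.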

### Quick Start / Usage

This project is designed as a detailed demonstration of an end-to-end LLM fine-tuning pipeline rather than a ready-to-run application. While full training cycles can be resource-intensive, you can explore the core components and logic:

1.  **Examine the Pipeline Scripts:**
    *   `bricks/bin/render.sh`: Renders LDraw files into images (a prerequisite for data generation).
    *   `bricks/bin/describe.py`: The heart of synthetic data generation – uses a multimodal LLM to create descriptions and queries from rendered images.
    *   `bricks/bin/examples.py`: Prepares the generated data into Vertex AI's instruction tuning (`.jsonl`) format.
    *   `bricks/bin/train.py`: Orchestrates the fine-tuning of Gemini 1.5 Pro on Vertex AI.
    *   `bricks/bin/query.py`: Demonstrates how to interact with a *fine-tuned* Gemini model endpoint for MPD generation (a LangChain-based sketch follows this list).
2.  **Review the Generated Data:** Inspect the format of the synthetic descriptions and the instruction-tuned examples to understand the training data structure.
3.  **Explore the Presentation:** Dive into the accompanying Slidev presentation (see "Detailed Documentation" below) for a high-level overview and visual explanations.
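
As a rough illustration of the inference step, the snippet below shows one way to call a tuned Gemini endpoint through LangChain. It is a sketch, not the contents of `query.py`: the endpoint resource name is a placeholder, and passing it via `model_name` on `ChatVertexAI` is an assumption about how the tuned model is addressed.

```python
# Hypothetical sketch of querying a tuned Gemini endpoint with LangChain.
# The endpoint resource name below is a placeholder.
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_vertexai import ChatVertexAI

TUNED_ENDPOINT = "projects/my-project/locations/us-central1/endpoints/1234567890"

llm = ChatVertexAI(model_name=TUNED_ENDPOINT, temperature=0.2)

response = llm.invoke([
    SystemMessage(content="You translate brick-model requests into LDraw MPD files."),
    HumanMessage(content="Build a small red car with four wheels."),
])
print(response.content)  # expected to contain MPD text such as "0 FILE car.mpd"
```

The returned `response.content` string would then be saved as an `.mpd` file or rendered for inspection.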

### Technical Highlights

*   **Multi-stage AI Pipeline:** A robust, automated pipeline encompassing data rendering, synthetic data generation, data preparation, model fine-tuning, and inference.
*   **Multimodal Data Generation:** Uses `gpt-4o` to synthesize diverse training data (text descriptions + queries) directly from visual inputs (PNG renders of brick models).
*   **Structured Output with Pydantic:** Ensures reliable, parseable output from LLMs during data generation, crucial for downstream processing and data quality.
*   **Vertex AI Supervised Fine-Tuning:** Direct application of Vertex AI's SFT service with `gemini-1.5-pro-002`, showcasing custom LLM adaptation for a specific task (a minimal sketch of the training call follows this list).
*   **Instruction Tuning Format:** Adherence to standard instruction tuning formats (`{"systemInstruction": ..., "contents": [...]}`) for efficient and effective model training on Vertex AI.
*   **Practical LLM Considerations:** Includes logic for managing token limits and exposing key hyperparameters (e.g., `EPOCHS`, `LEARNING_RATE_MULTIPLIER`) for fine-tuning optimization and experimentation.
*   **Versioned, Code-Centric Presentation:** Utilizes [Slidev](https://sli.dev/) (Markdown-based, Vue 3) for an interactive, maintainable, and deployable presentation, treating documentation as a first-class citizen.
*   **Clear Licensing:** Demonstrates how to properly license project files (Apache 2.0) for sharing and collaboration in a professional development environment.
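
For orientation, here is a hedged sketch of what the Vertex AI SFT call orchestrated by `train.py` might look like, using the `vertexai.tuning.sft` API; the project ID, bucket path, display name, and hyperparameter values are placeholders rather than the project's actual settings.

```python
# Hypothetical sketch of the supervised fine-tuning step (the real
# orchestration lives in bricks/bin/train.py; all names and values here
# are placeholders).
import time

import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project", location="us-central1")

# Each line of train.jsonl is one instruction-tuning record, shaped roughly as
# {"systemInstruction": {...}, "contents": [user turn, model turn]}, where the
# model turn carries the target MPD text.
tuning_job = sft.train(
    source_model="gemini-1.5-pro-002",
    train_dataset="gs://my-bucket/bricks/train.jsonl",
    tuned_model_display_name="bricks-mpd-generator",
    epochs=4,                      # cf. the EPOCHS knob mentioned above
    learning_rate_multiplier=1.0,  # cf. LEARNING_RATE_MULTIPLIER
)

# Poll until the tuning job finishes, then print the endpoint to query.
while not tuning_job.has_ended:
    time.sleep(60)
    tuning_job.refresh()
print(tuning_job.tuned_model_endpoint_name)
```

Once the job completes, the printed endpoint name is what a client such as the LangChain sketch above would target.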

### Links to Detailed Documentation

*   **Comprehensive Development Notes (`NOTES.md`):** For an in-depth look at the project's evolution, core problem statements, architectural decisions, and challenges faced during development, refer to the detailed [NOTES.md](NOTES.md) file.
*   **Accompanying Workshop Presentation (`slides.md`):** Explore the interactive presentation built with Slidev, which visually explains the project's architecture, methodology, and results. You can find the source in `bricks/slides/slides.md`.