# ../Bricks Development Notes
Generated from git commit history on 2025-08-02
## Development Timeline
### Commit 1: Slide title (#40) (13d63d9)
#### NOTES.md: Development Story for Commit 13d63d9 - Initial Brick AI Pipeline Setup
This foundational commit marks a significant step in the "Bricks" project, laying down the core infrastructure for fine-tuning Gemini to understand and generate brick-building instructions (MPD files) from natural language. The primary objective was to establish a robust, automated pipeline for data generation, model training, and inference.
**Problem Solved:**
The core problem we're tackling is bridging the gap between natural language descriptions of brick models and their precise, machine-readable LDraw (MPD) format. Manually creating MPD files is tedious and requires specialized knowledge. This commit aims to automate the process by enabling an LLM to generate MPD directly from user queries. This requires two main steps:
1. **Creating a High-Quality Training Dataset:** We needed a way to generate descriptive text and relevant user queries for existing 3D brick models (represented as PNG renders and MPD files), as manual annotation would be extremely time-consuming.
2. **Fine-tuning a Generative Model:** Once we have paired natural language queries/descriptions with MPD content, we need to fine-tune an LLM to learn this mapping and generate new MPD files from novel queries.
**Technical Approach & Key Architectural Decisions:**
We've designed a multi-stage pipeline, encapsulated in distinct Python scripts, that orchestrates the entire process:
1. **`bricks/bin/render.sh`**: This shell script is the first step, taking raw LDraw files and rendering them into PNG images. This is crucial for the subsequent multimodal analysis.
2. **`bricks/bin/describe.py` (Data Generation Layer)**: This script is a clever hack for synthetic data generation. It leverages `langchain-openai` (specifically, `gpt-4o`) with multimodal capabilities.
* **Multimodal Input**: It reads the rendered PNG images (`image_to_base64`) and uses them as visual input to the `gpt-4o` model.
    * **Structured Output with Pydantic**: A key architectural decision here is the use of `PydanticOutputParser` with a `Datum` class, which ensures the LLM's output (descriptions and user queries) adheres to a strict, parseable schema, making it reliable for downstream processing. We included a `dune_buggy_datum` as a few-shot example in the prompt to guide the LLM towards the desired output format and quality. (A sketch of this pattern follows the list.)
* **Purpose**: This script generates natural language descriptions of the models and a diverse set of synthetic user queries based on the model's visual characteristics. This addresses the challenge of manually creating a large, varied dataset.
3. **`bricks/bin/examples.py` (Training Data Preparation)**: This script acts as a linker. It reads the structured descriptions and queries generated by `describe.py` and pairs them with the raw MPD content.
    * **Instruction Tuning Format**: The output (`examples.jsonl`) is formatted specifically for instruction tuning (`{"systemInstruction": ..., "contents": [...]}`), preparing the data for Vertex AI's Supervised Fine-Tuning (SFT) service. (A record-building sketch follows the list.)
    * **Token Limit Management**: Includes logic (`count_tokens`) to skip examples that exceed the `TOKEN_LIMIT` of 32,000 tokens, a practical safeguard against truncation or errors during fine-tuning.
4. **`bricks/bin/train.py` (Fine-tuning Orchestration)**: This script manages the actual fine-tuning process on Google Cloud Vertex AI.
    * **Supervised Fine-Tuning (SFT)**: It uses `vertexai.tuning.sft.train` to kick off a fine-tuning job with `gemini-1.5-pro-002` as the `source_model`, the core LLM we aim to adapt. (A training sketch follows the list.)
* **Hyperparameters**: Key training parameters like `EPOCHS`, `ADAPTER_SIZE`, and `LEARNING_RATE_MULTIPLIER` are exposed as flags, allowing for experimentation and optimization.
* **Cloud Storage Integration**: It handles uploading the `examples.jsonl` dataset to a GCS bucket, a necessary step for Vertex AI training jobs.
* **Job Monitoring**: Includes functionality to `list` and `poll` tuning jobs, providing visibility into the training process and crucial statistics.
5. **`bricks/bin/query.py` (Inference/Deployment Mockup)**: This simple script demonstrates how to interact with the *tuned* Gemini model using `langchain-google-vertexai`.
    * **Tuned Model Endpoint**: A notable implementation detail is the specific format for `tuned_model_name` (e.g., `projects/{PROJECT_ID}/locations/{LOCATION}/models/{TUNED_MODEL}`), gleaned from external LangChain Vertex AI examples (tracked in `TODO.md`). Pinning down this integration detail took some digging. (A query sketch follows the list.)
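The sketches below ground each stage of the pipeline. They are minimal illustrations under stated assumptions, not the commit's exact code. First, the `describe.py` pattern: multimodal input plus `PydanticOutputParser`. The `Datum` fields, prompt wording, and `image_to_base64` body are assumptions; only those names come from the commit.

```python
import base64

from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import PydanticOutputParser
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Datum(BaseModel):
    """Structured description of one rendered brick model."""
    description: str = Field(description="Natural-language description of the model")
    queries: list[str] = Field(description="Synthetic user queries that could request it")


parser = PydanticOutputParser(pydantic_object=Datum)


def image_to_base64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def describe(png_path: str) -> Datum:
    # Send the rendered PNG alongside format instructions derived from Datum.
    llm = ChatOpenAI(model="gpt-4o")
    message = HumanMessage(content=[
        {"type": "text",
         "text": "Describe this brick model and propose user queries.\n"
                 + parser.get_format_instructions()},
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{image_to_base64(png_path)}"}},
    ])
    return parser.parse(llm.invoke([message]).content)
```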
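Next, the `examples.jsonl` record shape used by `examples.py`. The system-instruction text and the token heuristic are placeholders; the real `count_tokens` presumably uses an actual tokenizer.

```python
import json

TOKEN_LIMIT = 32000
SYSTEM_INSTRUCTION = "You generate LDraw MPD files from user requests."  # assumed wording


def count_tokens(text: str) -> int:
    # Crude ~4-characters-per-token approximation, a stand-in for a real tokenizer.
    return len(text) // 4


def make_example(query: str, mpd: str) -> dict | None:
    record = {
        "systemInstruction": {"parts": [{"text": SYSTEM_INSTRUCTION}]},
        "contents": [
            {"role": "user", "parts": [{"text": query}]},
            {"role": "model", "parts": [{"text": mpd}]},
        ],
    }
    # Skip over-long examples, mirroring the TOKEN_LIMIT check.
    if count_tokens(json.dumps(record)) > TOKEN_LIMIT:
        return None
    return record


with open("examples.jsonl", "w") as out:
    example = make_example("Build me a small dune buggy", "0 FILE buggy.mpd\n...")
    if example is not None:
        out.write(json.dumps(example) + "\n")
```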
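The heart of `train.py` is the `vertexai.tuning.sft.train` call plus polling. Project, bucket, display name, and hyperparameter values below are placeholders.

```python
import time

import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project", location="us-central1")  # placeholders

# Kick off supervised fine-tuning on the uploaded dataset.
job = sft.train(
    source_model="gemini-1.5-pro-002",
    train_dataset="gs://my-bucket/examples.jsonl",  # uploaded to GCS by the script
    epochs=4,                      # EPOCHS flag
    adapter_size=4,                # ADAPTER_SIZE flag
    learning_rate_multiplier=1.0,  # LEARNING_RATE_MULTIPLIER flag
    tuned_model_display_name="bricks-mpd-generator",
)

# Poll until the job completes, mirroring the script's poll functionality.
while not job.has_ended:
    time.sleep(60)
    job.refresh()
print(job.tuned_model_name)
```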
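Finally, querying the tuned model via `langchain-google-vertexai`, roughly as `query.py` does. The IDs are placeholders, and passing the resource path through `tuned_model_name` (with `model_name` identifying the base family) follows the format noted above; treat the exact parameter wiring as an assumption.

```python
from langchain_google_vertexai import ChatVertexAI

# Placeholders for the actual project, location, and tuned model ID.
PROJECT_ID = "my-project"
LOCATION = "us-central1"
TUNED_MODEL = "1234567890"

llm = ChatVertexAI(
    model_name="gemini-1.5-pro-002",  # base model family of the tuned model
    tuned_model_name=f"projects/{PROJECT_ID}/locations/{LOCATION}/models/{TUNED_MODEL}",
)
print(llm.invoke("Build me a small red race car").content)
```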
In essence, this commit sets up the entire development workflow: from generating high-quality, structured training data using a powerful external LLM (`gpt-4o`) to preparing that data, fine-tuning our target LLM (`gemini-1.5-pro`), and finally, providing a way to query the newly specialized model. It's a comprehensive initial push towards creating an AI-powered brick-building assistant.
### Commit 2: Slides for bricks (#50) (a14f5aa)
#### NOTES.md Entry for Commit `a14f5aa`
This commit marks a significant step in our "Bricks" project: the introduction of a dedicated presentation for showcasing the fine-tuning of Gemini for brick-building. The primary problem being solved here is the need to effectively communicate the project's technical intricacies, architectural choices, and challenges to a broader audience (likely for a workshop or internal review, given the context). Instead of relying on traditional, less version-control-friendly presentation software, we've opted for a modern, code-centric approach.
Our technical approach leverages [Slidev](https://sli.dev/), a Vue 3-powered slideshow framework. This was a key architectural decision: Slidev lets us author presentations directly in Markdown, embed live code snippets, use Vue components for interactive elements (as seen with `components/Counter.vue`), and benefit from a robust frontend development ecosystem (Vite, UnoCSS, etc.). This makes the presentation itself a versioned artifact of our development, enabling easier collaboration, review, and maintenance. The inclusion of `netlify.toml` and `vercel.json` also indicates a forward-looking decision to simplify static deployment, making the presentation highly accessible and shareable. Furthermore, the `.npmrc` configuration for `pnpm` reflects our preferred package-management strategy for this new frontend tooling.
The content of the slides (`slides.md` and `pages/imported-slides.md`) reveals the specific development story we aim to tell. It covers fundamental aspects of the "Bricks" project: describing brick sets (likely LDraw format), rendering visuals from these descriptions, the methodology for training our Gemini model (with examples of JSON-formatted training data and queries), and a transparent discussion of the significant financial costs associated with such LLM fine-tuning. The numerous image assets detailing "temperature" variations (e.g., `portrait-temperature-2.png`, `mind-temperature-0.5.png`) and various generative outputs (`chatgpt.png`, `trump.png`, `cat.png`, `universe.png`) are crucial implementation details. They signify an intent to visually demonstrate the impact of model parameters and the breadth of generative capabilities, addressing the implicit challenge of explaining abstract LLM concepts in a tangible way. This setup allows for a highly interactive and informative presentation, directly aligning with the project's goals of fine-tuning Gemini.
### Commit 3: Add license (d0ddacf)
#### NOTES.md Entry for Commit `d0ddacfe128865aa524b287bf9b6a9e49286e01b`
**Date:** Sat Jul 26 13:06:15 2025 -0700
**Title:** Add license
Alright team, let's unpack commit `d0ddacf`. While it might seem like a straightforward, low-impact change at first glance (simply adding license headers), this commit signifies a crucial step in the "Bricks" project's maturity and its preparation for broader distribution or internal sharing.
**The Problem Being Solved (and Why it Matters):**
Fundamentally, this commit addresses the critical need for explicit intellectual property governance and clear usage terms. In the absence of a defined license, the legal status of our "Bricks" project (fine-tuning Gemini for brick-building, no less!) would be ambiguous. This ambiguity could hinder internal collaboration, prevent sharing across different Google teams, or complicate any eventual open-source release. By embedding the Apache 2.0 license, we're formally establishing the terms under which this software can be used, modified, and distributed. It's about ensuring legal compliance, mitigating risks, and setting clear expectations for anyone interacting with the codebase. This is a foundational, non-functional requirement that speaks to the project's readiness and professionalism.
**Technical Approach & Implementation Details:**
The technical approach here was to apply the standard Apache 2.0 license boilerplate at the top of relevant source files. We can see this in action for `bricks/slides/components/Counter.vue` and `bricks/slides/snippets/external.ts`. A key implementation detail, often overlooked but important for tooling and readability, is the adaptation of the comment syntax for each file type:
* For the Vue component, which features an HTML-like template section, the license is enclosed in standard HTML comments (`<!-- ... -->`).
* For the TypeScript snippet, a multi-line JSDoc-style block comment (`/** ... */`) is used, which is idiomatic for JavaScript/TypeScript and allows for easy parsing by documentation generators or license scanning tools.
The copyright year is set to `2025`, indicating either the project's formal inception year or a forward-looking stamp for copyright protection.
**Key Architectural Decisions & Future Considerations:**
While the choice of the Apache 2.0 license itself is typically a project-level governance decision (often part of a broader strategy for Google projects), this commit is the operationalization of that decision at the file level. Apache 2.0 is a permissive license that allows for broad use while ensuring copyright attribution, aligning well with many internal and external project requirements. Though this commit specifically targets only a couple of files, it strongly implies a broader mandate to apply this license consistently across the entire "Bricks" codebase. Moving forward, we should consider incorporating automated checks (e.g., pre-commit hooks or CI/CD steps) to ensure new files consistently include the correct license header, thereby maintaining compliance and reducing manual oversight as the project evolves.
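As a hedged illustration of such a check (not part of this commit), a minimal Python sketch might scan source files for the Apache header; the marker string and extension set are assumptions.

```python
import sys
from pathlib import Path

HEADER_MARKER = "Licensed under the Apache License, Version 2.0"
EXTENSIONS = {".py", ".sh", ".ts", ".vue"}


def missing_license(root: str = ".") -> list[Path]:
    """Return files whose first kilobyte lacks the Apache 2.0 marker."""
    missing = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            head = path.read_text(errors="ignore")[:1024]
            if HEADER_MARKER not in head:
                missing.append(path)
    return missing


if __name__ == "__main__":
    offenders = missing_license()
    for path in offenders:
        print(f"missing license header: {path}")
    sys.exit(1 if offenders else 0)
```

Wired into a pre-commit hook or CI step, a check like this would fail the build whenever a new file lands without the header.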