📦 google-gemini / gemini-cli

📄 architecture.md · 81 lines
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81# Gemini CLI Architecture Overview

This document provides a high-level overview of the Gemini CLI's architecture.

## Core components

The Gemini CLI is primarily composed of two main packages, along with a suite of
tools that can be used by the system in the course of handling command-line
input:

1.  **CLI package (`packages/cli`):**
    - **Purpose:** This contains the user-facing portion of the Gemini CLI, such
      as handling the initial user input, presenting the final output, and
      managing the overall user experience.
    - **Key functions contained in the package:**
      - [Input processing](/docs/cli/commands)
      - History management
      - Display rendering
      - [Theme and UI customization](/docs/cli/themes)
      - [CLI configuration settings](/docs/get-started/configuration)

2.  **Core package (`packages/core`):**
    - **Purpose:** This acts as the backend for the Gemini CLI. It receives
      requests sent from `packages/cli`, orchestrates interactions with the
      Gemini API, and manages the execution of available tools.
    - **Key functions contained in the package:**
      - API client for communicating with the Google Gemini API
      - Prompt construction and management
      - Tool registration and execution logic
      - State management for conversations or sessions
      - Server-side configuration

3.  **Tools (`packages/core/src/tools/`):**
    - **Purpose:** These are individual modules that extend the capabilities of
      the Gemini model, allowing it to interact with the local environment
      (e.g., file system, shell commands, web fetching).
    - **Interaction:** `packages/core` invokes these tools based on requests
      from the Gemini model.

## Interaction flow

A typical interaction with the Gemini CLI follows this flow:

1.  **User input:** The user types a prompt or command into the terminal, which
    is managed by `packages/cli`.
2.  **Request to core:** `packages/cli` sends the user's input to
    `packages/core`.
3.  **Request processed:** The core package:
    - Constructs an appropriate prompt for the Gemini API, possibly including
      conversation history and available tool definitions.
    - Sends the prompt to the Gemini API.
4.  **Gemini API response:** The Gemini API processes the prompt and returns a
    response. This response might be a direct answer or a request to use one of
    the available tools.
5.  **Tool execution (if applicable):**
    - When the Gemini API requests a tool, the core package prepares to execute
      it.
    - If the requested tool can modify the file system or execute shell
      commands, the user is first given details of the tool and its arguments,
      and the user must approve the execution.
    - Read-only operations, such as reading files, might not require explicit
      user confirmation to proceed.
    - Once confirmed, or if confirmation is not required, the core package
      executes the relevant action within the relevant tool, and the result is
      sent back to the Gemini API by the core package.
    - The Gemini API processes the tool result and generates a final response.
6.  **Response to CLI:** The core package sends the final response back to the
    CLI package.
7.  **Display to user:** The CLI package formats and displays the response to
    the user in the terminal.

## Key design principles

- **Modularity:** Separating the CLI (frontend) from the Core (backend) allows
  for independent development and potential future extensions (e.g., different
  frontends for the same backend).
- **Extensibility:** The tool system is designed to be extensible, allowing new
  capabilities to be added.
- **User experience:** The CLI focuses on providing a rich and interactive
  terminal experience.