Multi-Agent Simulator

Implementation of a multi-agent system simulator with Q-Learning for navigation and foraging environments.

Features

  • Q-Learning: Complete reinforcement learning implementation
  • Multi-agent: Support for multiple simultaneous agents
  • Two environments: Farol (lighthouse navigation) and Foraging (resource collection)
  • Interactive CLI: User-friendly interface for configuration and execution
  • Results analysis: Automatic generation of graphs and metrics
  • Mixed policies: Comparison between Q-Learning agents and fixed policies
  • Visualization: Graphical representation of environments and agents
  • Agent communication: Message passing system between agents (broadcast and direct messaging)
  • Advanced Analysis: Tools to inspect Q-tables and demonstrate policy limitations (Traps)

Requirements

  • Python 3.10+
  • NumPy >= 1.21.0
  • Matplotlib >= 3.5.0
  • Questionary >= 2.0.0

Installation

pip install -r requirements.txt

The run.sh script automatically installs dependencies if needed.
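
If you prefer a manual setup, the usual sequence is as follows (a sketch; run.sh activates a virtual environment for you, and the .venv directory name here is illustrative):

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt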

How to Run

Interactive CLI (Recommended)

The simulator includes an interactive interface that guides the user through all options:

./run.sh

The CLI allows you to configure:

  • Operation mode: Execute simulation or compare policies
  • Environment: FAROL or FORAGING
  • Mode: LEARNING (train) or TEST (evaluate trained policy)
  • Number of agents: Total number of agents in the simulation
  • Distribution: How many agents use Q-Learning vs fixed policy
  • Episodes: Number of episodes to run
  • Max steps: Maximum number of steps per episode
  • Graphs: Select which graphs to generate at the end

Policy Comparison Mode:

  • Compares Fixed Intelligent policy vs Q-Learning
  • Automatically detects available Q-tables for the selected environment
  • Limits Q-Learning agents to the number of available Q-tables
  • Executes both policies with the same configuration
  • Generates comparative graphs showing both policies side-by-side
  • Exports separate CSV files for each policy

Run script behavior:

  • Automatically activates Python virtual environment
  • Runs the main simulation without visualization (faster)
  • Shows visualization only on the final episode
  • Automatically generates and opens analysis graphs
  • Saves results to CSV
  • Supports cancellation with Ctrl+C
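
For reference, the prompts are built with Questionary (see Requirements); below is a minimal sketch of that style of flow, not the actual cli.py code:

# Illustrative Questionary prompts; the real flow lives in sma/cli.py
import questionary

ambiente = questionary.select("Environment:", choices=["FAROL", "FORAGING"]).ask()
modo = questionary.select("Mode:", choices=["LEARNING", "TEST"]).ask()
episodios = int(questionary.text("Episodes:", default="100").ask())
print(ambiente, modo, episodios)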

Manual Mode (Legacy)

# farol environment (default)
python -m sma.run farol

# foraging environment
python -m sma.run foraging

# with visualization
python -m sma.run farol --visual

# specify number of episodes
python -m sma.run foraging -e 200

# save results
python -m sma.run farol -o results.csv
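
The flags shown above should also compose, assuming a standard argparse-style CLI (combination not verified):

python -m sma.run foraging -e 200 --visual -o results.csv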

Project Structure

sma/
  core/              # Base classes (agent, environment, simulator)
    - agente_base.py      # Abstract agent class
    - ambiente_base.py    # Abstract environment class
    - simulador.py        # Simulation engine
    - politicas.py        # Q-Learning implementation
    - sensores.py         # Sensor system
    - visualizador.py     # Graphical visualization
    - resultados.py       # Metrics management
  agentes/           # Agent implementations
    - agente_farol.py     # Agent for Farol environment
    - agente_forager.py   # Agent for Foraging environment
  ambientes/         # Environment implementations
    - farol.py            # Farol navigation environment
    - foraging.py         # Foraging environment
  cli.py             # Interactive interface (CLI)
  comparar_politicas.py  # Policy comparison
  gerar_analise.py   # Analysis and graph generation
  loader.py          # Simulation loader
  main.py            # Main entry point
  run.py             # Simulation script
  config_*.json      # Configuration files
  resultados/        # Exported results (CSV)
  analise/           # Generated graphs (PNG)
  qtables/           # Saved Q-tables (JSON)
run.sh               # Script to run CLI
requirements.txt     # Python dependencies

Environments

Farol

Agents must navigate to the lighthouse (farol) using Q-Learning. Through their sensors, they observe the relative direction to the lighthouse. The goal is to reach it in the minimum number of steps.

Characteristics:

  • Observation: Relative direction to farol
  • Actions: Move in 4 directions (North, South, East, West)
  • Reward: Positive when reaching the farol, negative for steps without progress
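
With such a small observation space, even a fixed policy is simple to state, which makes Farol a good baseline task (a sketch with hypothetical names, not the project's identifiers):

# Fixed-policy sketch for Farol: move in the observed relative direction.
ACOES = {"norte": "mover_norte", "sul": "mover_sul",
         "este": "mover_este", "oeste": "mover_oeste"}

def politica_fixa(direcao_farol):
    return ACOES[direcao_farol]

print(politica_fixa("norte"))  # mover_norte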

Foraging

Agents collect resources and deposit them in the nest. This environment is more complex than Farol, since agents must first locate and pick up a resource and then carry it back to the nest.

Characteristics:

  • Observation: Agent state (with/without resource), relative position to nest and resources
  • Actions: Move, collect resources, deposit in nest
  • Reward: Based on the value of deposited resources
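
A fixed-policy-style sketch of the decision cycle this observation/action space implies (dictionary keys and action strings are hypothetical, not the project's API):

# Illustrative foraging decision cycle; names are stand-ins for the
# project's real observation and action types.
def mover_para(direcao):
    return f"mover_{direcao}"

def passo(obs):
    if obs["tem_recurso"]:                        # carrying a resource
        if obs["no_ninho"]:
            return "depositar"                    # reward scales with resource value
        return mover_para(obs["direcao_ninho"])   # head back to the nest
    if obs["sobre_recurso"]:
        return "recolher"                         # pick the resource up
    return mover_para(obs["direcao_recurso"])     # seek the nearest resource

print(passo({"tem_recurso": False, "no_ninho": False, "sobre_recurso": True,
             "direcao_ninho": "norte", "direcao_recurso": "sul"}))  # recolher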

Configuration

Via Interactive CLI

The CLI automatically generates the configuration based on your choices, so there is no need to edit JSON files manually.

Via JSON Files (Manual Mode)

The config_*.json files define simulation parameters:

  • modo_execucao: LEARNING or TEST
  • episodios: Number of episodes
  • max_passos: Steps per episode
  • visualizar: true/false
  • Environment and agent parameters
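
Putting those fields together, a minimal config sketch could look like this (values illustrative; the agentes block follows the structure shown under Fine-Tuning below):

{
  "modo_execucao": "LEARNING",
  "episodios": 200,
  "max_passos": 100,
  "visualizar": false,
  "agentes": [
    { "politica": { "tipo": "qlearning" } }
  ]
}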

Fine-Tuning Q-Learning Parameters

You can adjust the Q-Learning hyperparameters in the JSON config files under each agent's politica section (guidelines for each value follow below):

{
  "agentes": [
    {
      "politica": {
        "tipo": "qlearning",
        "alfa": 0.3,
        "gama": 0.9,
        "epsilon": 0.2
      }
    }
  ]
}

Parameter guidelines:

  • alfa (learning rate): 0.1-0.5. Higher values learn faster but may be unstable. Lower values are more stable but slower.
  • gama (discount factor): 0.8-0.99. Higher values (e.g., 0.95) favor long-term planning. Lower values (0.7-0.8) focus on immediate rewards.
  • epsilon (exploration): 0.05-0.4. Higher values explore more. Lower values exploit learned knowledge more.

The Q-Learning implementation is in sma/core/politicas.py with inline comments explaining each parameter.
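
For intuition, here is a minimal tabular sketch of the update these three parameters control (illustrative only; the project's implementation is the one in sma/core/politicas.py):

import random
from collections import defaultdict

# Minimal tabular Q-Learning sketch; parameter names mirror the config keys.
alfa, gama, epsilon = 0.3, 0.9, 0.2
Q = defaultdict(float)                      # Q[(state, action)] -> value

def escolher_acao(estado, acoes):
    # epsilon-greedy: explore with probability epsilon, otherwise exploit
    if random.random() < epsilon:
        return random.choice(acoes)
    return max(acoes, key=lambda a: Q[(estado, a)])

def atualizar(estado, acao, recompensa, proximo_estado, acoes):
    # Q(s,a) += alfa * (r + gama * max_a' Q(s',a') - Q(s,a))
    alvo = recompensa + gama * max(Q[(proximo_estado, a)] for a in acoes)
    Q[(estado, acao)] += alfa * (alvo - Q[(estado, acao)])

atualizar("s0", "norte", 1.0, "s1", ["norte", "sul"])
print(Q[("s0", "norte")])  # 0.3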

Results Analysis

The simulator automatically generates:

  • Learning curves: Reward evolution over episodes
  • Performance metrics: Success rate, average steps, rewards
  • Policy comparison: Q-Learning vs fixed policies with side-by-side graphs
  • CSV export: Raw data for external analysis

Policy Comparison

Compare Fixed Intelligent policy with Q-Learning:

# Via CLI (interactive)
./run.sh
# Select "Comparar politicas (Fixa Inteligente vs Q-Learning)"

# Via command line
python -m sma.comparar_politicas config_farol.json --episodios 10

Q-table Management:

  • Q-tables are always saved to sma/qtables/ after training, overwriting any existing ones
  • The comparison script automatically detects how many Q-tables exist for the environment
  • Limits Q-Learning agents to the number of available Q-tables
  • Remaining agents use Fixed Intelligent policy

This generates:

  • Comparative statistics in terminal
  • Two CSV files (one for each policy)
  • Comparative graphs showing 6 metrics side-by-side
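
Since Q-tables are plain JSON, they are also easy to inspect by hand (a sketch; the filename and inner layout here are assumptions, not the project's documented format):

import json

# Load a saved Q-table for inspection (path and structure are assumptions)
with open("sma/qtables/qtable_farol_0.json") as f:
    qtable = json.load(f)

# Print the greedy action per state, assuming a {state: {action: value}} layout
for estado, acoes in qtable.items():
    print(estado, "->", max(acoes, key=acoes.get))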

Additional Documentation

  • relatorio.md: Complete technical report on architecture and implementation

Development

Modular Structure

The project follows a modular architecture:

  • Core: Reusable base components
  • Agents: Environment-specific implementations
  • Environments: Simulation space definitions
  • Policies: Learning algorithms (Q-Learning)

Extensibility

To add new environments or agents:

  • Create a class that inherits from Ambiente or Agente
  • Implement the required abstract methods
  • Add corresponding JSON configuration
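
A skeleton of what a new environment might look like (the base-class name comes from the list above; the abstract method names are assumptions, so check sma/core/ambiente_base.py for the real interface):

from abc import ABC, abstractmethod

# Stand-in base class; the real one lives in sma/core/ambiente_base.py and
# its abstract methods may differ from the hypothetical ones used here.
class Ambiente(ABC):
    @abstractmethod
    def reiniciar(self): ...
    @abstractmethod
    def passo(self, acoes): ...

class MeuAmbiente(Ambiente):
    def reiniciar(self):
        return {}                  # initial observation per agent
    def passo(self, acoes):
        return {}, {}, True        # observations, rewards, episode-done flag

print(MeuAmbiente().passo({}))     # ({}, {}, True)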

Agent Communication

The simulator includes a communication system that allows agents to exchange messages:

  • Direct messaging: simulador.enviar_mensagem() sends a message to a specific agent
  • Broadcast: simulador.broadcast_mensagem() sends a message to all agents
  • Agents can implement processar_comunicacao() to send messages based on events or proximity
  • Messages are stored in each agent's message queue and can be accessed via obter_mensagens()
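
A minimal stand-in showing the shape of that flow (the class internals here are illustrative; only the method names come from the list above, and the real signatures may differ):

# Toy stand-in for the message system; not the project's actual classes.
class Agente:
    def __init__(self, nome):
        self.nome, self._caixa = nome, []     # _caixa: the message queue
    def obter_mensagens(self):
        msgs, self._caixa = self._caixa, []   # drain and return queued messages
        return msgs

class Simulador:
    def __init__(self, agentes):
        self.agentes = {a.nome: a for a in agentes}
    def enviar_mensagem(self, destino, msg):  # direct messaging
        self.agentes[destino]._caixa.append(msg)
    def broadcast_mensagem(self, msg):        # broadcast to all agents
        for a in self.agentes.values():
            a._caixa.append(msg)

sim = Simulador([Agente("a1"), Agente("a2")])
sim.broadcast_mensagem("recurso avistado")
print(sim.agentes["a2"].obter_mensagens())    # ['recurso avistado']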

See relatorio.md for detailed documentation on the communication system.

License

This is a project developed for educational purposes.