# Facebook Marketplace Scraper

A Python-based scraper using Playwright to extract listings from Facebook Marketplace with authentication, pagination, and caching.

## Features

- **Authentication:** Cookie-based authentication with fallback to username/password login
- **Pagination:** Automatically scrolls to load more listings until the target count is reached
- **Caching:** Remembers previously scraped listings to avoid duplicates
- **Detailed Extraction:** Extracts title, price, description, images, location, seller info, and more
- **Robust Parsing:** Multiple extraction strategies to handle Facebook's minified HTML

## Installation

Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Install the Playwright browser (Chromium):

```bash
playwright install chromium
```

## Configuration

### Environment Variables (Optional)

You can set your Facebook credentials as environment variables.

Windows PowerShell:

```powershell
$env:FB_EMAIL="your_email@example.com"
$env:FB_PASSWORD="your_password"
```

Windows CMD:

```bat
set FB_EMAIL=your_email@example.com
set FB_PASSWORD=your_password
```

Linux/macOS:

```bash
export FB_EMAIL="your_email@example.com"
export FB_PASSWORD="your_password"
```

Or create a `.env` file:

```
FB_EMAIL=your_email@example.com
FB_PASSWORD=your_password
```

**Note:** If credentials are not provided, the scraper will open a browser window for manual login.
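
As a rough sketch of how credentials might be resolved (the project's `config.py` is not shown here), the snippet below reads `FB_EMAIL` and `FB_PASSWORD` from the environment and falls back to a minimal `.env` reader. The helper names `load_dotenv_file` and `get_credentials` are hypothetical, and the real project may use a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def load_dotenv_file(path: str = ".env") -> None:
    """Hypothetical minimal .env reader: KEY=VALUE lines, '#' comments ignored."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Variables already set in the real environment take precedence
        os.environ.setdefault(key.strip(), value.strip().strip('"'))


def get_credentials() -> Tuple[Optional[str], Optional[str]]:
    """Return (email, password); None for either means falling back to manual login."""
    load_dotenv_file()
    return os.environ.get("FB_EMAIL"), os.environ.get("FB_PASSWORD")
```

With neither variable set and no `.env` file present, `get_credentials()` returns `(None, None)`, which corresponds to the manual-login case in the note above.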

## Usage

### Basic Usage

```bash
python main.py "search query"
```

### Examples

```bash
# Search for "laptop"
python main.py laptop

# Search with a custom maximum number of listings
python main.py "gaming chair" --max-listings 50

# Run in headless mode (browser not visible)
python main.py "bicycle" --headless

# Ignore cache and scrape all listings
python main.py "car" --no-cache
```

### Command Line Arguments

- `query` (required): Search query for Facebook Marketplace
- `--max-listings N`: Maximum number of listings to scrape (default: 100)
- `--headless`: Run the browser in headless mode
- `--no-cache`: Ignore the cache and scrape all listings
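
A minimal sketch of how these arguments could be declared with the standard-library `argparse` module. The option names and defaults come from the list above; the `build_parser` function name is hypothetical, and `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI described above (hypothetical helper name)."""
    parser = argparse.ArgumentParser(
        description="Scrape Facebook Marketplace listings for a search query."
    )
    parser.add_argument("query", help="Search query for Facebook Marketplace")
    parser.add_argument(
        "--max-listings",
        type=int,
        default=100,
        metavar="N",
        help="Maximum number of listings to scrape (default: 100)",
    )
    parser.add_argument(
        "--headless", action="store_true", help="Run browser in headless mode"
    )
    parser.add_argument(
        "--no-cache", action="store_true", help="Ignore cache and scrape all listings"
    )
    return parser
```

`argparse` converts the dashes to underscores, so after `args = build_parser().parse_args()` the values are available as `args.max_listings`, `args.headless`, and `args.no_cache`.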

## How It Works

**Authentication:**

- First tries the saved cookies in `data/cookies.json`.
- If the cookies are invalid or expired, falls back to username/password login.
- Saves cookies after a successful login for future use.
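
The cookie round-trip can be sketched with plain JSON helpers. Playwright's `BrowserContext.cookies()` returns the session cookies as a list of dicts and `BrowserContext.add_cookies()` accepts the same shape, so the helpers only need to persist that list. The helper names are hypothetical, and `auth.py` may instead rely on Playwright's `storage_state` option.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

COOKIES_PATH = Path("data/cookies.json")


def save_cookies(cookies: List[Dict[str, Any]], path: Path = COOKIES_PATH) -> None:
    """Persist cookies (e.g. from context.cookies()) after a successful login."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path: Path = COOKIES_PATH) -> List[Dict[str, Any]]:
    """Return saved cookies, or an empty list if the file is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return []
    return data if isinstance(data, list) else []


# Usage with Playwright's sync API (not executed here):
#   cookies = load_cookies()
#   if cookies:
#       context.add_cookies(cookies)
#   ... if the session is still not logged in, log in with FB_EMAIL / FB_PASSWORD ...
#   save_cookies(context.cookies())
```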

**Search:** Navigates to the Facebook Marketplace search page for the query.
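
A search URL can be built with `urllib.parse`. The `https://www.facebook.com/marketplace/search/?query=...` pattern is an assumption about Facebook's URL scheme (Facebook may also redirect to a location-specific path), so treat `build_search_url` as a hypothetical helper rather than the scraper's actual code.

```python
from urllib.parse import urlencode

# Assumed URL pattern; Facebook may change it at any time.
MARKETPLACE_SEARCH_URL = "https://www.facebook.com/marketplace/search/"


def build_search_url(query: str) -> str:
    """Return a Marketplace search URL with the query safely encoded."""
    return f"{MARKETPLACE_SEARCH_URL}?{urlencode({'query': query})}"
```

The scraper would then navigate with something like `page.goto(build_search_url(query))`.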

**Pagination:**

- Waits for the initial results to load.
- Scrolls down to load more listings.
- Continues until the target count is reached or scrolling loads no new content.
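
The stopping rule can be expressed independently of Playwright: keep scrolling until enough unique listing URLs have been seen, or until several consecutive scrolls add nothing new. In this sketch `scroll` and `collect_urls` are hypothetical callables (in practice they might wrap `page.mouse.wheel(...)` and a query for item links), and `max_idle_rounds` is an assumed tuning parameter.

```python
from typing import Callable, Iterable, List


def paginate(
    scroll: Callable[[], None],
    collect_urls: Callable[[], Iterable[str]],
    target: int,
    max_idle_rounds: int = 3,
) -> List[str]:
    """Collect unique listing URLs, scrolling until `target` is met or loading stalls."""
    seen: List[str] = []  # preserves discovery order
    seen_set = set()
    idle_rounds = 0  # consecutive rounds that produced no new URLs
    while True:
        new_count = 0
        for url in collect_urls():
            if url not in seen_set:
                seen_set.add(url)
                seen.append(url)
                new_count += 1
        if len(seen) >= target:
            break
        idle_rounds = 0 if new_count else idle_rounds + 1
        if idle_rounds >= max_idle_rounds:
            break  # no new content after several scrolls
        scroll()
    return seen[:target]
```

Tracking URLs across rounds, rather than counting visible cards, keeps the total correct even if earlier cards are no longer on the page.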

**Extraction:**

- Extracts basic info (title, price, URL) from the search results.
- Visits the detail page of each new listing.
- Extracts comprehensive information from it, including description, images, location, and seller info.
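
The "multiple extraction strategies" idea can be sketched as an ordered list of extractors tried until one returns a value. The strategies below (an Open Graph `og:title` meta tag, then the `<title>` element) and the names `STRATEGIES` and `extract_title` are illustrative assumptions, not the selectors the scraper actually uses; in a Playwright-based scraper the HTML string could come from `page.content()`.

```python
import re
from html import unescape
from typing import Callable, List, Optional

Strategy = Callable[[str], Optional[str]]


def from_og_title(html: str) -> Optional[str]:
    """Strategy 1: <meta property="og:title" content="..."> (attribute order assumed)."""
    m = re.search(r'<meta[^>]+property="og:title"[^>]+content="([^"]*)"', html)
    return unescape(m.group(1)).strip() if m else None


def from_title_tag(html: str) -> Optional[str]:
    """Strategy 2: the document <title>, minus an assumed ' | Facebook' suffix."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
    if not m:
        return None
    title = unescape(m.group(1)).strip()
    return re.sub(r"\s*\|\s*Facebook$", "", title) or None


STRATEGIES: List[Strategy] = [from_og_title, from_title_tag]


def extract_title(html: str) -> Optional[str]:
    """Return the first non-empty result from the ordered strategies."""
    for strategy in STRATEGIES:
        value = strategy(html)
        if value:
            return value
    return None
```

When Facebook changes its markup, a new strategy can be added to the front of the list without touching the others.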

**Caching:**

- Checks the cache before scraping each listing.
- Skips listings that have already been scraped.
- Saves new listings to `data/cache.json`.
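
A minimal sketch of the cache check, assuming listings are keyed by the numeric ID in item URLs such as `https://www.facebook.com/marketplace/item/1234567890/`. The key choice, the JSON layout, and the `ListingCache` class are assumptions; the real `cache.py` may differ.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict, Optional

# Assumed item URL shape: /marketplace/item/<digits>/
ITEM_ID_RE = re.compile(r"/marketplace/item/(\d+)")


def listing_id(url: str) -> Optional[str]:
    """Extract the numeric listing ID so query strings don't create duplicates."""
    m = ITEM_ID_RE.search(url)
    return m.group(1) if m else None


class ListingCache:
    """JSON-backed cache mapping listing ID -> listing data (hypothetical)."""

    def __init__(self, path: Path = Path("data/cache.json")) -> None:
        self.path = path
        try:
            loaded = json.loads(path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            loaded = {}
        self.entries: Dict[str, Any] = loaded if isinstance(loaded, dict) else {}

    def contains(self, url: str) -> bool:
        key = listing_id(url)
        return key is not None and key in self.entries

    def add(self, url: str, data: Dict[str, Any]) -> None:
        key = listing_id(url)
        if key is not None:
            self.entries[key] = data

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
```

Under this design, `--no-cache` would simply bypass the `contains` check before visiting each detail page.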

## Project Structure

```text
marketplace_apt/
├── src/
│   ├── __init__.py
│   ├── scraper.py          # Main scraper class
│   ├── auth.py             # Authentication handling
│   ├── cache.py            # Listing cache management
│   └── models.py           # Data models/classes
├── data/
│   ├── cookies.json        # Stored session cookies
│   └── cache.json          # Cached listings
├── requirements.txt
├── config.py               # Configuration settings
├── main.py                 # Entry point
└── README.md
```
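
For illustration, the fields listed under Features could map to a dataclass in `models.py` along these lines. The field names and types are assumptions, for example keeping `price` as the raw displayed string rather than a parsed number.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Listing:
    """One Marketplace listing (hypothetical model); detail fields stay empty
    until the detail page has been scraped."""

    url: str
    title: str
    price: str  # raw text as displayed, e.g. "$120"; parsing left to callers
    description: Optional[str] = None
    location: Optional[str] = None
    seller_name: Optional[str] = None
    images: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serializable form, e.g. for writing to data/cache.json."""
        return asdict(self)
```

`Listing(**data)` rebuilds an instance from a dict produced by `to_dict()`, which keeps the cache format and the model in sync.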

## Data Storage

- **Cookies:** Stored in `data/cookies.json` (created automatically after login)
- **Cache:** Stored in `data/cache.json` (created automatically when listings are scraped)

## Notes

- Facebook may detect automation and require manual intervention (2FA, CAPTCHA, etc.).
- Rate limiting: The scraper includes delays to avoid being blocked, but keep the number and frequency of runs modest.
- Facebook's HTML structure may change, requiring selector updates.
- This tool is for educational purposes. Respect Facebook's Terms of Service.
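
The delay values are not documented here, so the following is only a sketch of one common approach: sleeping for a random interval between page visits so requests are not evenly spaced. `polite_delay` and its 2 to 5 second default range are hypothetical; the `sleep` parameter exists so the function can be tested without waiting.

```python
import random
import time
from typing import Callable


def polite_delay(
    min_seconds: float = 2.0,
    max_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("require 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_delay()` before each detail-page visit (for example, before `page.goto(listing_url)`) spreads the requests out over time.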

## Troubleshooting

### Authentication Issues

- If the cookies expire, delete `data/cookies.json` and log in again.
- For 2FA, the browser will stay open so you can complete it manually.

### No Results Found

- Check your internet connection.
- Verify that you are logged in correctly.
- Try a different search query.

### Browser Issues

- Ensure Playwright browsers are installed: `playwright install chromium`
- Try running without `--headless` to see what's happening.

## License

This project is for educational purposes only. Use responsibly and in accordance with Facebook's Terms of Service.