alex-berliner / facebook_mp_scraper

Latest commit b780056 by Alex Berliner (18 days ago): "Facebook scraper. Code made by AI, not me"

Facebook Marketplace Scraper

A Python-based scraper using Playwright to extract listings from Facebook Marketplace with authentication, pagination, and caching.

Features

  • Authentication: Cookie-based authentication with fallback to username/password
  • Pagination: Automatically scrolls and loads more listings until target count is reached
  • Caching: Remembers previously scraped listings to avoid duplicates
  • Detailed Extraction: Extracts title, price, description, images, location, seller info, and more
  • Robust Parsing: Multiple extraction strategies to handle Facebook's minified HTML

Installation

  • Install Python dependencies:
pip install -r requirements.txt

  • Install Playwright browsers:
playwright install chromium

Configuration

Environment Variables (Optional)

You can set your Facebook credentials as environment variables:

# Windows PowerShell
$env:FB_EMAIL="your_email@example.com"
$env:FB_PASSWORD="your_password"

# Windows CMD
set FB_EMAIL=your_email@example.com
set FB_PASSWORD=your_password

# Linux/Mac
export FB_EMAIL="your_email@example.com"
export FB_PASSWORD="your_password"

Or create a .env file:

FB_EMAIL=your_email@example.com
FB_PASSWORD=your_password

Note: If credentials are not provided, the scraper will open a browser window for manual login.
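The lookup described above could be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names `read_env_file` and `load_credentials` are hypothetical, and the choice to let real environment variables take precedence over the `.env` file is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Optional, Tuple


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. FB_EMAIL="me@example.com"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_credentials(env_path: Path = Path(".env")) -> Optional[Tuple[str, str]]:
    """Return (email, password), or None when either value is missing."""
    file_values = read_env_file(env_path)
    # Assumption: process environment wins over the .env file.
    email = os.environ.get("FB_EMAIL") or file_values.get("FB_EMAIL")
    password = os.environ.get("FB_PASSWORD") or file_values.get("FB_PASSWORD")
    if email and password:
        return email, password
    return None  # the caller would fall back to manual login in the browser
```

A `None` result corresponds to the manual-login case mentioned in the note.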

Usage

Basic Usage

python main.py "search query"

Examples

# Search for "laptop"
python main.py laptop

# Search with custom max listings
python main.py "gaming chair" --max-listings 50

# Run in headless mode (browser not visible)
python main.py "bicycle" --headless

# Ignore cache and scrape all listings
python main.py "car" --no-cache

Command Line Arguments

  • query (required): Search query for Facebook Marketplace
  • --max-listings N: Maximum number of listings to scrape (default: 100)
  • --headless: Run browser in headless mode
  • --no-cache: Ignore cache and scrape all listings
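For reference, the arguments listed above map naturally onto the standard library's `argparse`. The sketch below is an assumption about how `main.py` might declare them, not a copy of it; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI described in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Facebook Marketplace scraper")
    parser.add_argument("query", help="Search query for Facebook Marketplace")
    parser.add_argument(
        "--max-listings",
        type=int,
        default=100,
        metavar="N",
        help="Maximum number of listings to scrape (default: 100)",
    )
    parser.add_argument(
        "--headless",
        action="store_true",
        help="Run browser in headless mode",
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="Ignore cache and scrape all listings",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.max_listings, args.headless, args.no_cache)
```

Note that `argparse` exposes `--max-listings` as `args.max_listings` and `--no-cache` as `args.no_cache`.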

How It Works

  • Authentication:
      • First attempts to use saved cookies from data/cookies.json
      • If cookies are invalid/expired, falls back to username/password login
      • Saves cookies after successful login for future use
  • Search: Navigates to Facebook Marketplace search page
  • Pagination:
      • Waits for initial results to load
      • Scrolls down to load more listings
      • Continues until target count is reached or no new content loads
  • Extraction:
      • Extracts basic info (title, price, URL) from search results
      • For each new listing, visits the detail page
      • Extracts comprehensive information including description, images, location, seller info
  • Caching:
      • Checks cache before scraping each listing
      • Skips already-scraped listings
      • Saves new listings to data/cache.json
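The pagination stop condition (target count reached, or no new content after scrolling) can be sketched independently of the browser. In the example below, `collect_listing_urls`, its callables, and the `max_idle_scrolls` parameter are all hypothetical; in a Playwright-based scraper, `get_visible_urls` would read listing links from the page and `scroll_down` would scroll and wait, but that wiring is not shown in this README.

```python
from typing import Callable, List, Set


def collect_listing_urls(
    get_visible_urls: Callable[[], List[str]],
    scroll_down: Callable[[], None],
    target_count: int,
    max_idle_scrolls: int = 3,
) -> List[str]:
    """Scroll until target_count unique URLs are seen or nothing new loads."""
    seen: Set[str] = set()
    ordered: List[str] = []
    idle = 0  # consecutive passes that produced no new listings
    while len(ordered) < target_count and idle < max_idle_scrolls:
        before = len(ordered)
        for url in get_visible_urls():
            if url not in seen:
                seen.add(url)
                ordered.append(url)
        # Reset the idle counter whenever the pass found something new.
        idle = idle + 1 if len(ordered) == before else 0
        if len(ordered) < target_count:
            scroll_down()
    return ordered[:target_count]
```

Passing the page interactions in as callables keeps the loop testable without launching a browser.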

Project Structure

marketplace_apt/
├── src/
│   ├── __init__.py
│   ├── scraper.py          # Main scraper class
│   ├── auth.py             # Authentication handling
│   ├── cache.py            # Listing cache management
│   └── models.py           # Data models/classes
├── data/
│   ├── cookies.json        # Stored session cookies
│   └── cache.json          # Cached listings
├── requirements.txt
├── config.py               # Configuration settings
├── main.py                 # Entry point
└── README.md

Data Storage

  • Cookies: Stored in data/cookies.json (created automatically after login)
  • Cache: Stored in data/cache.json (created automatically when listings are scraped)
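As an illustration of how a JSON cache file like this can be used to skip already-scraped listings, here is a minimal sketch. The schema is an assumption (a JSON object keyed by listing URL); the real layout of data/cache.json is not documented here, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def load_cache(path: Path) -> Dict[str, dict]:
    """Return the cached listings, or an empty dict if no cache file exists."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def filter_new(urls: Iterable[str], cache: Dict[str, dict]) -> List[str]:
    """Keep only URLs that are not already in the cache."""
    return [url for url in urls if url not in cache]


def save_cache(path: Path, cache: Dict[str, dict]) -> None:
    """Write the cache back to disk, creating the data directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(cache, fh, indent=2, ensure_ascii=False)
```

With this shape, the `--no-cache` behavior would amount to skipping the `filter_new` step.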

Notes

  • Facebook may detect automation and require manual intervention (2FA, captcha, etc.)
  • Rate limiting: The scraper includes delays to reduce the chance of being blocked, but keep request volume low and avoid running it repeatedly in quick succession
  • Facebook's HTML structure may change, requiring selector updates
  • This tool is for educational purposes only; respect Facebook's Terms of Service

Troubleshooting

Authentication Issues

  • If cookies expire, delete data/cookies.json and log in again
  • For 2FA, the browser will stay open for manual completion
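To check whether a saved cookie file is stale before deleting it, something like the sketch below may help. It assumes the file holds a JSON list of cookie dicts in the shape Playwright's `BrowserContext.cookies()` returns, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie; whether this project stores cookies in exactly that form is an assumption, and `load_unexpired_cookies` is a hypothetical helper.

```python
import json
import time
from pathlib import Path
from typing import List, Optional


def load_unexpired_cookies(path: Path, now: Optional[float] = None) -> List[dict]:
    """Return cookies from path that have not expired; [] if unusable."""
    if not path.exists():
        return []
    try:
        cookies = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return []  # a corrupt file is treated the same as no cookies
    if not isinstance(cookies, list):
        return []
    now = time.time() if now is None else now
    fresh = []
    for cookie in cookies:
        expires = cookie.get("expires", -1)
        # -1 (or a missing field) is treated as a session cookie and kept.
        if expires == -1 or expires > now:
            fresh.append(cookie)
    return fresh
```

An empty result suggests the file can be deleted and a fresh login is needed. A non-empty result does not guarantee the session is still accepted by Facebook, since sessions can be invalidated server-side.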

No Results Found

  • Check your internet connection
  • Verify you're logged in correctly
  • Try a different search query

Browser Issues

  • Ensure Playwright browsers are installed: playwright install chromium
  • Try running without --headless to see what's happening

License

This project is for educational purposes only. Use responsibly and in accordance with Facebook's Terms of Service.