README

A limited-scope web scraping project for dipping my toes into developing and deploying with a Rails/MongoDB stack.

It scrapes the autocomplete suggestions of Play Store searches. Caching is enabled by default, with a per-request opt-out, to both improve performance and limit the load on Google's servers. Even so, this project is for personal, educational purposes and should not be used for large-scale commercial scraping.

The backend API uses Rails to retrieve autocomplete results from an internal Play Store endpoint. All request and response data are logged in a MongoDB cluster for analysis and caching.
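
The log-backed caching described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SuggestionService`, `LogEntry`, the injected `fetcher`, and the 24-hour `cache_ttl` are all hypothetical, and an in-memory array stands in for the MongoDB collection.

```ruby
# Hypothetical stand-in for one document in the MongoDB log collection.
LogEntry = Struct.new(:query, :suggestions, :cached, :created_at, keyword_init: true)

class SuggestionService
  attr_reader :log

  # fetcher:   callable that performs the real upstream request (assumed).
  # cache_ttl: assumed freshness window in seconds; the README does not state one.
  def initialize(fetcher:, cache_ttl: 24 * 60 * 60, clock: -> { Time.now })
    @fetcher = fetcher
    @cache_ttl = cache_ttl
    @clock = clock
    @log = [] # in-memory substitute for the MongoDB collection
  end

  # Returns suggestions for the query, reusing a fresh logged response
  # unless the caller opts out with no_cache: true.
  def search(query, no_cache: false)
    cached = no_cache ? nil : fresh_entry_for(query)
    suggestions = cached ? cached.suggestions : @fetcher.call(query)

    # Every request is logged, whether it was served from the cache or not.
    @log << LogEntry.new(query: query, suggestions: suggestions,
                         cached: !cached.nil?, created_at: @clock.call)
    { suggestions: suggestions, cached: !cached.nil? }
  end

  private

  # Most recent upstream response for the query that is still within the TTL.
  # Cache hits are skipped so that they do not extend the freshness window.
  def fresh_entry_for(query)
    cutoff = @clock.call - @cache_ttl
    @log.reverse.find do |entry|
      entry.query == query && !entry.cached && entry.created_at >= cutoff
    end
  end
end
```

With a stub such as `fetcher: ->(q) { ["#{q} tester"] }`, a second `search("api")` is answered from the log, while `search("api", no_cache: true)` always calls the fetcher again.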

The frontend is a single page for trying out the API, and it also shows the request logs of the last 24 hours. The No Cache checkbox can be used to ensure fresh results.

I initially explored using Kamal for deployment but opted for a Docker Compose setup with Caddy, as my prior experience with this stack allowed for a faster and more straightforward deployment.

For the live demo, DNS is handled by Cloudflare, and the project is hosted on an AWS EC2 instance.

Setup

Clone the repository

Populate .env with the following environment variables:

MONGODB_URI=
DOMAIN_NAME=
EMAIL=  # Email Caddy provides when requesting the domain's SSL certificate

Update the domain name in the Caddyfile

Build the Docker image

docker compose build

Run the Docker containers

docker compose up -d

Check logs

docker compose logs -f

Usage

Frontend

Open your domain in a browser, then use the text input and the No Cache checkbox to query the API and view the request logs of the last 24 hours.

API

Example request:

curl "https://api.yourdomain.com/api/v1/play_store_suggestions/search?query=api"

Without caching:

curl "https://api.yourdomain.com/api/v1/play_store_suggestions/search?query=api&no_cache=true"

Example response:

{
  "count": 5,
  "suggestions": [
    "api",
    "api healthcare mobile workforce",
    "apify",
    "api tester",
    "apics"
  ],
  "error": null
}
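
For calling the API from a script instead of curl, something like the following Ruby sketch should work. It uses only the standard library. The helper names are hypothetical, `api.yourdomain.com` is the same placeholder host as in the curl examples, and treating any non-null `error` value as a failure is an assumption, since only the `null` case is shown above.

```ruby
require "json"
require "net/http"
require "uri"

# Placeholder host from the examples above; replace with your own domain.
BASE_URL = "https://api.yourdomain.com"
SEARCH_PATH = "/api/v1/play_store_suggestions/search"

# Builds the request URI; no_cache: true adds the cache opt-out parameter.
def suggestions_uri(query, no_cache: false, base_url: BASE_URL)
  params = { "query" => query }
  params["no_cache"] = "true" if no_cache
  uri = URI.join(base_url, SEARCH_PATH)
  uri.query = URI.encode_www_form(params)
  uri
end

# Parses a response body and returns the list of suggestions.
# Assumption: a non-null "error" field signals a failed lookup.
def parse_suggestions(body)
  data = JSON.parse(body)
  raise "API error: #{data['error']}" if data["error"]
  data.fetch("suggestions")
end

# Performs the HTTP request; this needs a reachable deployment.
def fetch_suggestions(query, no_cache: false)
  response = Net::HTTP.get_response(suggestions_uri(query, no_cache: no_cache))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  parse_suggestions(response.body)
end
```

`fetch_suggestions("api")` mirrors the first curl call, and `fetch_suggestions("api", no_cache: true)` mirrors the second.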