furkan / searchapi

web scraping project for upskilling in Rails / MongoDB deployments

Clone: git clone https://github.com/furkan/searchapi.git
Latest commit on main: "add: cloudflare & aws" (9725089) by Furkan Aldemir, 2 months ago
๐Ÿ“ .github
๐Ÿ“ app
๐Ÿ“ bin
๐Ÿ“ config
๐Ÿ“ lib
๐Ÿ“ log
๐Ÿ“ public
๐Ÿ“ test
๐Ÿ“ tmp
๐Ÿ“ vendor
๐Ÿ“„ .dockerignore
๐Ÿ“„ .gitattributes
๐Ÿ“„ .gitignore
๐Ÿ“„ .rubocop.yml
๐Ÿ“„ .ruby-version
๐Ÿ“„ Caddyfile
๐Ÿ“„ config.ru
๐Ÿ“„ docker-compose.yml
๐Ÿ“„ Dockerfile
๐Ÿ“„ Gemfile
๐Ÿ“„ Gemfile.lock
๐Ÿ“„ Rakefile
๐Ÿ“„ README.md
๐Ÿ“„ README.md

README

A limited-scope web scraping project to dip my toes in developing and deploying with a Rails/MongoDB stack.

It scrapes autocomplete suggestions for Play Store searches. Results are cached by default (callers can opt out per request), which both improves performance and limits the load on Google's servers. This project is for personal, educational purposes only and should not be used for large-scale commercial scraping.

The backend API uses Rails to retrieve autocomplete results from an internal Play Store endpoint. All request and response data are logged in a MongoDB cluster for analysis and caching.
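The cache-or-fetch flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SuggestionCache` class, its method names, and the log fields are hypothetical, an in-memory Hash stands in for the MongoDB collection, and the fetcher is an injected callable rather than a real call to the Play Store endpoint.

```ruby
# Hypothetical sketch: an in-memory Hash stands in for the MongoDB collection,
# and `fetcher` stands in for the call to the Play Store endpoint.
class SuggestionCache
  attr_reader :log

  def initialize(fetcher)
    @fetcher = fetcher # callable: takes a query, returns an Array of suggestions
    @store = {}        # query => cached suggestions
    @log = []          # one entry per request, kept for later analysis
  end

  # Returns a response-shaped Hash. With no_cache: true the cache lookup is
  # skipped, but the fresh result is still stored for later requests.
  def search(query, no_cache: false)
    cached = no_cache ? nil : @store[query]
    suggestions = cached || @fetcher.call(query)
    @store[query] = suggestions
    @log << { query: query, cache_hit: !cached.nil?, count: suggestions.size }
    { "count" => suggestions.size, "suggestions" => suggestions, "error" => nil }
  end
end
```

In this sketch, two identical calls to `search("api")` invoke the fetcher once, while `search("api", no_cache: true)` always invokes it again.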

The frontend is a single page for trying out the API; it also shows the request logs from the last 24 hours. Checking the No Cache box bypasses the cache to guarantee fresh results.

I initially explored using Kamal for deployment but opted for a Docker Compose setup with Caddy, as my prior experience with this stack allowed for a faster and more straightforward deployment.

For the live demo, DNS is handled by Cloudflare, and the project is hosted on an AWS EC2 instance.

Setup

  • Clone the repository
  • Populate .env with the following environment variables:
MONGODB_URI=
DOMAIN_NAME=
EMAIL=  # Contact email Caddy provides when obtaining the TLS certificate for the domain

  • Update the domain name in Caddyfile
  • Build the docker image
docker compose build

  • Run the docker containers
docker compose up -d

  • Check logs
docker compose logs -f
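Before building, it can help to confirm that `.env` actually has values for the three variables listed above. The following is a small optional sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are made up for illustration, and the parsing only handles simple `KEY=value` lines with optional trailing `# comments`.

```ruby
# Keys listed in the setup step above.
REQUIRED_KEYS = %w[MONGODB_URI DOMAIN_NAME EMAIL].freeze

# Parse simple KEY=value lines, ignoring blank lines and comment lines.
# A trailing " # comment" after the value is stripped.
def parse_env(text)
  text.each_line.with_object({}) do |line, env|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    next if value.nil?

    env[key.strip] = value.sub(/\s+#.*\z/, "").strip
  end
end

# Return the required keys that are absent or have an empty value.
def missing_keys(text)
  env = parse_env(text)
  REQUIRED_KEYS.select { |key| env[key].to_s.empty? }
end

# When run as a script from the repository root, report on the local .env.
if __FILE__ == $PROGRAM_NAME && File.exist?(".env")
  missing = missing_keys(File.read(".env"))
  puts missing.empty? ? ".env looks complete" : "Missing values: #{missing.join(', ')}"
end
```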

Usage

Frontend

Open your domain in a browser, then use the text input and the No Cache checkbox to query the API and view the request logs from the last 24 hours.

API

Example request:

curl "https://api.yourdomain.com/api/v1/play_store_suggestions/search?query=api"

Without caching:

curl "https://api.yourdomain.com/api/v1/play_store_suggestions/search?query=api&no_cache=true"

Example response:

{
  "count": 5,
  "suggestions": [
    "api",
    "api healthcare mobile workforce",
    "apify",
    "api tester",
    "apics"
  ],
  "error": null
}
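The same requests can be made from Ruby with the standard library. This is a minimal sketch under a few assumptions: the base URL is the placeholder from the curl examples, the helper names are invented for illustration, and a non-null `error` field is treated as a failure (the README only shows the `null` case, so the actual error format may differ).

```ruby
require "json"
require "net/http"
require "uri"

# Placeholder from the curl examples above; replace with your own domain.
BASE_URL = "https://api.yourdomain.com/api/v1/play_store_suggestions/search"

# Build the request URI, adding no_cache=true only when requested.
def suggestions_uri(query, no_cache: false)
  params = { "query" => query }
  params["no_cache"] = "true" if no_cache
  uri = URI(BASE_URL)
  uri.query = URI.encode_www_form(params)
  uri
end

# Parse a response body and return the suggestions array.
# Assumption: any non-null "error" value means the request failed.
def parse_suggestions(body)
  data = JSON.parse(body)
  raise "API error: #{data['error']}" if data["error"]

  data.fetch("suggestions")
end

# Perform the request; this needs a reachable deployment.
def fetch_suggestions(query, no_cache: false)
  response = Net::HTTP.get_response(suggestions_uri(query, no_cache: no_cache))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_suggestions(response.body)
end
```

For example, `fetch_suggestions("api", no_cache: true)` would issue the second curl request shown above and return just the `suggestions` array.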