📦 anna-geller / dataflow-ops

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

License: Apache License 2.0

Topics: analytics, analytics-engineering, automation, aws, cicd, data, data-engineering, data-engineering-infrastructure, data-engineering-pipeline, data-science, dataflow, dataflow-ops, infrastructure-as-code, observability, orchestration, pipeline, prefect, python, serverless

Template for Prefect deployments with Continuous Deployment GitHub Actions workflow and one-click agent deployment

The goal of this recipe is to have one Prefect agent (running on AWS ECS Fargate) with shared core package dependencies per project. This means that:

  • pip packages such as pandas, numpy, scikit-learn, etc. are baked into an ECR image, and this image is used to deploy the agent process
  • flow deployments can have custom module dependencies, which are packaged alongside the flow code and uploaded to S3
  • flow deployments are created automatically from CI/CD, but they can also be created from your local machine in exactly the same way (see the sketch below)
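
A minimal sketch of what that build-and-apply step could look like, whether run from a GitHub Actions job or from your terminal. It assumes the S3 storage block `dev` and the Process block `dev` used in the CLI examples below already exist in your workspace; the output path is just an example:

```bash
# Package the flow code (and any custom modules next to it) and upload it to the
# S3 storage block "dev"; flow runs will execute via the Process block "dev" and
# will be picked up from the "dev" work queue.
prefect deployment build hello.py:hello -n s3-process -q dev \
    -sb s3/dev -ib process/dev -o deploy/hello.yaml

# Register (or update) the deployment with the Prefect API from the generated manifest.
prefect deployment apply deploy/hello.yaml
```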

Make sure to adjust your AWS account ID and your default region in the `task-definition.json` file.

Deployment CLI examples based on your platform, storage and infrastructure

Table with examples

| Storage Block | Infrastructure Block | End Result | CLI Build Command for the `hello.py` flow with flow function `hello` | Platform |
|---|---|---|---|---|
| N/A | N/A | Local storage and local process on the same machine from which you created the deployment | `prefect deployment build hello.py:hello -n implicit -q dev` | Local/VM |
| N/A | N/A | Local storage and local process on the same machine from which you created the deployment, but with a version and with the output YAML manifest stored under the given file name in the `deploy` directory | `prefect deployment build hello.py:hello -n implicit-with-version -q dev -v github_sha -o deploy/implicit_with_version.yaml` | Local/VM |
| N/A | `-ib process/dev` | Local storage and local process on the same machine from which you created the deployment; in contrast to the first row, this requires you to explicitly create the Process block named `dev` beforehand, rather than letting Prefect implicitly create it for you as an anonymous block | `prefect deployment build hello.py:hello -n implicit -q dev -ib process/dev` | Local/VM |
| N/A | `-ib process/dev` | Local storage and local Process block, but overriding the default environment variable to set the log level to debug via the `--override` flag | `prefect deployment build hello.py:hello -n implicit -q dev -ib process/dev --override env.PREFECT_LOGGING_LEVEL=DEBUG` | Local/VM |
| N/A | `--infra process` | Local storage and local process on the same machine from which you created the deployment; in contrast to the first row, it explicitly specifies that you want to use the Process block. The result is exactly the same, i.e. Prefect will create an anonymous Process block | `prefect deployment build hello.py:hello -n implicit -q dev --infra process` | Local/VM |
| `-sb s3/dev` | `-ib process/dev` | S3 storage block and local Process block. This setup allows you to use a remote agent, e.g. running on an EC2 instance; any flow run from this deployment will run as a local process on that VM, and Prefect will pull the code from S3 at runtime | `prefect deployment build hello.py:hello -n s3-process -q dev -sb s3/dev -ib process/dev` | AWS S3 + EC2 |
| `-sb s3/dev` | `-ib docker-container/dev` | S3 storage block and DockerContainer block. This setup allows you to use a remote agent, e.g. running on an EC2 instance; any flow run from this deployment will run as a Docker container on that VM, and Prefect will pull the code from S3 at runtime | `prefect deployment build hello.py:hello -n s3-docker -q dev -sb s3/dev -ib docker-container/dev` | AWS S3 + EC2 |
| `-sb s3/dev` | `-ib kubernetes-job/dev` | S3 storage block and KubernetesJob block. This setup allows you to use a remote agent running as a Kubernetes deployment, e.g. on an AWS EKS cluster; any flow run from this deployment will run as a Kubernetes job pod within that cluster, and Prefect will pull the code from S3 at runtime | `prefect deployment build hello.py:hello -n s3-k8s -q dev -sb s3/dev -ib kubernetes-job/dev` | AWS S3 + EKS |
| `-sb gcs/dev` | `-ib process/dev` | GCS storage block and local Process block. This setup allows you to use a remote agent, e.g. running on a Google Compute Engine instance; any flow run from this deployment will run as a local process on that VM, and Prefect will pull the code from GCS at runtime | `prefect deployment build hello.py:hello -n gcs-process -q dev -sb gcs/dev -ib process/dev` | GCP GCS + GCE |
| `-sb gcs/dev` | `-ib docker-container/dev` | GCS storage block and DockerContainer block. This setup allows you to use a remote agent, e.g. running on a Google Compute Engine instance; any flow run from this deployment will run as a Docker container on that VM, and Prefect will pull the code from GCS at runtime | `prefect deployment build hello.py:hello -n gcs-docker -q dev -sb gcs/dev -ib docker-container/dev` | GCP GCS + GCE |
| `-sb gcs/dev` | `-ib kubernetes-job/dev` | GCS storage block and KubernetesJob block. This setup allows you to use a remote agent running as a Kubernetes deployment, e.g. on a GCP GKE cluster; any flow run from this deployment will run as a Kubernetes job pod within that cluster, and Prefect will pull the code from GCS at runtime | `prefect deployment build hello.py:hello -n gcs-k8s -q dev -sb gcs/dev -ib kubernetes-job/dev` | GCP GCS + GKE |
| `-sb azure/dev` | `-ib process/dev` | Azure storage block and local Process block. This setup allows you to use a remote agent, e.g. running on an Azure VM instance; any flow run from this deployment will run as a local process on that VM, and Prefect will pull the code from Azure storage at runtime | `prefect deployment build hello.py:hello -n az-process -q dev -sb azure/dev -ib process/dev` | Azure Blob Storage + Azure VM |
| `-sb azure/dev` | `-ib docker-container/dev` | Azure storage block and DockerContainer block. This setup allows you to use a remote agent, e.g. running on an Azure VM instance; any flow run from this deployment will run as a Docker container on that VM, and Prefect will pull the code from Azure storage at runtime | `prefect deployment build hello.py:hello -n az-docker -q dev -sb azure/dev -ib docker-container/dev` | Azure Blob Storage + Azure VM |
| `-sb azure/dev` | `-ib kubernetes-job/dev` | Azure storage block and KubernetesJob block. This setup allows you to use a remote agent running as a Kubernetes deployment, e.g. on an Azure AKS cluster; any flow run from this deployment will run as a Kubernetes job pod within that cluster, and Prefect will pull the code from Azure storage at runtime | `prefect deployment build hello.py:hello -n az-k8s -q dev -sb azure/dev -ib kubernetes-job/dev` | Azure Blob Storage + AKS |
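
To actually run flows from any of these deployments, an agent has to poll the work queue passed via `-q`. A minimal example tying this together, assuming the deployment from the S3 + Process row has been applied (`hello/s3-process` follows Prefect's `flow-name/deployment-name` convention):

```bash
# Start an agent polling the "dev" work queue. In this recipe the agent runs as a
# long-lived ECS Fargate service built from the ECR image, but the same command
# works on a VM or on your laptop.
prefect agent start -q dev

# Trigger an ad-hoc run of the deployment built in the S3 + Process example above.
prefect deployment run hello/s3-process
```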