📦 anna-geller / dataflow-ops

Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate

★ 115 stars · 24 forks · 115 watching · Apache License 2.0
Topics: analytics, analytics-engineering, automation, aws, cicd, data, data-engineering, data-engineering-infrastructure, data-engineering-pipeline, data-science, dataflow, dataflow-ops, infrastructure-as-code, observability, orchestration, pipeline, prefect, python, serverless
Latest commit: Update README.md by Anna Geller (e0333df10d510d36ff450a4028469ca4955b1d49, 3 years ago)
๐Ÿ“ .github
๐Ÿ“ blocks
๐Ÿ“ dataflowops
๐Ÿ“ flows
๐Ÿ“ infrastructure
๐Ÿ“ utilities
๐Ÿ“„ .gitignore
๐Ÿ“„ .prefectignore
๐Ÿ“„ deploy_flows.bash
๐Ÿ“„ Dockerfile
๐Ÿ“„ LICENSE
๐Ÿ“„ README.md
๐Ÿ“„ requirements.txt
๐Ÿ“„ scheduling.bash
๐Ÿ“„ setup.py
๐Ÿ“„ README.md

Template for Prefect deployments with a Continuous Deployment GitHub Actions workflow and one-click agent deployment

The goal of this repository template is to make it easy for you to get started with Prefect.

Ideally, you should be able to:

  • Clone this repository, or create your own repository from this template
  • Configure GitHub Actions secrets (AWS credentials and Prefect Cloud v2 API key)
  • Start the GitHub Actions workflow defined in this YAML file
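The wiring of those steps can be sketched roughly as follows. This is a minimal, hypothetical GitHub Actions fragment, not the actual workflow shipped in the `.github` directory of this repo; the secret names, region, and trigger are assumptions you would adapt to your setup:

```yaml
name: Deploy flows to Prefect Cloud
on: workflow_dispatch  # assumption: started manually from the Actions tab

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Assumed secret names - set these in your repository's Actions secrets
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Log in to Prefect Cloud
        run: |
          pip install prefect
          prefect cloud login --key "${{ secrets.PREFECT_API_KEY }}" --workspace "${{ secrets.PREFECT_WORKSPACE }}"
      # ...followed by steps that build the image and deploy the flows
```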

For more detailed usage, check out this blog post and this video demo.

Note about the agent

When you start a Prefect agent on AWS ECS Fargate, allocate as much CPU and memory as your workloads need. The agent needs enough resources to provision infrastructure for your flow runs and to monitor their execution. Otherwise, your flow runs may get stuck in a Pending state.
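On Fargate, the agent's CPU and memory are fixed on its ECS task definition. A hedged sketch of such a task definition is below; the family name, image tag, queue name, and resource values are illustrative assumptions, not the exact ones used by this repo's infrastructure code:

```json
{
  "family": "prefect-agent",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "prefect-agent",
      "image": "prefecthq/prefect:2-python3.9",
      "command": ["prefect", "agent", "start", "-q", "dev"]
    }
  ]
}
```

Raising `cpu` and `memory` here is how you give the agent the headroom described above.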

Questions?

If you have any questions or issues using this template, feel free to open a GitHub issue directly on this repo, or reach out via Discourse or Slack.

Extra: additional deployment CLI examples based on your platform, storage, and infrastructure (not only AWS)

Table with examples

| Storage Block | Infrastructure Block | End Result | CLI Build Command for `hello.py` flow with flow function `hello` | Platform |
| --- | --- | --- | --- | --- |
| N/A | N/A | Local storage and local process on the same machine from which you created the deployment | `prefect deployment build hello.py:hello -a -n implicit -q dev` | Local/VM |
| N/A | N/A | Local storage and local process on the same machine from which you created the deployment, but with a version and with the output YAML manifest stored under the given file name in the `deploy` directory | `prefect deployment build hello.py:hello -a -n implicit-with-version -q dev -v $GITHUB_SHA -o deploy/implicit_with_version.yaml` | Local/VM |
| N/A | `-ib process/dev` | Local storage and local process on the same machine, but in contrast to the first row this requires you to explicitly create the `Process` block named `dev` beforehand, rather than letting Prefect create it for you as an anonymous block | `prefect deployment build hello.py:hello -a -n implicit -q dev -ib process/dev` | Local/VM |
| N/A | `-ib process/dev` | Local storage and local `Process` block, but overriding the default environment variable to set the log level to debug via the `--override` flag | `prefect deployment build hello.py:hello -a -n implicit -q dev -ib process/dev --override env.PREFECT_LOGGING_LEVEL=DEBUG` | Local/VM |
| N/A | `--infra process` | Local storage and local process on the same machine; in contrast to the first row, this explicitly specifies the `process` infrastructure type; the result is exactly the same, i.e. Prefect creates an anonymous `Process` block | `prefect deployment build hello.py:hello -a -n implicit -q dev --infra process` | Local/VM |
| `-sb s3/dev` | `-ib process/dev` | S3 storage block and local `Process` block; this setup allows a remote agent, e.g. running on an EC2 instance; any flow run from this deployment runs as a local process on that VM, and Prefect pulls the code from S3 at runtime | `prefect deployment build hello.py:hello -a -n s3-process -q dev -sb s3/dev -ib process/dev` | AWS S3 + EC2 |
| `-sb s3/dev` | `-ib docker-container/dev` | S3 storage block and `DockerContainer` block; this setup allows a remote agent, e.g. running on an EC2 instance; any flow run from this deployment runs as a Docker container on that VM, and Prefect pulls the code from S3 at runtime | `prefect deployment build hello.py:hello -a -n s3-docker -q dev -sb s3/dev -ib docker-container/dev` | AWS S3 + EC2 |
| `-sb s3/dev` | `-ib kubernetes-job/dev` | S3 storage block and `KubernetesJob` block; this setup allows a remote agent running as a Kubernetes deployment, e.g. on an AWS EKS cluster; any flow run from this deployment runs as a Kubernetes job pod within that cluster, and Prefect pulls the code from S3 at runtime | `prefect deployment build hello.py:hello -a -n s3-k8s -q dev -sb s3/dev -ib kubernetes-job/dev` | AWS S3 + EKS |
| `-sb gcs/dev` | `-ib process/dev` | GCS storage block and local `Process` block; this setup allows a remote agent, e.g. running on a Google Compute Engine instance; any flow run from this deployment runs as a local process on that VM, and Prefect pulls the code from GCS at runtime | `prefect deployment build hello.py:hello -a -n gcs-process -q dev -sb gcs/dev -ib process/dev` | GCP GCS + GCE |
| `-sb gcs/dev` | `-ib docker-container/dev` | GCS storage block and `DockerContainer` block; this setup allows a remote agent, e.g. running on a Google Compute Engine instance; any flow run from this deployment runs as a Docker container on that VM, and Prefect pulls the code from GCS at runtime | `prefect deployment build hello.py:hello -a -n gcs-docker -q dev -sb gcs/dev -ib docker-container/dev` | GCP GCS + GCE |
| `-sb gcs/dev` | `-ib kubernetes-job/dev` | GCS storage block and `KubernetesJob` block; this setup allows a remote agent running as a Kubernetes deployment, e.g. on a GCP GKE cluster; any flow run from this deployment runs as a Kubernetes job pod within that cluster, and Prefect pulls the code from GCS at runtime | `prefect deployment build hello.py:hello -a -n gcs-k8s -q dev -sb gcs/dev -ib kubernetes-job/dev` | GCP GCS + GKE |
| `-sb azure/dev` | `-ib process/dev` | Azure storage block and local `Process` block; this setup allows a remote agent, e.g. running on an Azure VM instance; any flow run from this deployment runs as a local process on that VM, and Prefect pulls the code from Azure storage at runtime | `prefect deployment build hello.py:hello -a -n az-process -q dev -sb azure/dev -ib process/dev` | Azure Blob Storage + Azure VM |
| `-sb azure/dev` | `-ib docker-container/dev` | Azure storage block and `DockerContainer` block; this setup allows a remote agent, e.g. running on an Azure VM instance; any flow run from this deployment runs as a Docker container on that VM, and Prefect pulls the code from Azure storage at runtime | `prefect deployment build hello.py:hello -a -n az-docker -q dev -sb azure/dev -ib docker-container/dev` | Azure Blob Storage + Azure VM |
| `-sb azure/dev` | `-ib kubernetes-job/dev` | Azure storage block and `KubernetesJob` block; this setup allows a remote agent running as a Kubernetes deployment, e.g. on an Azure AKS cluster; any flow run from this deployment runs as a Kubernetes job pod within that cluster, and Prefect pulls the code from Azure storage at runtime | `prefect deployment build hello.py:hello -a -n az-k8s -q dev -sb azure/dev -ib kubernetes-job/dev` | Azure Blob Storage + AKS |
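Every build command above uses the same entrypoint format, `<path-to-file>:<flow-function>` (here `hello.py:hello`). As a side note, resolving such a string is plain Python; the sketch below shows one way it can map to a callable. This is an illustration of the entrypoint format, not Prefect's internal implementation, and `resolve_entrypoint` is a hypothetical helper:

```python
import importlib.util
from pathlib import Path


def resolve_entrypoint(entrypoint: str):
    """Load the function named after the colon from the file before it,
    e.g. 'hello.py:hello' -> the `hello` callable defined in hello.py.
    Hypothetical helper illustrating the '<path>:<function>' format."""
    path, _, func_name = entrypoint.rpartition(":")
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file so the function exists
    return getattr(module, func_name)
```

With a `hello.py` containing a `hello` function, `resolve_entrypoint("hello.py:hello")()` would import the file and call the function, which is essentially what the deployment needs to be able to do at runtime after pulling your code from the storage block.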