2. AWS
Amazon Web Services was the chosen cloud vendor for hosting this project's infrastructure.
What is AWS?
AWS (Amazon Web Services) is a cloud platform that gives you on-demand access to things like computing power, storage, and databases, along with a wide variety of other services. Instead of setting up and maintaining physical servers, you can use AWS to quickly build, deploy, and scale applications of all sizes. From hosting websites to running machine learning models, AWS provides flexible tools to support different kinds of projects, with built-in options for security, monitoring, and global reach.
View more about AWS
Services Used
Note
The instructions below focus on creating AWS resources through the web console, which can help readers new to AWS learn how to navigate it. Since this project uses Terraform, all resources can also be created and destroyed through infrastructure as code (IaC). Refer to the Terraform page to create the resources through Terraform.
IAM
Identity and Access Management (IAM) is a service that helps control access to resources on AWS. With IAM, you can manage permissions that control which AWS resources users can access.
For a solo developer, different "users" are treated as service accounts. One example is a service account that can only access Elastic Container Registry (ECR), used by a CI/CD pipeline that pushes a new image to ECR.
Restricting the service account to ECR alone follows the Principle of Least Privilege.
Currently, the project has two service accounts:
elastic-container-registry-user
terraform-user
Service account names should make it obvious which resources they can access. AWS recommends adding service accounts to a group and then assigning permissions to that group. However, since this is a single project with a solo developer, an IAM group is not used.
A group would make more sense if there were several users or different projects under the same account.
Setup Instructions
- Visit the IAM Console.
- On the left, under Access Management, click on Users.
- Click on the Create User button in the upper-right.
- Provide a name for the user. Ideally, the name should reflect the role or service it'll work with.
- Click next.
- Choose to attach policies directly.
- In the Permission Policy section, either attach an existing AWS managed policy or create a custom one.
- AWS Managed Policies
- Depending on what the user account is being created for, an existing AWS managed policy could suffice.
- For example, this project's elastic-container-registry-user account has the AWS managed SecretsManagerReadWrite policy, which allows it to read and write secrets in Secrets Manager.
- Custom Policies
- For even more fine-grained control and least-privilege access, custom (customer managed) policies can be created.
- For example, this project's terraform-user account has a policy that grants access to describe resources within the EC2 service:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "TerraformerRDSPermissions",
          "Effect": "Allow",
          "Action": [
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcs",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSubnets",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeNatGateways",
            "ec2:DescribeVpcEndpoints"
          ],
          "Resource": "*"
        }
      ]
    }
- ...
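One way to sanity-check a custom policy against least privilege is to scan its actions for wildcards and for anything that does not look read-only. The sketch below does this for the terraform-user policy shown above. The helper names (iter_actions, risky_actions) and the list of read-only prefixes are illustrative assumptions, not part of IAM or of this project, and a prefix check is only a rough heuristic.

```python
import json

# The terraform-user policy from this page, as a JSON string.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TerraformerRDSPermissions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVpcs",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSubnets",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeNatGateways",
        "ec2:DescribeVpcEndpoints"
      ],
      "Resource": "*"
    }
  ]
}
"""

# Assumption: action names starting with these verbs are treated as read-only.
READ_ONLY_PREFIXES = ("Describe", "Get", "List")


def iter_actions(policy):
    """Yield every action named in the Allow statements of a policy document."""
    statements = policy["Statement"]
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        yield from actions


def risky_actions(policy):
    """Return actions that contain a wildcard or do not look read-only."""
    risky = []
    for action in iter_actions(policy):
        _service, _, name = action.partition(":")
        if "*" in action or not name.startswith(READ_ONLY_PREFIXES):
            risky.append(action)
    return risky


policy = json.loads(POLICY_JSON)
print(risky_actions(policy))  # prints [] because every action is an ec2:Describe* call
```

An empty list here only means that no action is a wildcard or a write-style call; it does not prove the policy is minimal for the task.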
VPC
Virtual Private Cloud
Creating a custom VPC instead of using the default one provides full control over network configuration, security, and isolation tailored to specific application requirements. At first, the VPC will have public subnets to test the local version of Dagster to make sure everything is working correctly. The VPC will then be modified to only have private subnet groups.
AWS creates a default VPC, but learning to create one can be invaluable when troubleshooting connection issues.
Setup Instructions
- Visit the VPC Console.
- Choose to delete or keep the default VPC(s).
- Click on Create VPC.
- Under Resources to create, choose VPC and more.
- For Name tag auto-generation, enter a name, such as the project's name.
- This project uses a CIDR block of 10.0.0.0/20, but a different one can be chosen if needed. Learn more about CIDR.
- Choose 2 public subnets (only for testing Dagster locally).
- Choose 2 private subnets.
- Choose 0 NAT gateways since there is a cost to use them.
- Add tags if desired to organize resources on AWS.
- Click Create VPC.
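To get a feel for how much address space the 10.0.0.0/20 block provides, the following sketch uses Python's standard ipaddress module. The split into /24 subnets, with two public and two private, is an assumption for illustration only; the console wizard may choose different subnet sizes.

```python
import ipaddress

# CIDR block used by this project (from the steps above).
vpc = ipaddress.ip_network("10.0.0.0/20")
print(vpc.num_addresses)  # 4096 addresses in a /20

# Assumption for illustration: carve the VPC into /24 subnets and use the
# first two as public and the next two as private.
subnets = list(vpc.subnets(new_prefix=24))
public, private = subnets[0:2], subnets[2:4]

# AWS reserves 5 addresses in every subnet (the first four and the last one).
AWS_RESERVED_PER_SUBNET = 5

for label, group in (("public", public), ("private", private)):
    for subnet in group:
        usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
        print(f"{label:<8}{subnet}  usable={usable}")
```

A /20 leaves room for sixteen /24 subnets, so four subnets use only a quarter of the block and more can be added later without resizing the VPC.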
RDS
Relational Database Service
Amazon RDS is a managed service that simplifies the setup, operation, and scaling of relational databases in the cloud. In this project, PostgreSQL is the database engine of choice for storing the metadata of Dagster.
The cost to maintain the database with the project's configuration options comes out to ~$15.00 USD.
Setup Instructions
Note: these are the configuration options that were chosen for this project. Costs are the main driver behind these options. Feel free to choose any other options that could be more suitable.
- Visit the RDS console.
- On the dashboard, there should be an option Create a Database. If not, click on Databases on the left menu. Then click Create Database in the upper-right.
- Under Engine Options, choose PostgreSQL.
- Under Templates, choose Dev/Test. If eligible, use Free tier.
- Under Availability and Durability, choose Single Instance Deployment.
- Under Settings, give the database a name and let AWS manage the credentials.
- Under Instance Configuration, choose Burstable Classes and then select the t4g.micro instance.
- Change storage to the minimum of 20GB.
- Under Connectivity, choose to not connect to an EC2 instance. This can be done later.
- Choose the VPC that was created in the previous step.
- The subnet from the VPC should be already selected.
- Choose No for Public Access.
- Keep the default VPC security group.
- This project does not have a preference on Availability Zones and uses the auto-generated Certificate Authority.
- Under Tags, create a new tag if desired for resource organization.
- Under Database Authentication, choose password authentication.
- Under Monitoring, choose the standard version of Database Insights. All other options in this section can be left as default.
- Review the Estimated Monthly Costs, make any changes if necessary, then click on create database.
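The console choices above can also be written down as data, which makes them easier to review or reproduce. The sketch below maps them onto keyword arguments of boto3's RDS create_db_instance call. Using boto3 is an assumption made for illustration (this project provisions through the console or Terraform), and the identifier, username, and subnet group name are hypothetical placeholders.

```python
# Console choices from the steps above, expressed as keyword arguments for
# boto3's RDS create_db_instance call.
# Assumption: the identifier, username, and subnet group name are placeholders.
db_params = {
    "DBInstanceIdentifier": "dagster-metadata",      # hypothetical name
    "Engine": "postgres",                            # PostgreSQL engine
    "DBInstanceClass": "db.t4g.micro",               # burstable class
    "AllocatedStorage": 20,                          # GiB, the minimum chosen above
    "MultiAZ": False,                                # single instance deployment
    "PubliclyAccessible": False,                     # no public access
    "MasterUsername": "postgres",                    # hypothetical username
    "ManageMasterUserPassword": True,                # let AWS manage the credentials
    "DBSubnetGroupName": "my-project-subnet-group",  # hypothetical subnet group
}


def create_database(params):
    """Create the instance. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the dict can be inspected without boto3

    rds = boto3.client("rds")
    return rds.create_db_instance(**params)


if __name__ == "__main__":
    # Only print the settings; calling create_database() would incur costs.
    for key, value in db_params.items():
        print(f"{key} = {value}")
```

Keeping the call inside a function that is never invoked means the settings can be inspected safely without creating a billable resource.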
EC2
Elastic Compute Cloud
AWS EC2 (Elastic Compute Cloud) is a cloud service that provides resizable virtual servers to run applications and workloads on demand.
Launch Instance
- Visit the EC2 console.
- Click on Launch Instance.
- Provide a name for the virtual machine.
- Under Application and OS Images, choose Ubuntu 24.04 (HVM), SSD Volume Type 64-bit ARM or a different image if preferred.
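- Under Instance Type, choose t2.small. Note that t2 instances are x86-based and the image architecture must match the instance type: use the 64-bit (x86) image with t2.small, or a Graviton type such as t4g.small with the 64-bit ARM image.
- Under Key Pair (login), select a key pair or create a new one. If a new one is created, check for the .pem file in the downloads folder.
- Under Network Settings: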
- Select the VPC created earlier.
- Switch to a public subnet to allow connection to the virtual machine.
- Enable Auto-assign public IP.
- For the Firewall, select the default security group that should've been created when setting up the VPC.
- Under Configure Storage, leave as default.
- Under Advanced Details, leave as default.
Connect to Instance
- First, configure a trusted connection to the previously created RDS instance.
- Visit the RDS console.
- Click on the RDS instance previously created.
- Scroll down to the Connected Compute Resources section and, in the Actions drop-down, click Set up EC2 Connection.
- On the next screen, select the created EC2 instance from the drop-down. Then, select Continue.
- On the Review and Confirm screen, review all information then click Continue.
- SSH into machine.
- Back in the EC2 console, click on the created EC2 instance.
- In the top-right of the Summary section, click on the Connect button.
- On the next page, click on the SSH Client tab.
- Connection instructions and an ssh command will be provided. For example:
    ssh -i "dagster-vm-key-pair.pem" ubuntu@ec2-<ip-address-of-vm>.<region>.compute.amazonaws.com
- Note: Run this command in the directory of the .pem file.
- Note: Since the virtual machine was created with the default VPC security group, make sure the Inbound Rules of the security group allow your IP address to connect.
- The terminal should show an Ubuntu welcome screen once connected.
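If the SSH connection times out, a common cause is that no inbound rule covers the client's IP address. The sketch below checks an address against a list of CIDR sources using the standard ipaddress module. The function name ip_allowed and the example addresses (taken from documentation-reserved ranges) are hypothetical. It only handles CIDR sources, so rules that reference another security group as their source are not covered.

```python
import ipaddress


def ip_allowed(client_ip, allowed_cidrs):
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical inbound SSH (port 22) sources copied from a security group.
inbound_ssh_sources = ["203.0.113.0/24", "198.51.100.7/32"]

print(ip_allowed("203.0.113.42", inbound_ssh_sources))  # True: inside the /24
print(ip_allowed("192.0.2.10", inbound_ssh_sources))    # False: no rule covers it
```

Allowing a single address with a /32 source is usually the narrowest rule that still lets one workstation connect.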
Configure Instance
Once connected to the virtual machine, run the following commands to get everything set up:
- Clone repository
- Create a new directory:
    git init <dir-name>
    cd <dir-name>
    git remote add -f origin https://github.com/digitalghost-dev/poke-cli/
    git config core.sparseCheckout true
    echo "card_data/" >> .git/info/sparse-checkout
    git pull origin main
- Run ls to verify that the card_data/ directory was created.
- Install tools
- Install uv for Python:
    curl -LsSf https://astral.sh/uv/0.7.21/install.sh | sh
- Add to PATH:
    source $HOME/.local/bin/env
- Install libraries from the pyproject.toml file:
    uv sync
- Activate virtual environment:
    source .venv/bin/activate
- Create dagster.yaml file:
- Set environment variables:
    echo 'export DAGSTER_HOME="$HOME/.dagster"' >> ~/.bashrc
    echo 'export SUPABASE_USER="supabase_user"' >> ~/.bashrc
    echo 'export SUPABASE_PASSWORD="supabase_password"' >> ~/.bashrc
- Run source ~/.bashrc to load the variables in the current session.
- Verify Dagster and Connectivity
    dg dev --host 0.0.0.0 --port 3000
- In the browser, visit http://<ip-address-of-vm>:3000
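If the page does not load, it helps to separate configuration problems from network problems. The following sketch checks that the environment variables from the steps above are set and that something is listening on port 3000. The helper names (missing_vars, port_open) are hypothetical, and the check assumes it is run on the virtual machine itself.

```python
import os
import socket

# Variables written to ~/.bashrc in the steps above.
REQUIRED_VARS = ("DAGSTER_HOME", "SUPABASE_USER", "SUPABASE_PASSWORD")


def missing_vars(env=os.environ):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("missing variables:", missing_vars())
    # Assumption: run on the VM itself, so the webserver is checked on localhost.
    print("port 3000 reachable:", port_open("127.0.0.1", 3000))
```

If the port is reachable on the VM but not from the browser, the security group's inbound rules for port 3000 are the next thing to check.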