1 // Overview
This section serves as a knowledge base for the project’s backend data infrastructure. It was created for a few purposes:
- To document how I built everything, so I can easily reference it later.
- To help others learn how to build something similar.
- To provide a clear understanding of how all the components work together.
This infrastructure guide will be constantly updated as changes or updates to resources occur.
The infrastructure is mainly in support of TCG data since there are multiple data sources that are being used to support the CLI/TUI. The VGC data simply calls one API.
Warning
All of the commands ran for this project in the terminal are based on macOS (i.e., Homebrew to install packages). If building on a different operating system, please find the equivalent command. Links will be provided for install guides for all operating systems when possible.
Data Infrastructure Diagram
- TCGPlayer pricing data and TCGDex card data are called and processed through a data pipeline orchestrated by Dagster and hosted on AWS.
- When the pipeline starts, Pydantic validates the incoming API data against a pre-defined schema, ensuring the data types match the expected structure.
- Polars is used to create DataFrames.
- The data is loaded into a Supabase staging schema.
- Soda data quality checks are performed.
dbtruns tests and builds the final tables in a Supabase production schema.- Users are then able to query the
pokeapi.coorsupabaseAPIs for either video game or trading card data, respectively.
Tools & Services
Below is a list of all the tools and services used in this project's infrastructure:
- AWS
- RDS
- S3
- VPC
- EC2
- Dagster
- Polars
- Supabase
- Terraform
Project Layout
.
├── infrastructure/
│ ├── aws/
│ │ ├── .terraform
│ │ ├── ec2/
│ │ ├── rds/
│ │ └── vpc/
│ ├── dagster.server
│ ├── start-dagster.sh
│ └── wait-for-rds.sh
├── pipelines/
│ ├── defs/
│ │ ├── extract/
│ │ │ ├── tcgcsv/
│ │ │ │ └── extract_pricing.py
│ │ │ └── tcgdex/
│ │ │ ├── extract_cards.py
│ │ │ ├── extract_series.py
│ │ │ └── extract_sets.py
│ │ ├── load/
│ │ │ ├── tcgcsv/
│ │ │ │ └── load_pricing.py
│ │ │ └── tcgdex/
│ │ │ ├── load_cards.py
│ │ │ ├── load_series.py
│ │ │ └── load_sets.py
│ │ └── transform/
│ │ └── transform_data.py
│ ├── poke_cli_dbt/
│ │ ├── macros/
│ │ │ ├── create_relationships.sql
│ │ │ ├── create_rls.sql
│ │ │ └── create_view.sql
│ │ ├── models/
│ │ │ ├── cards.sql
│ │ │ ├── pricing_data.sql
│ │ │ ├── series.sql
│ │ │ ├── sets.sql
│ │ │ └── sources.yml
│ │ ├── dbt_project.yml
│ │ └── profiles.yml
│ ├── soda/
│ │ ├── checks_pricing.yml
│ │ ├── checks_series.yml
│ │ ├── checks_sets.yml
│ │ └── configuration.yml
│ ├── tests/
│ │ └── extract_series_test.py
│ └── utils/
│ ├── json_retriever.py
│ └── secret_retriever.py
├── dagster.yaml
├── pyproject.toml
└── uv.lock
Note
This project is a learning playground for exploring new tools, services, and programming languages. Some design choices are intentionally experimental or may not follow conventional patterns— that's part of the learning process!
Feedback and suggestions are always welcome! If you spot an issue or have ideas for improvement, please open a GitHub Issue.