Continuous Integration with GitHub Actions
The Problem
You have everything configured:
- devcontainer.json with the development environment
- pyproject.toml with dependencies
- ruff for formatting and linting
- pyright for static type checking
- pytest for testing, even with playwright for end-to-end tests
- pre-commit for additional pre-commit hooks
and possibly more.
Now, there’s no way you will run all of it from beginning to end every time you make the slightest change. But you should!
The Solution
Our last line of defense: CI/CD, which stands for Continuous Integration/Continuous Deployment. Before we merge code into the main branch, we want to make sure it passes all the checks above. GitHub Actions does exactly that! You don’t even have to think about it. It’s all automated; you just check whether there is a green tick or a red cross next to your pull request!
Tapyr’s CI pipeline ensures that the code works and meets the quality standards from building the environment to running the end-to-end tests.
CI with GitHub Actions
Every time you push code to the repository, the CI pipeline builds the environment, runs the tests, and checks the code quality. Only then can you merge the code into the main branch.
This is crucial, and it is the other side of the “Coding style should never be a topic of discussion” coin. Code style extends here to linting and testing. Even if you had that lucky data science project with tests, how many times have you heard “Yeah, sorry, I didn’t run the tests locally, but it should work anyway, right? Right?” Or “I’ll just push it to the main branch, it’s just a small change”? We all know that it’s not true and that the code will break…
We understand that setting up CI/CD pipelines can be a daunting task, especially for data scientists without a strict software engineering background. That’s why tapyr comes with a pre-configured GitHub Actions setup that runs the tests and checks the code quality. It’s as simple as pushing the code to the repository.
In tapyr, by default, CI/CD is triggered by pull requests and pushes to the main branch.
The pipeline builds the Devcontainer, installs dependencies, runs the tests, and checks the code quality. This allows you to catch issues early in the development process, before they reach the main branch. Maybe you’ve missed a dependency in pyproject.toml, or something installable with apt install? You will know immediately, before the code is merged!
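To make the trigger and the steps concrete, here is a minimal sketch of what such a workflow could look like. It is an illustration, not tapyr’s actual workflow file: the file name, the job name, and the exact commands are assumptions, and it assumes that the Devcontainer build itself installs the project dependencies. It uses the public actions/checkout and devcontainers/ci actions.

```yaml
# Hypothetical file: .github/workflows/ci.yml
# Illustrative sketch only; tapyr ships its own pre-configured workflow.
name: CI

# Run on every pull request and on pushes to the main branch.
on:
  pull_request:
  push:
    branches: [main]

jobs:
  checks:  # job name is an assumption
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Build the Devcontainer from the repository's configuration and run
      # the checks inside it. A missing Python or apt dependency shows up
      # here as a failed build or a failed check.
      - uses: devcontainers/ci@v0.3
        with:
          push: never  # do not publish the built image
          runCmd: |
            set -e                  # stop at the first failing check
            ruff format --check .   # formatting
            ruff check .            # linting
            pyright                 # static type checking
            pytest                  # tests
```

Because the checks run inside the same container you develop in, a green tick means the environment can be rebuilt from scratch and the code passes in it.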
How to use it?
If you use tapyr and GitHub, you don’t have to do anything! Just use the tapyr template and push any changes to the repository.
Closing note on the feedback loops in software development
The feedback loop is about the time between making a change and seeing the results. The shorter the feedback loop, the more efficient the development process.
Tools like ruff and pyright provide instant feedback in the editor. Then there are tests, which provide feedback on the correctness of the code once they’re run. In modern IDEs, you can run selected tests (or all of them) with a single click, so the feedback loop is already quite short. However, building the Devcontainer and installing dependencies takes a bit longer, and you rarely want to rebuild the container every time you make a change. This is where CI/CD and pre-commit come in.
Good software engineering practices like
- Configuring your IDE
- Introducing pre-commit
- Running CI/CD pipelines
are all about closing and shortening the feedback loop.