Continuous Integration with GitHub Actions

The Problem

You have everything configured:

devcontainer.json with the development environment
pyproject.toml with dependencies
ruff for formatting and linting
pyright for static type checking
pytest for testing, even with playwright for end-to-end tests
pre-commit for additional pre-commit hooks

and possibly more.

Now, there’s no way that you will run all of it from the very beginning to the very end every time you make the slightest change. But you should!

The Solution

Our last line of defense is CI/CD, which stands for Continuous Integration/Continuous Deployment. Before we merge the code into the main branch, we want to make sure that it passes all the checks above. GitHub Actions does exactly that! You don’t even have to think about it: it’s all automated, and you just check whether there is a green tick or a red cross next to your pull request!

Important

Tapyr’s CI pipeline ensures that the code works and meets the quality standards from building the environment to running the end-to-end tests.

CI with GitHub Actions

Every time you push the code to the repository, the CI pipeline builds the environment, runs the tests, and checks the code quality. Only then can you merge the code into the main branch.

This is crucial, and it is the other side of the “Coding style should never be a topic of discussion” coin. Code style extends here to linting and testing. Even if you were lucky enough to work on a data science project with tests, how many times have you heard “Yeah, sorry, I didn’t run the tests locally, but it should work anyway, right? Right?” or “I’ll just push it to the main branch, it’s just a small change”? We all know that it’s not true and that the code will break…

We understand that setting up CI/CD pipelines can be a daunting task, especially for data scientists without a formal software engineering background. That’s why tapyr comes with a pre-configured setup for GitHub Actions that runs the tests and checks the code quality. It’s as simple as pushing the code to the repository.

In tapyr, by default, CI/CD is triggered by pull requests and pushes to the main branch.

The pipeline builds the Devcontainer, installs dependencies, runs the tests, and checks the code quality. This allows you to catch issues early in the development process, before they reach the main branch. Maybe you’ve missed a dependency in pyproject.toml, or a system package that has to be installed with apt install? You will know immediately, before the code is merged!
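To make the quality-check part of these steps concrete, here is a minimal sketch of the same sequence as a local Python script. It is not tapyr's actual workflow: the CHECKS list and the run_checks helper are assumptions made for illustration, the exact commands in the real pipeline may differ, and the sketch skips building the Devcontainer and installing dependencies.

```python
import subprocess
from typing import Sequence

# Assumed check commands, in the order the text lists the tools.
# The real CI workflow may use different commands or flags.
CHECKS: list[tuple[str, list[str]]] = [
    ("format", ["ruff", "format", "--check", "."]),
    ("lint", ["ruff", "check", "."]),
    ("types", ["pyright"]),
    ("tests", ["pytest"]),
]


def run_checks(checks: Sequence[tuple[str, Sequence[str]]]) -> list[str]:
    """Run every check and return the names of the ones that failed.

    Hypothetical helper: a check fails when its command exits with a
    non-zero status or when the tool is not installed at all.
    """
    failed: list[str] = []
    for name, command in checks:
        try:
            returncode = subprocess.run(list(command), check=False).returncode
        except FileNotFoundError:
            # The executable is missing, which is itself a failed check.
            returncode = 127
        if returncode != 0:
            failed.append(name)
    return failed
```

Calling run_checks(CHECKS) from the project root returns an empty list when everything passes. The sketch keeps going after a failure, so a single run reports every failing check instead of only the first one.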

How to use it?

If you use tapyr and GitHub, you don’t have to do anything! Just use the tapyr template and push any changes to the repository.

Closing note on the feedback loops in software development

The feedback loop is about the time between making a change and seeing the results. The shorter the feedback loop, the more efficient the development process.

Tools like ruff and pyright provide instant feedback in the editor. Then there are tests, which provide feedback on the correctness of the code once they’re run. In modern IDEs, you can run selected tests (or all of them) with a single click, so the feedback loop is already quite short. However, building the Devcontainer and installing dependencies take a bit longer, and you rarely want to rebuild the container every time you make a change. This is where CI/CD and pre-commit come in.

Tip

Good software engineering practices like

Configuring your IDE
Introducing pre-commit
Running CI/CD pipelines

are all about closing and shortening the feedback loop.