Poetry
This is a legacy document, left here for historical reasons. If you’re looking for a way to manage your environment, check out the uv guide.
The Problem
Pip, conda, virtualenv, venv, pipenv, poetry, pyenv, pyenv-virtualenv, requirements.txt, environment.yml, Pipfile, Pipfile.lock, poetry, setup.py, setup.cfg, uv, rye… A lot, right?
Python is a great language, but managing dependencies can be a nightmare. This is especially true when you’re working on multiple projects, each with its own set of dependencies. You use venv, but then realize each project might require a different version of Python itself. So you use conda, but then you have to remember to activate the right environment every time you switch projects. And you don’t know which packages to install with conda, and which with pip. You switch to pipenv, and you have to wait 20 minutes to install one package in a complex environment…
If only we could have cargo for Python…
The Solution
Poetry is a tool that solves all these problems. At least it solves them better than other tools.
The Python community has sort-of agreed that Poetry is the ~way to go~ best tool it has so far. It defines dependencies in pyproject.toml, which is more readable than requirements.txt, and you can specify different groups of dependencies, like dev, deploy, and others. It also creates a virtual environment for you, so you don’t have to worry about it. Then there’s the poetry.lock file, which pins the exact resolved versions so that the libraries you’ve specified work together and installs are reproducible. It’s cross-platform, so you can use it on Windows, Mac, and Linux. Because environments are tied to the project directory, you can easily switch between projects without worrying about conflicts or about which environment was used where.
Poetry also handles multiple Python versions and packaging, which is a big plus: you don’t have to worry about setup.py and setup.cfg files.
One might argue that we already achieve reproducibility with Devcontainers, but practice shows that not everyone uses them, and they cannot be used in every setup. For example, when deploying apps on Posit Connect, you cannot use Devcontainers! We recommend using a Devcontainer with poetry pre-installed, but if you want to use poetry in your local environment, you should install it with pipx. pipx installs every package in a separate virtual environment, so it’s the recommended way to install CLI tools like poetry.
Packaging
You may wonder why we want to package our code. To make this concrete, let’s consider the following example:
Project structure
├── src
│   └── model.py
└── scripts
    └── 01_analysis.py
And we get the following errors in 01_analysis.py when trying to import from model.py:
from my_project.model import get_model
ModuleNotFoundError: No module named 'my_project'
or
from ..my_project.model import get_model
ImportError: attempted relative import with no known parent package
Sounds familiar? I guess so.
The problem is that Python doesn’t know where to look for the src package, and you often end up with a mess of sys.path.append and PYTHONPATH environment variables. This is where packaging comes in.
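To illustrate the import problem described above, here is a minimal sketch. It builds a throwaway my_project/model.py in a temporary directory (the layout and the can_import helper are assumptions made for this illustration), shows that the import fails while the project root is not on sys.path, and then shows the manual sys.path.append workaround that packaging makes unnecessary.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Return True if `module_name` is importable with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


# Build a throwaway layout: <tmp>/my_project/model.py (hypothetical project).
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "my_project"
package_dir.mkdir()
(package_dir / "model.py").write_text("def get_model():\n    return 'model'\n")

# The project root is not on sys.path yet, so Python cannot find the package.
before = can_import("my_project.model")

# The usual workaround: patch sys.path by hand in every script.
sys.path.append(str(project_root))
after = can_import("my_project.model")

print(f"importable before patching sys.path: {before}")
print(f"importable after patching sys.path:  {after}")
```

Installing the project as a package achieves the same effect once, for the whole environment, instead of in every script.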
By adding a few lines to pyproject.toml, you can make your project a package.
New structure
├── my_project
│   └── model.py
├── pyproject.toml
└── scripts
    └── 01_analysis.py
pyproject.toml generated with poetry new my_project
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = ""
authors = ["Appsilon.com <hello@appsilon.com>"]
packages = [{include = "my_project"}]
[tool.poetry.dependencies]
python = "^3.12"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
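Then, after running poetry install && poetry shell, you can import model.py in 01_analysis.py with from my_project.model import get_model. 🎉 (On Poetry 2.x, poetry shell is no longer built in and requires the poetry-plugin-shell plugin; poetry run is an alternative.)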
This is immensely useful, and this knowledge is definitely under-shared in the data science community.
Tapyr of course comes with packaging out of the box, so you don’t have to worry about it.
Installing Poetry-managed packages in a non-Poetry environment
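While poetry is a great tool, sometimes you may need to install a package in an environment without poetry. This is the case when deploying a Shiny app on Posit Connect. Fortunately, it’s as easy as running pip install . in the project directory: pip reads the [build-system] table in pyproject.toml and builds the package with poetry-core.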
Note: You don’t need poetry on your system to use a project with poetry-defined dependencies!
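One way to confirm that pip install . worked in such an environment is to ask Python for the installed distribution's metadata. The sketch below uses the standard library's importlib.metadata; the installed_version helper is a name made up for this illustration, and the distribution name my_project is taken from the pyproject.toml example above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# "my_project" is the name declared in the example pyproject.toml.
found = installed_version("my_project")
if found is None:
    print("my_project is not installed in this environment")
else:
    print(f"my_project {found} is installed")
```

The check relies only on the standard library, so it runs in environments where poetry itself is not available.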
Why not conda and when conda?
Conda has its place in the world, but it’s not the best tool for managing dependencies in enterprise-ready projects. It’s good for data science projects; in particular, you can install additional, non-Python artifacts like r or julia! You can also install the cuda toolkit with conda, which allows you to have projects that use different versions of cuda on the same machine (important for computing clusters)!
However, conda is not known for its reproducibility or its speed. It also doesn’t provide a built-in packaging system, so you have to use setuptools and setup.py files. It’s just built for a different purpose.
poetry doesn’t work well with pytorch, but you rarely need pytorch in Shiny apps. It is better to export your model to onnx and use onnxruntime.