Building a Python package

The “standard” way of building a Python package has gone through many changes in recent years, and if you have not been keeping track it can be difficult to know the best approach for creating a package that is easy to install, easy to maintain, and easy to publish.

Here we go over the (relatively) new standard for Python packaging, the all-encompassing pyproject.toml file. We strongly recommend that developers at DESC consider migrating to this new packaging standard if they haven’t already, particularly when starting a new software project from scratch. The intuitive, clean and maintainable format of pyproject.toml makes packages easy to both develop and publish with minimal effort.

This guide assumes some basic knowledge of putting together a piece of Python software, such as creating your own modules, etc. For those new to Python software, check out the official Python guide to get started. Also note this is by no means an exhaustive tutorial on the subject of Python packaging. A great additional resource for that is the Python Packaging Guide itself, or the guide from RealPython.

Much of this guide is written under the assumption that the Python package you are creating will be hosted by the DESC GitHub repository; however, the majority of the information will still apply even if this is not the case. If you are not yet familiar with Git or GitHub, or you need help getting set up on the DESC GitHub repository, check out this guide on the DESC Confluence page, or this more general getting started guide for Git.

The accompanying repository, which hosts the demo Python package we will often refer to in this guide, can be found here. The repository’s package, named mydescpackage, is very simple, consisting only of a few callable mathematical functions. You are welcome to download/fork it and use it as a starting template for your Python project.

Note

For those wondering what a TOML file is: TOML is a file format for configuration files, similar to YAML. It stands for “Tom’s Obvious Minimal Language”.

The directory structure

Let’s get started. First we go over the general directory structure your Python package should adhere to:

/path/to/my/project/
├── README.md
├── pyproject.toml
├── LICENCE
├── .gitignore
├── src/
│   └── mydescpackage/
│       ├── __init__.py
│       ├── _version.py
│       ├── file1.py
│       └── file2.py
├── tests/
│   ├── test1.py
│   └── test2.py
├── .github/
│   └── workflows/
│       └── ci.yml
└── docs/

Note that not all of these files and directories are strictly required. As a minimum you should have a README.md and pyproject.toml file in your base project directory, and the code for your software should populate the src/mydescpackage/ directory (replacing “mydescpackage” with the name of your package).
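The minimum layout described above can be sketched as a small helper script. This is only an illustration: the scaffold_package function is a hypothetical name of our own, it creates empty placeholder files, and it runs in a temporary directory so that nothing is written to a real project.

```python
from pathlib import Path
from typing import List
import tempfile


def scaffold_package(root: Path, package_name: str) -> List[Path]:
    """Create the minimal src-layout skeleton and return the files it contains."""
    package_dir = root / "src" / package_name
    package_dir.mkdir(parents=True, exist_ok=True)

    files = [
        root / "README.md",
        root / "pyproject.toml",
        package_dir / "__init__.py",
    ]
    for path in files:
        # Create an empty placeholder; an existing file keeps its content.
        path.touch(exist_ok=True)
    return files


# Demonstrate in a throwaway directory rather than the current project.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_package(Path(tmp), "mydescpackage")
    layout = sorted(p.relative_to(tmp).as_posix() for p in created)
    print(layout)
```

The remaining files and directories (LICENCE, tests/, docs/ and so on) can be added in the same way as the project grows.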

What are these files and directories…

  • README.md: A simple markdown file that typically outlines the project, its requirements, installation instructions, authors, etc. The contents of this file will also be displayed on your GitHub project’s landing page.

  • pyproject.toml: Where the build information, project dependencies, metadata, etc, of the Python package are stored (more in the next section).

  • LICENCE: Contains the license of the package, outlining any restrictions on its use. It is good practice to use a well-known license, such as a GNU, Apache, MIT or Creative Commons license, rather than a self-created one.

  • .gitignore: This file specifies intentionally untracked files that Git should ignore (see here for more details).

  • src/mydescpackage/: The code for your Python software goes here.

  • src/mydescpackage/_version.py: Stores the version number of our Python package (see Automatic versioning for more details).

  • tests/: Any unit tests of your package go in here (see also our guide on Continuous Integration).

  • .github/workflows/: Your GitHub Actions Continuous Integration workflows go in here (see our guide on Continuous Integration for more details on CI workflows).

  • docs/: For any extensive documentation beyond the scope of README.md, Read the Docs files for example.

Once the directory structure is set up and populated with our software, we can move on to telling pip how to build and install our package.

Note

Many people prefer placing the source code in a src/ directory rather than in the project’s root directory. This is a preference, not a requirement: you can have a “flat” directory structure where mydescpackage/ resides in the root project folder. However, having a src/ directory requires the user to install the software before it can run, breaking the habit of running the source code directly within the root project directory (don’t worry, you still only have to install the package once with an editable install; see more about this later on).

The pyproject.toml file

The pyproject.toml configuration file was introduced in PEP 518 as a way of specifying the minimum build requirements when installing a Python package. This tells the system what packages are required during the building process itself (e.g., setuptools, wheel), so the user no longer has to pre-install any dependencies required to build your package. The build requirements specified in pyproject.toml are installed in an isolated environment, used to build the package, and later discarded, keeping your base environment clean and tidy.

Below we go over the pyproject.toml file from our demo package.

[build-system]
requires = ["setuptools >= 61.0"] # PEP 621 compliant
build-backend = "setuptools.build_meta"

[project]
name = "mydescpackage"
description = "Example DESC Python package, some simple mathematical functions."
readme = "README.md"
authors = [{ name = "Stuart McAlpine", email = "stuart.mcalpine@fysik.su.se" }]
license = { file = "LICENCE" }
classifiers = [
    "Programming Language :: Python :: 3",
]
keywords = ["desc", "python"]
dependencies = [
    'numpy',
]
requires-python = ">=3.8"
dynamic = ["version"] # Scrape the version dynamically from the package

[tool.setuptools.dynamic]
version = {attr = "mydescpackage._version.__version__"}

[tool.setuptools.packages.find]
where = ["src"]

[project.optional-dependencies]
ci = ["pytest", "pytest-cov", "flake8"]

[project.scripts]
display-pi = "mydescpackage.pi:display_pi"

The build system

To specify which build-backend to use for installing your package, and any requirements needed during the build process, include something like this at the top of your pyproject.toml file.

[build-system]
requires = ["setuptools >= 61.0"] # PEP 621 compliant
build-backend = "setuptools.build_meta"

Here we are saying we require the setuptools package during the build, and that we are going to use setuptools as our build-backend to build our Python package. Other common requirements during the build process are wheel and cython. Note we select a minimum version of setuptools, setuptools>=61.0, as that is when setuptools became PEP 621 compliant (see project metadata later).

Note

You do not have to use setuptools as your build-backend; you can use alternatives such as Poetry or Flit. You can even put your own custom build-backend here if you have very specific requirements for building your package. However, if you are unsure, stick with setuptools.

In theory this is the minimum we need. If you were to install your package via pip at this stage (i.e., pip install .) it would use the specified information from pyproject.toml for the build system, then continue to install your package with some generic default values (or by looking for more information in the legacy setup.py and setup.cfg files).

But there is much more information we can provide in pyproject.toml about our package, such as its dependencies and general metadata. If you have built Python packages in the past you may be more familiar with putting this kind of information in the traditional setup.py and setup.cfg files. However, now everything can go in pyproject.toml, making it the only configuration file you need (note you can still keep the traditional setup.* files for legacy purposes and backwards compatibility).

Project metadata

As of PEP 621 there is a standard format for storing project metadata in pyproject.toml, which setuptools>=61.0.0 conforms to (see their tutorial on metadata here). Below is the metadata for our demo package:

[project]
name = "mydescpackage"
description = "Example DESC Python package, some simple mathematical functions."
readme = "README.md"
authors = [{ name = "Stuart McAlpine", email = "stuart.mcalpine@fysik.su.se" }]
license = { file = "LICENCE" }
classifiers = [
    "Programming Language :: Python :: 3",
]
keywords = ["desc", "python"]
dependencies = [
    'numpy',
]
requires-python = ">=3.8"
dynamic = ["version"] # Scrape the version dynamically from the package

All metadata goes under the [project] section, including for example the name of your package, the minimum required Python version, and the package dependencies.

For our configuration, the package will be installed as mydescpackage, requires Python >=3.8 to run, and depends on the numpy package. Many of the metadata fields are optional, but it is useful to be as thorough as possible in detailing the package, especially if you publish the package to PyPI for example (for a list of all metadata options see here).

[tool.setuptools.packages.find]
where = ["src"]

Because we are using the src/ directory layout for our package, we need to tell setuptools this is where our package’s source code is (the default is .). Any sub-directories of src/ with an __init__.py file will automatically be discovered by setuptools.
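To make the discovery rule concrete, here is a simplified sketch of it using only the standard library. It is not the setuptools implementation: discover_packages is a hypothetical helper of our own, and the utils sub-package and notes directory are invented purely for the demonstration.

```python
from pathlib import Path
from typing import List
import tempfile


def discover_packages(where: Path, prefix: str = "") -> List[str]:
    """List dotted package names: directories under `where` holding an __init__.py."""
    names = []
    for child in sorted(where.iterdir()):
        if child.is_dir() and (child / "__init__.py").is_file():
            dotted = f"{prefix}{child.name}"
            names.append(dotted)
            # Only descend into directories that are themselves packages.
            names.extend(discover_packages(child, prefix=f"{dotted}."))
    return names


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    for package_dir in ("mydescpackage", "mydescpackage/utils"):
        (src / package_dir).mkdir(parents=True)
        (src / package_dir / "__init__.py").touch()
    (src / "notes").mkdir()  # no __init__.py, so it is not reported as a package

    found = discover_packages(src)
    print(found)
```

In this sketch a directory without an __init__.py file is simply skipped, which mirrors the behaviour described above.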

Optional dependencies

The packages you list under [project] dependencies should be the minimum required for the Python software to operate under general use. Yet optional dependencies, for alternate use scenarios, can also be included.

For example, in our demo package we need the pytest, pytest-cov and flake8 packages when invoking the Continuous Integration workflows. As these packages are only needed when performing CI, and not for the general running of the package, we include them as optional dependencies, which can be installed alongside the main dependencies by running pip install .[ci].

[project.optional-dependencies]
ci = ["pytest", "pytest-cov", "flake8"]

Optional dependencies are also useful if you want to separate out serial and parallel (i.e., MPI) implementations of your package, packages required only during development, or installations where you also wish to compile the package’s documentation, for example.
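Inside the package, code that relies on an optional dependency usually has to cope with that dependency being absent. One possible pattern is sketched below; the helper names are hypothetical, and mpi4py is used only as an example of a module that an MPI-related extra might provide (the demo package does not define such an extra).

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if the top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def choose_backend() -> str:
    """Pick the parallel code path only when the optional MPI module is present."""
    # "mpi4py" is an assumed example of an optional dependency.
    return "parallel" if has_optional_dependency("mpi4py") else "serial"


print(choose_backend())
```

This keeps the base installation lightweight while letting users who installed the extra benefit from it automatically.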

Script entrypoints

Another extremely useful thing to be aware of with Python packages is script entrypoints. Here you can declare commands to be run from the terminal which will directly execute functions within your package. For example, in our demo package we have a function that computes the numerical value of pi. As we keep forgetting the value of pi, and need to be reminded, we register the display-pi command to help us, which directly calls the mydescpackage.pi.display_pi function (which prints the computed value of pi to the terminal).

[project.scripts]
display-pi = "mydescpackage.pi:display_pi"

Script entrypoints are great for creating front-ends to your package.
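For illustration, a module like src/mydescpackage/pi.py could look roughly as follows. The entry point only requires that display_pi is callable without arguments; the compute_pi helper and the way pi is obtained here are our own assumptions, and the demo package may do this differently.

```python
import math


def compute_pi() -> float:
    """Return the value of pi (hypothetical helper; here simply 4 * arctan(1))."""
    return 4.0 * math.atan(1.0)


def display_pi() -> None:
    """Target of the display-pi command: takes no arguments and prints pi."""
    print(f"pi = {compute_pi():.10f}")


if __name__ == "__main__":
    # Allows running the file directly, in addition to the installed command.
    display_pi()
```

After installing the package, typing display-pi in the terminal would call display_pi() in the same way.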

Automatic versioning

An extremely important attribute of your Python package is its version number, for which it is good practice to use the Semantic Versioning format (i.e., MAJOR.MINOR.PATCH). The pseudo-standard for Python packages is to store the version number as a string variable called __version__ in the root of the package, e.g., mydescpackage.__version__.
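One practical reason to stick to the MAJOR.MINOR.PATCH format is that versions can then be compared numerically. The sketch below uses a hypothetical parse_version helper that handles only plain three-part versions (no pre-release or build suffixes).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


# Tuples compare element by element, so 1.10.0 is correctly newer than 1.9.3.
print(parse_version("1.10.0") > parse_version("1.9.3"))

# Comparing the raw strings gives the wrong answer, because "1" sorts before "9".
print("1.10.0" > "1.9.3")
```

For anything beyond simple cases, the Version class from the third-party packaging library is a more robust choice.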

We are going to have to manually declare our chosen version number somewhere within our project. However, we certainly want to avoid manual declarations in multiple places, some of which we may forget to update (e.g., between the pyproject.toml file and the source code). There are many options that allow you to declare the version number only once, yet there is currently no standard for which practice is best.

We would recommend declaring the package version number in a _version.py file in the package source code directory (i.e., src/mydescpackage/). This option has the advantage that mydescpackage.__version__ can be called both in the scenario where the package has been installed via pip, or if the source code is being called upon manually straight from the cloned repository.

To do this, create a file called _version.py in the src/mydescpackage/ directory with the following:

__version__ = "1.0.0"

Note

We put the __version__ variable in a file called _version.py instead of version.py because the leading underscore marks the module as private by convention, so that mydescpackage.version is not exposed as part of the package’s public interface.

Then, include this line in the __init__.py file in the src/mydescpackage/ directory:

from ._version import __version__

Finally, we can tell setuptools to use this as the package version number when pip builds the package, by updating our pyproject.toml file with the following:

[project]
...
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "mydescpackage._version.__version__"}
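Once the package has been installed, the version recorded in its metadata can be read back with importlib.metadata from the standard library, which is a handy check that the dynamic version was picked up. The installed_version wrapper below is a hypothetical helper; it returns None when the named distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if mydescpackage is installed, otherwise None.
print(installed_version("mydescpackage"))
```

If the setup above is working, the value printed should match mydescpackage.__version__.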

Installing your package (from source)

Finally, once the pyproject.toml file is written, we can install the package locally from source using pip, just as we have always done. Within the project directory type:

pip install -e .

Note the -e flag means an “editable install”, which is extremely useful, particularly during the development stage of your software. An editable installation works very similarly to a regular install with pip install ., except that it installs only your package’s dependencies, metadata, and wrappers for console and GUI scripts, while the package itself is imported directly from the code in your project folder via a special link. This means that any changes to the Python source code take effect immediately, without requiring a new installation.
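A simple way to confirm that an editable install points at your project folder is to ask Python where it would import the package from. The module_origin helper below is a hypothetical name; it is demonstrated on the standard library’s json module so that it runs anywhere, but you would pass "mydescpackage" in practice.

```python
import importlib.util
from typing import Optional


def module_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, if it is found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# With an editable install of the demo package, module_origin("mydescpackage")
# would be expected to point inside your project's src/ directory.
print(module_origin("json"))
```

With a regular (non-editable) install, the path would instead be expected to lie inside your environment’s site-packages directory.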

As it stands, users wishing to install our package first have to clone the GitHub repository and install from source as shown above (which is fine). To make the installation slightly easier for users, we can place our package on a public software repository, such as PyPI or Conda, which we cover next.