What is QHub?

Open source tool for data science research, development, and deployment.

QHub is Infrastructure as Code that simplifies the deployment of data science projects using JupyterHub and Dask Gateway for you and your team.

Designed to simplify the deployment and maintenance of scalable computational platforms in the cloud, QHub is ideal for organizations that need a shared compute platform that is flexible, accessible, and scalable.

QHub Technology Stack

High-level illustration of QHub architecture

Components

The technology stack is an integration of the following existing open source libraries:

  • Terraform: a tool for building, changing, and versioning infrastructure

  • Kubernetes: a cloud-agnostic orchestration system

  • Helm: a package manager for Kubernetes

  • JupyterHub: a shareable compute platform for data science

  • JupyterLab: a web-based interactive development environment for Jupyter Notebooks

  • Dask: a scalable and flexible library for parallel computing in Python

    • Dask-Gateway: a secure, multi-tenant server for managing Dask clusters

  • GitHub Actions: a tool to automate, customize, and execute software development workflows in a GitHub repository.

The stack also includes the following newly created open source libraries:

  • KubeSSH brings the SSH experience to a modern cluster manager.

  • Jupyter-Videochat allows video-chat with JupyterHub peers inside JupyterLab, powered by Jitsi.

  • conda-store serves identical conda environments and controls their life cycle.

  • Conda-Docker extends the Docker concept of declarative environments associated with Docker images, allowing tricks and behaviour that would otherwise not be possible.

Why use QHub?

QHub enables teams to build their own scalable compute infrastructure with:

  • Easy installation and maintenance controlled by a single configuration file.

  • Autoscaling JupyterHub installation deployed on the cloud provider of your choice.

  • A choice of multiple compute instance types, such as normal, high memory, and GPU.

  • Autoscaling Dask compute clusters for big data using any instance type.

  • Shell access and remote editing access (e.g. VS Code Remote) through KubeSSH.

  • Full Linux-style permissions, allowing different shared folders for different groups of users.

  • Robust compute environment handling that supports both prebuilt and ad-hoc environment creation.

  • Integrated video conferencing, using Jitsi.