QHub admins are DevOps engineers, system administrators, scientists, and network architects who are responsible for the critical infrastructure that data scientists and engineers need to thrive. QHub is bundled with features that make installation easy while providing the ability to scale with your organization and data.
The content below is aimed at QHub producers and at those who want to learn more about the QHub architecture.
After you have cloned the QHub repo locally, you can install QHub with:
python -m pip install -e .[dev]
zsh users may need to escape the square brackets: \[dev\]
To install the pre-commit hooks, run:
After the installation, the next step is to configure QHub.
QHub is entirely controlled from a configuration file, which allows you to manage multiple environments and multiple teams, as well as their permissions and authorization in a robust way.
The Configuration File
With QHub, managing configurable data science environments and attaining seamless deployment with GitHub Actions become remarkably easy. Let’s look at how you can customize QHub for a data science architecture that meets your team’s needs.
Staging & Production Environments and Shell Access
With QHub, you can have shell access and remote editing access through KubeSSH. The full Linux-style permissions allow different shared folders for different groups of users.
QHub comes with staging and production environments, as well as JupyterHub deploys.
QHub uses the Network File System (NFS) protocol to allow Kubernetes applications to access storage. Files in the containers of a Kubernetes Pod are not persistent: if a container crashes, the kubelet restarts it, but its files are not preserved. The Kubernetes Volume abstraction that QHub uses solves this problem.
The files are shared over NFS directly from a container in a Kubernetes Pod, and a Kubernetes Persistent Volume is set up that is accessed via NFS. Ingress, Kubernetes’ built‑in configuration for HTTP load balancing, defines and controls the rules for external connectivity to Kubernetes services. Users who need to provide external access to their Kubernetes services create an Ingress resource that defines those rules.
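The NFS-backed Persistent Volume described above can be sketched as a Kubernetes manifest built in Python. This is a minimal illustration, not QHub's actual configuration: the volume name, server address, export path, and size are all hypothetical placeholders.

```python
import json


def nfs_persistent_volume(name, server, path, size="10Gi"):
    """Build a Kubernetes PersistentVolume manifest backed by an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            # ReadWriteMany lets several Pods mount the same share at once.
            "accessModes": ["ReadWriteMany"],
            # The NFS server and export path are assumptions for this example.
            "nfs": {"server": server, "path": path},
        },
    }


manifest = nfs_persistent_volume(
    "shared-home", "nfs.example.internal", "/exports/home"
)
print(json.dumps(manifest, indent=2))
```

Because the data lives on the NFS export rather than in the container's own filesystem, a restarted container can mount the same volume again and find its files.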
QHub streamlines and manages all the Kubernetes architecture detailed above and delivers a smooth deployment process to its users through its intuitive interface.
QHub's architecture and features allow you to:
manage configurable data science environments
handle multiple environments in a robust way
have seamless deployment with GitHub Actions
meet the needs of multiple teams and control permissions
Cloud Deployment on QHub
QHub deployments on each cloud follow the architecture shown for that provider in the diagrams below. To deploy to a cloud, fill in the respective configuration file with the credentials of your cloud provider account and the details of the users who should have access to the deployment.
Infrastructure Provisioning (Common for all Clouds)
To provision the infrastructure, QHub uses Terraform, a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform lets QHub use Infrastructure as Code to provision and manage any cloud, infrastructure, or service. Terraform has a system of modules and providers for working with different cloud providers and kinds of infrastructure. For instance, it has providers for Amazon Web Services (AWS), Google Cloud Platform (GCP), and DigitalOcean (DO), as well as for Kubernetes and Helm.
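The idea behind Infrastructure as Code is that you describe the desired state and the tool works out which changes are needed. The toy function below illustrates that idea only; it is not how Terraform is implemented, and the resource names are hypothetical.

```python
def plan(current, desired):
    """Compare current and desired resources and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    # Anything that exists but is no longer declared gets removed.
    for name in current:
        if name not in desired:
            actions.append(("destroy", name))
    return actions


# Hypothetical resources, for illustration only.
current = {"cluster": {"nodes": 2}, "old_bucket": {}}
desired = {"cluster": {"nodes": 3}, "registry": {}}
print(plan(current, desired))
```

Keeping the desired state in version-controlled files is what makes infrastructure changes reviewable and repeatable.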
Kubernetes (Common for all Clouds)
To manage the deployments on the Kubernetes cluster, QHub uses Helm, a package manager for Kubernetes. Helm packages Kubernetes configurations for ease of distribution, so that you can take a ready-made package and deploy it to your Kubernetes cluster.
The services are exposed via an Ingress component of Kubernetes. Helm uses a packaging format called Charts, which are collections of files that describe a related set of Kubernetes resources. Charts can be packaged into versioned archives for deployment, and they are easy to roll back.
The Helm provider of Terraform takes the overrides supported by Helm. This makes it easier to use a standard chart with custom settings, such as a custom image.
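Overriding a chart's settings amounts to layering a small set of custom values on top of the chart's defaults. The sketch below shows such a recursive merge in plain Python. It is an illustration of the concept, not Helm's or Terraform's code, and the keys and image names are made up for the example.

```python
import copy


def merge_values(defaults, overrides):
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults and overrides, for illustration only.
chart_defaults = {
    "singleuser": {
        "image": {"name": "upstream/notebook", "tag": "1.0"},
        "memory": {"limit": "1G"},
    }
}
overrides = {
    "singleuser": {
        "image": {"name": "registry.example.com/team/notebook", "tag": "2024.1"}
    }
}

values = merge_values(chart_defaults, overrides)
print(values)
```

Only the image is replaced here; the memory limit falls through from the defaults, which is why a standard chart can be reused with a custom image.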
For JupyterHub and Dask, QHub uses the official Helm Charts and provides custom settings, which can be seen in:
jupyterhub.yaml, also with custom images, which are stored in the container registry of the respective cloud provider.
SSL and Ingress (Common for all Clouds)
To expose the various services in the Kubernetes cluster, such as JupyterHub and Dask, QHub uses Traefik Proxy, which is a reverse proxy and load balancer.
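A reverse proxy decides which backend service should receive each incoming request. The simplified sketch below picks a backend by host and longest matching path prefix. Traefik's real rule matching is considerably richer, and the host name, service names, and ports here are hypothetical.

```python
def route(rules, host, path):
    """Return the backend for the longest matching path prefix on `host`, or None."""
    best = None
    for rule in rules:
        if rule["host"] == host and path.startswith(rule["prefix"]):
            # Prefer the most specific (longest) prefix.
            if best is None or len(rule["prefix"]) > len(best["prefix"]):
                best = rule
    return best["backend"] if best else None


# Hypothetical routing table, for illustration only.
rules = [
    {"host": "qhub.example.com", "prefix": "/", "backend": "jupyterhub:80"},
    {"host": "qhub.example.com", "prefix": "/gateway", "backend": "dask-gateway:8000"},
]

print(route(rules, "qhub.example.com", "/gateway/api"))
print(route(rules, "qhub.example.com", "/hub/login"))
```

Requests for an unknown host match no rule and are not forwarded to any service.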
SSL is a crucial part of any service exposed to the Internet. To handle this in Kubernetes, QHub uses cert-manager, a popular Kubernetes add-on that automates the management and issuance of TLS certificates from various issuing sources.
AWS Cloud Architecture
On AWS, the architecture uses a Virtual Private Cloud (VPC), which lets you launch resources into a virtual network. The VPC is a logically isolated section of AWS in which you control how your network and the AWS resources inside it are exposed to the Internet. The VPC contains subnets in multiple availability zones. The Kubernetes cluster sits inside the VPC, which isolates it from the Internet by default.
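To make the idea of subnets in multiple availability zones concrete, the sketch below carves a VPC address range into one subnet per zone using Python's standard `ipaddress` module. The CIDR block, prefix length, and zone names are assumptions chosen for the example, not values QHub uses.

```python
import ipaddress

# Hypothetical VPC address range and availability zones.
vpc = ipaddress.ip_network("10.10.0.0/16")
zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

# Split the /16 into /20 blocks and assign the first one to each zone.
subnets = dict(zip(zones, vpc.subnets(new_prefix=20)))

for zone, subnet in subnets.items():
    print(zone, subnet, subnet.num_addresses)
```

Each subnet is a non-overlapping slice of the VPC range, so resources in different zones share one private network while staying in separate failure domains.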
QHub uses AWS’s managed Kubernetes service, Elastic Kubernetes Service (EKS), to create a Kubernetes cluster. Since the VPC is an isolated part of AWS, you need a way to expose the services running inside Kubernetes to the Internet so that others can access them. This is achieved with an Internet Gateway, a VPC component that allows communication between the VPC and the Internet.
With QHub, system admins can customize and maintain their teams’ compute needs and environments. The autoscaling of compute resources (Kubernetes and Pods) is handled through Dask autoscaling with CPU and GPU workers.
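Autoscaling comes down to choosing a worker count that tracks the current load while staying within configured limits. The toy function below shows that clamping logic. It is a deliberate simplification, not Dask's actual adaptive algorithm, and the parameter names are hypothetical.

```python
import math


def target_workers(pending_tasks, tasks_per_worker, minimum, maximum):
    """Pick a worker count proportional to load, clamped to [minimum, maximum]."""
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    return max(minimum, min(maximum, wanted))


for pending in (0, 18, 100):
    print(pending, target_workers(pending, tasks_per_worker=4, minimum=1, maximum=10))
```

The lower bound keeps a worker ready for new work, and the upper bound caps how much compute a single workload can claim.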
QHub uses GitHub, Auth0, or a simple username and password for authentication.