QHub Deployment on Amazon Web Services (AWS)¶
This guide walks you through the steps to create a QHub deployment on AWS. Deployments represent a set of multiple, identical Pods with no unique identities. A deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive. In this way, deployments help ensure that one or more instances of your application are available to serve user requests. Deployments are managed by the Kubernetes Deployment controller.
A configuration defines the structure of your deployment, including the type and properties of the resources that are part of the deployment, any templates the configuration should use, and additional subfiles that can be executed to create your final configuration. You must specify a configuration in order to create a deployment. A configuration allows you to define a variety of resources and features that you would like to setup for your deployment. Let’s have a closer look into how you can configure your QHub deployment for AWS.
You can start the deployment process by providing the configuration details of your project. To abstract away some of the complexities of writing up your own configuration file for QHub deployment on AWS, the configuration file,
qhub-config.yaml, written in the YAML syntax, is provided for you. The configuration file consists of three main sections: General, Security, Provider Infrastructure that you need to edit with your project and authentication details.
This section allows setting up the general information about your project.
project_name: qhub-aws-deployment ci_cd: type: github-actions branch: main provider: aws domain: "aws.qhub.dev"
project_nameis the name that the resources within the cloud deployment/kubernetes will be deployed with.
providerspecifies which cloud provider to use for the deployment. For AWS, you will write
aws. Other options include
dofor Digital Ocean and
gcpfor Google Cloud Platform.
ci_cdrefers to the continuous integration and continuous deployment framework to use. Currently, github-actions is the supported framework. The
branchparameter allow the user to control the main branch that is watched for PRs and commits.
domainis the top level url to put qhub and future services under such a monitoring. In this example,
aws.qhub.devwould be the domain for qhub to be exposed under.
The security section of the configuration file is for configuring security and authentication details, relating to your QHub deployment. You will need to provide your Amazon Web Services account credentials on the appropriate lines of the configuration file as shown below.
security: authentication: type: GitHub config: client_id: <CLIENT_ID> client_secret: <CLIENT_SECRET> oauth_callback_url: https://jupyter.aws.qhub.dev/hub/oauth_callback auth0_subdomain: qhub-dev
You can choose the authentication type you would like to use. In the example above, the configuration shows authentication type as Github, but
Auth0 is also supported as seen below.
security: authentication: type: Auth0 config: client_id: ... client_secret: ... oauth_callback_url: ... scope: ['openid', 'email', 'profile'] auth0_subdomain: ...
2.1 Users and Groups¶
The configuration file also allows you to specify
groups that you would like to give access to your project and its deployment. This portion of the file enables you to provision unix permissions to each user. All users are assigned a
primary_group is the group name assigned to files that are written for the user while
groups is simply a mapping of group name to group ids. It is recommended to not change the ids assigned to groups and users after creation since it may lead to login problems. While the example below shows ids around 1000, it is recommended to start with high uid numbers, such as 10000000.
ids, technically, supports 2 billion
users: <user_email>: uid: 1000 primary_group: users <user_email>: uid: 1001 primary_group: admin groups: users: gid: 1000 admin: gid: 1001
3. Provider Infrastructure¶
This section of the configuration file refers to the kubernetes infrastructure deployment on AWS. Kubernetes is a cloud agnostic, portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. Kubernetes runs your workload by placing containers into Pods to run on Nodes. A node may be a virtual or physical machine, depending on the cluster. The following configuration sets up a kubernetes deployment with autoscaling node groups.
In this section, you need to specify the region and instance types, and number of nodes that you would like to use for deployment. To see available instance types and pricing, please refer to AWS website
amazon_web_services: region: us-west-2 availability_zones: ["us-west-2a", "us-west-2b"] kubernetes_version: "1.14" node_groups: general: instance: "m5.large" min_nodes: 1 max_nodes: 1 user: instance: "m5.large" min_nodes: 1 max_nodes: 2 worker: instance: "m5.large" min_nodes: 1 max_nodes: 2
For AWS, as well as for the other cloud service providers, adding an additional node group is as easy as adding a node group such as
high-memory: instance: "s-2vcpu-4gb" min_nodes: 1 max_nodes: 50
An image is the term used to refer to a serialized copy of the state of an environment stored in a file. Default images section in the configuration file lists the images used to run the default image if it is not already specified in a profile (described in the next section). The
jupyterhub key controls the jupyterhub image run.
default_images: jupyterhub: "quansight/qhub-jupyterhub:v0.3.12" jupyterlab: "quansight/qhub-jupyterlab:v0.3.12" dask_worker: "quansight/qhub-dask-worker:v0.3.12" dask_gateway: "quansight/qhub-dask-gateway:v0.3.12"
This section allows you to control the amount of storage allocated to shared filesystems. Please note, when you change the storage size, it will delete the previous storage.
storage: conda_store: 20Gi shared_filesystem: 10Gi
Profiles are used to control the jupyterlab user instances and dask-workers provided by dask-gateway.
profiles: jupyterlab: - display_name: Small Instance description: Stable environment with 1 cpu / 1 GB ram users: - <username> groups: - admin default: true kubespawner_override: cpu_limit: 1 cpu_guarantee: 1 mem_limit: 1G mem_guarantee: 1G image: "quansight/qhub-jupyterlab:v0.3.12" - display_name: Medium Instance description: Stable environment with 1.5 cpu / 2 GB ram kubespawner_override: cpu_limit: 1.5 cpu_guarantee: 1.25 mem_limit: 2G mem_guarantee: 2G image: "quansight/qhub-jupyterlab:v0.3.12" dask_worker: "Small Worker": worker_cores_limit: 1 worker_cores: 1 worker_memory_limit: 1G worker_memory: 1G image: "quansight/qhub-jupyterlab:v0.3.12" "Medium Worker": worker_cores_limit: 1.5 worker_cores: 1.25 worker_memory_limit: 2G worker_memory: 2G image: "quansight/qhub-jupyterlab:v0.3.12"
profiles.jupyterlab is a named jupyterlab profile. It closely follows the [KubeSpawner](https://jupyterhub kubespawner.readthedocs.io/en/latest/spawner.html) api. The only exception is that two keys are added
groups which allow restriction of profiles to a given set of groups and users. We recommend using groups to manage profile access. Finally, we allow for configuration of the dask workers. In general, similar to the jupyterlab instances, you only need to configure the cores and memory.
An environment generally refers to a directory on your machine that contains a set of packages/dependencies that the program/application you want to run needs. One might think of a programming environment similar to a namespace as used in computer science, which refers to an abstract container (or environment) created to hold a logical grouping of names.
In your configuration file, you can add to the list of dependencies, as well as change the name of your environment if you would like to. You can also create multiple examples.
environments: "environment-default.yaml": name: default channels: - conda-forge - defaults dependencies: - python=3.7 - ipykernel - ipywidgets - qhub-dask==0.3.12 - numpy - numba - pandas "environment-example-2.yaml": name: example-2 channels: - conda-forge - defaults dependencies: - python=3.7 - ipykernel - ipywidgets - qhub-dask==0.3.12 - numpy - numba - pandas - jinja2 - pyyaml
After you edit the configuration file, run
./scripts/00-guided-install.sh to complete the QHub AWS deployment steps!