Getting Started on the Rubin Science Platform
Greg Madejski and Phil Marshall
We are developing tutorial notebooks on remote JupyterLab instances, to short-circuit the DM stack installation process and get used to working in the notebook aspect of the Rubin Science Platform (RSP). In these notes we provide:
- Notes on how to get set up on the Rubin Science Platform (RSP) JupyterLab Notebook Aspect at the LSST Data Facility at NCSA
- Help with getting set up to run and edit the Stack Club tutorial notebooks
Accessing the Rubin Science Platform
The Rubin Science Platform (RSP) Notebook Aspect Documentation provides an introduction to the system, including how to gain access and then how to use JupyterLab once you are in. Accessing the RSP requires Rubin Observatory data rights, as described at ls.st/rdo-013. You will also need to get an NCSA account and connect through the NCSA VPN.
Getting a Rubin Science Platform Account
To join the Stack Club and request one of these accounts, please fill out the Stack Club Membership Application Form. You’ll need to agree to abide by the Rules, provide your full name (first and last), and your email address. If your application is successful, you’ll get an email with instructions on how to set up your RSP account.
Accessing the RSP via its VPN
At present, unless you are on an approved network, you must use the NCSA virtual private network (VPN). The recommended method is to use Cisco’s AnyConnect with DUO two-factor authentication (verified on Mac and Linux). Detailed instructions are available on the NCSA VPN site. The best documentation for getting set up with your account is on nb.lsst.io.
- Install and configure the NCSA VPN
- Log into the NCSA VPN (NB: Use the ncsa-vpn-default group; this may not be selected by default)
- Log into the Notebook Aspect (NB: Use “NCSA” as the identity provider, not your institution)
If you forget your password it can be reset following the instructions here. If you have problems connecting to the NCSA services you can check their status and submit a help ticket here.
For a Linux install, you may need to pre-install openconnect from your favorite package manager. For Mac OS X, you can also use openconnect-gui [https://openconnect.github.io/openconnect-gui/], which can be installed with Homebrew.
Starting the Rubin Science Platform JupyterLab Notebook Aspect
Once the VPN connection is established, you should be able to navigate to the JupyterLab instance at https://lsst-lsp-stable.ncsa.illinois.edu. Select the Release and medium options on the Spawner Options landing page, and then hit the “Spawn” button. You’ll (eventually) end up on the JupyterLab launcher, where you can use the file manager in the left-hand sidebar to open your Jupyter notebooks, or start terminal or notebook editor tabs from the buttons provided. You should see the pre-installed notebook-demo notebooks in the file manager, for example.
It might take a long time to start the JupyterLab instance (a few minutes or so). We recommend using the most recent major release (e.g. v18.0.0) so that our semi-continuous integration script is able to run your notebook, and using “medium” size (to support image processing tasks).
At the end of your JupyterLab session, please make sure you save all and log out (from the launcher menu), to free up the cluster for others.
Running and Contributing to the Stack Club Notebooks
From the Launcher, start a terminal, cd to the notebooks folder and git clone the StackClub repo, using either HTTPS or SSH access:
git clone https://github.com/LSSTScienceCollaborations/StackClub.git
(You’ll need to set up your SSH keys to use the SSH option, but this will enable you to avoid typing your GitHub password a lot.)
You can then git checkout a development branch (so that you can keep your master branch clean and up to date with the latest updates from the Club), and execute and modify the club notebooks. You can open them from the file manager, and use the resulting notebook editor.
New to git and GitHub? Have a play in this sandbox - from there you can watch Phil on YouTube doing a GitHub live demo, too.
Workflow
The Stack Club workflow is to edit the club notebooks (or start new ones) in a suitable development branch, push it to the base repo, and submit a pull request (to enable club code review). Club members have Write access and so can do this; everyone else can push to their fork of the StackClub repo, and submit a PR from there. To exercise this workflow, try modifying Hello_World.ipynb, pushing your commit(s) and submitting a PR. Don’t forget to clear outputs and save before committing your changes!
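Clearing outputs is normally done from the JupyterLab menus, but it can also be scripted. The following is a minimal sketch, not part of the Stack Club tooling: it assumes the notebook is stored in the standard version 4 JSON format, and the helper names clear_outputs and clear_notebook_file are hypothetical.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs and execution counts of every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_notebook_file(path: str) -> None:
    """Read a notebook file, clear its outputs, and write it back."""
    nb_path = Path(path)
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    cleaned = json.dumps(clear_outputs(notebook), indent=1, ensure_ascii=False)
    nb_path.write_text(cleaned + "\n", encoding="utf-8")
```

For example, calling clear_notebook_file("Hello_World.ipynb") from the folder containing that notebook would strip its outputs before you commit. Markdown cells are left untouched.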
Standards
We aspire to produce high-quality tutorials that can be followed by any member of the LSST Science Collaborations who wants to learn about the DM Stack, and in particular its science pipelines.
- We regularly test all the notebooks in the master branch of this repo using the most recent major release of the Stack, and flag those that do not run all the way through. The master branch should only contain working notebooks, so that (ideally) Stack Club notebooks only fail to run if the Stack changes.
- Maintenance of the Stack Club notebooks is the responsibility of the notebooks’ “owner(s)”, who are listed in the first cell of each notebook. This cell also lists the date and Stack release on which the notebook was last verified to run.
- The introduction cell of each notebook contains a list of “learning objectives”, so that the user can judge whether or not this tutorial is right for them.
- We include markdown cells to explain each step in the tutorial, and provide links to the source code and reference documents as needed.
A template notebook that will help you maintain the above standards is available in the templates folder.
Available Datasets
Broadly useful, small datasets are available in /project/shared/data - this directory is world readable, but is only writeable by members of the lsst-users group (i.e., Rubin Project members). The Stack Club has its own read/writeable directory under /project/stack-club - feel free to contribute public data there. You can also use your personal /project/<username> folder for datasets that you want to share, but that may not be as generally applicable. As a rule, Stack Club notebooks should use data in /project/shared/data or /project/stack-club. If you add a shared dataset, please document it in the README of the associated directory.
Larger datasets are available in /datasets. This is a read-only folder.
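To see which of these locations your account can read or write, a quick check from a notebook cell can help. This is only a sketch: the describe_access helper is hypothetical, and the directory list simply repeats the paths named above, which may differ on other deployments.

```python
import os

# Paths taken from the text above; adjust if your deployment differs.
DATA_DIRS = ["/project/shared/data", "/project/stack-club", "/datasets"]


def describe_access(path: str) -> str:
    """Summarize the current user's access to a directory."""
    if not os.path.isdir(path):
        return "missing"
    readable = os.access(path, os.R_OK)
    writable = os.access(path, os.W_OK)
    if readable and writable:
        return "read-write"
    if readable:
        return "read-only"
    return "no access"


for data_dir in DATA_DIRS:
    print(f"{data_dir}: {describe_access(data_dir)}")
```

Outside the RSP these paths will most likely be reported as missing, which is a quick sign that you are not running on the platform.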
The Stack Club Library
The stackclub folder in this repo is a Python package containing a number of utility functions and classes for use in tutorial notebooks. You can browse its documentation at https://stackclub.readthedocs.io/.
If you are contributing notebooks, you may want or need to develop the stackclub package as well (e.g., by adding modules to it), and so it’s best to set up the package installation to be local and editable.
Start by opening a terminal in the RSP and sourcing the LSST setup:
source /opt/lsst/software/stack/loadLSST.bash
In the top level folder of your local clone of the StackClub repo, do:
python setup.py -q develop --user
This will put the repo’s stackclub folder on your path. When developing the package, you may find it useful to add the following lines to your notebook:
%load_ext autoreload
%autoreload 2
This enables you to repeatedly import stackclub as you update the library code. The above lines are in the template notebook, for your convenience.
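After an editable install, it can be worth confirming that Python resolves stackclub to your local clone and not to some other copy. The sketch below uses only the standard library; package_location is a hypothetical helper name, and it looks the package up without importing it.

```python
import importlib.util
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


location = package_location("stackclub")
if location is None:
    print("stackclub is not importable from this environment")
else:
    print(f"stackclub will be imported from: {location}")
```

If the printed path does not sit inside your StackClub clone, the editable install has probably not taken effect in the kernel you are using.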
If you are not developing this package, and you have permission to write to your base python site-packages, you can install it using pip, like this:
pip install git+https://github.com/LSSTScienceCollaborations/StackClub.git#egg=stackclub