How to use Purdue AF =========================== Basic interface components --------------------------- JupyterLab provides an interactive interface for general code development. The screenshot below shows the main elements of the interface: .. image:: images/interface-new.png :width: 900 :align: center #. **File browser** - your home directory with symlinks to different storage volumes (Depot, CVMFS, ``/work/``, etc. - learn more :doc:`here `). #. **Exstensions** - left sidebar contains useful extesions, such as a Git extension for interactive work with GitHub or GitLab repositories. #. **Launcher** - features buttons to create Python and ROOT C++ notebooks with different Pixi or Conda environments, open terminals, create new text files, etc. New launcher window can be opened by clicking the ``+`` button in the file browser or next to any open tab. #. **Top bar** - contains Purdue AF release version, your username, dark theme switch, and the shutdown button. #. **Terminal** - standard Bash terminal, useful for any cases that require a command line interface. #. **File editor** - simple IDE with syntax highlight for most common programming languages. .. note:: Windows with terminals, editors, etc., can be rearranged. The window layout is preserved when you shut down and restart the AF session. Other User Interfaces --------------------- In addition to JupyterLab, Purdue AF provides other user interfaces for analysis development. .. |vscode-logo| image:: images/vscode-logo.jpeg :height: 20px * **Web-based Visual Studio Code (code-server)** interface To open the web-based VSCode interface, click on the button with VSCode logo |vscode-logo| in the JupyterLab Launcher (the button is not shown in the screenshot above). * :doc:`Connection from local VSCode-based IDEs ` * :doc:`SSH connection from local terminal ` Python code development ------------------------ JupyterLab is especially well suited for developing analysis workflows in Python. * **Jupyter Notebooks** allow to write analysis code as a sequence of code and text cells, which can be executed in arbitrary order. In many cases, a single Jupyter Notebook can accomodate a full analysis from data access to producing final plots. Jupyter Notebooks support a wide range of plugins and widgets, which allows for a more interactive experience comparing to simple Python scripts. * To execute the code in a Jupyter Notebook, we always need to specify a **kernel**. At Purdue AF, Jupyter kernels are derived from Pixi or Conda environments. Read more :doc:`here `. * We provide a :doc:`curated "global" Pixi environment `, which should work for most applications, unless your code relies on a very specific package version. * Analysis code written in Python can be accelerated via parallelization. We recommend using :doc:`Dask ` for parallelization and distributed computing. For scaling out to multiple computing nodes, consider using :doc:`Dask Gateway `. ROOT ------- `ROOT `_ is a software package developed by CERN and widely used in high energy physics for histogramming, fitting, and statistical analysis. * ROOT console can be launched from a terminal by typing ``root -l``. Note that it is not possible to display canvases or open ``TBrowser`` as JupyterLab interface does not support X11 forwarding. * Alternatively, you can turn a Jupyter Notebook into a ROOT console by selecting the **ROOT C++ kernel**. Similarly to Python notebooks, you can add text cells and execute cells in arbitrary order. When working from a Jupyter Notebook, you can display ROOT plots using ``TCanvas::Draw`` method. :doc:`See example of ROOT C++ notebook here `. * The pre-installed ROOT C++ kernel supports **CUDA backend** for RooFit. To use it, pass ``RooFit::EvalBackend("cuda")`` argument to ``model.fitTo()``. * In Python, ROOT functionality is accessiblae via `PyROOT `_ package, present in the default kernel. HEP analysis frameworks ------------------------- We aim to support a wide range of modern HEP analysis tools. Below are a few examples of frameworks which have been shown to perform well at Purdue AF: * `Coffea `_ is a popular Python package for efficient columnar particle physics analyses. Coffea implements all common tools used in modern HEP analyses, and has a large and active support community. The latest version of Coffea is pre-installed in the global Pixi environment at ``/work/pixi/global/``. * `PocketCoffea `_ is a slim declarative framework built on top of Coffea. It allows to define an analysis with a few configuration files. A PocketCoffea analysis can be executed in a distributed way using `dask@purdue-af executor `_ which is based on :doc:`Dask Gateway `. * `RDataFrame `_ is another common HEP analysis framework based on ROOT. RDataFrame analysis can be written in either C++ or Python. Purdue AF supports RDataFrame in any Pixi or Conda environment where ROOT is installed. Scaling out ------------ * `Slurm `_ is a job scheduler and workload manager that enables batch submission on Purdue computing clusters. At Purdue AF, **users with local Purdue accounts** can submit jobs from Purdue AF to ``cms`` queue. `Instructions for submitting Slurm jobs `_ * `Dask `_ is an open-source library for parallel computing in Python. It can be used to :doc:`quickly parallelize any Python code `, or implicitly as a backend in frameworks such as Coffea and RDataFrame. At Purdue AF, we host :doc:`Dask Gateway servers `, which allow **users with both local and external (CERN/FNAL) accounts** to scale out beyond local session resources. * `CRAB `_ (CMS Remote Analysis Builder) is a utility to submit CMSSW jobs to distributed computing resources. CRAB allows users to: * Access Data and Monte Carlo datasets stored at any CMS computing site worldwide. * Exploit the CPU and storage resources at CMS computing sites via Worldwide LHC Computing Grid (WLCG). CRAB is suitable for running most CMSSW framework jobs (i.e. jobs launched via the ``cmsRun`` command). It is recommended to use CRAB for computationally intensive jobs, such as Monte Carlo generation or "skimming" AOD / MiniAOD datasets. `Instructions for submitting CRAB jobs `_ GPUs ------ At Purdue AF, you can start a session with a GPU by specifying it at resource selection step. We have a limited number of Nvidia A100 GPUs, which are available in two configurations: +----------------------+--------+---------------------+ | Configuration | Memory | Number of instances | +----------------------+--------+---------------------+ +----------------------+--------+---------------------+ | Full A100 GPU | 40GB | 4 | +----------------------+--------+---------------------+ | 5GB "slice" of A100 | 5GB | 14 | +----------------------+--------+---------------------+ See more info here: :doc:`GPU access at Purdue AF `.