Laptop setup
You will work on our computing cluster, but before we get there, we must get you properly set up on your machine.
Install Python
If you have not done so already, you should install a distribution of Python called Anaconda. Anaconda is not only an easy way of installing Python on Windows, Mac, and Linux; it also comes with the conda package management system (more about that later). To install Anaconda, visit this page. When the download completes, you must follow the default installation.
The Terminal
The programs you will use in the project are command-line applications. I.e., programs executed by writing their name and any arguments in a “terminal” rather than clicking on an icon and using a graphical user interface. Many different programs can serve as a terminal.
- If you have a Windows machine, use the Anaconda Powershell Prompt (not the Anaconda Prompt and not the
CMD
). You installed Anaconda Powershell Prompt along with Anaconda Python. - If you have a Mac, the terminal you will use is called Terminal. The Terminal application is pre-installed on Mac.
From now on, whenever I refer to the terminal, I mean Anaconda Powershell Prompt on Windows and Terminal on Mac. I will assume some familiarity with using a terminal and executing commands on the command line. If you have not used a terminal before, or if you are a bit rusty, you should run through this primer before you go on.
Conda environments
You must install packages and programs for your analyses and pipelines. Sometimes, however, the packages you need for one project conflict with the ones you need for other projects you work on in parallel. Such conflicts seem like an unsolvable problem. Would it not be fantastic if you could create a small world insulated from the rest of your Anaconda installation? Then, that small world would only contain the packages you needed for a single project. If each project had its isolated world, then there would be no such conflicts.
Fortunately, a tool lets you do just that, and its name is Conda. Conda’s small worlds are called “environments,” and you can create as many as you like. You can then switch between them as you switch between your bioinformatics projects. Conda also downloads and installs the packages for you, ensuring that the packages you install in each environment are compatible. It even makes sure to install packages (dependencies) required by the packages you install. By creating an environment for each project, the libraries installed for each project do not interfere.
Create a conda environment on your local machine
When you install Anaconda or Miniconda, Conda makes a single base environment. It is called “base,” and this is why it says “(base)” at your terminal prompt. You need a conda environment for your project on your local machine. You can call it anything you like, but if you call it birc-project
, it will match the rest of this tutorial. So do that.
The environment on your local machine only needs a few packages since it only connects you to the cluster. The command below creates a birc-project
environment and installs the slurm-jupyter
package from my conda channel:
Terminal
conda create -y -n birc-project -c kaspermunch slurm-jupyter
To activate the environment, use this command:
Terminal
conda activate birc-project
Note how the environment name at your terminal prompt changes from (base)
to (birc-project)
. To deactivate the environment, use this command:
Terminal
conda deactivate
and the terminal prompt changes back to (base)
.
VPN
You can only connect to the cluster on the University’s internal network. So you must be physically on campus or use the University’s VPN. To install VPN, follow the instructions on this page. Before you can use the VPN, you also need to enable two-step verification. You can see how to do that on the same page. If you are not physically on campus, you must activate your VPN before logging in to the cluster. Your password for the VPN is the same as the one you use to log on to access Blackboard.