Getting Started

By Damien Irving.

If you’re new to using Python in the atmospheric and ocean sciences, here’s a five-step guide to get you started:

Step 1: Set up your Python environment

As you might expect, the default approach to installing Python on your computer (if it doesn’t already come with it) is to simply download and run the installer from python.org/downloads. This will install what is known as the Python standard library – the few hundred modules that together perform the core functions of the programming language. These modules are very important (e.g. without them Python wouldn’t know how to interact with the operating system or the Internet), but unless you’re a software developer you probably don’t need to know that much about them. Instead, research scientists are usually interested in the various Python packages that software developers have written (using the standard library) for doing things like data visualization, statistics and reading/writing netCDF files. The authors of such packages will typically make them available via the Python Package Index (PyPI), so that people can install them using a command line tool called pip (which is bundled with recent versions of the python.org installers).
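To see the difference between the standard library and third-party packages in practice, here is a small sketch that uses the standard library's `importlib.util.find_spec` to check whether a module can be imported in your current environment. It assumes Python 3.6 or later, and `is_installed` is a hypothetical helper name. Packages from the Python Package Index only show up once they have been installed, typically from the command line with something like `pip install numpy`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


# Standard library modules ship with every Python installation.
for name in ["os", "urllib", "math"]:
    print(f"{name:12s} installed: {is_installed(name)}")

# Third-party packages are only available after they have been installed
# (e.g. via pip or a scientific Python distribution).
for name in ["numpy", "matplotlib", "netCDF4"]:
    print(f"{name:12s} installed: {is_installed(name)}")
```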

While this all sounds pretty simple, the task of identifying all the packages you need and then installing them in such a way that they interact nicely together can be very difficult. The problem arises because many packages have dependencies – other modules and libraries that they depend on to function properly. Sometimes it is possible to simply install the dependencies using pip, but what happens if you want to use two different packages that each depend on a different version of the same library? And what happens if some of the dependencies aren’t available via pip?
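One way to see these dependencies for yourself is to ask an installed package which other packages it declares it needs. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); `show_dependencies` is a hypothetical helper, and pandas and matplotlib are just example package names.

```python
from importlib import metadata


def show_dependencies(dist_name: str):
    """Return (version, requirements) for an installed package,
    or None if the package isn't installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # requires() returns None when a package declares no dependencies
    requirements = metadata.requires(dist_name) or []
    # Skip optional extras (e.g. testing tools) and keep the core dependencies
    core = [req for req in requirements if "extra ==" not in req]
    return version, core


for package in ["pandas", "matplotlib"]:
    info = show_dependencies(package)
    if info is None:
        print(f"{package}: not installed")
    else:
        version, requirements = info
        print(f"{package} {version} depends on:")
        for req in requirements:
            print(f"    {req}")
```

Each requirement string usually carries a version constraint, which is exactly where two packages can end up asking for incompatible versions of the same library.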

Recognizing this problem, a number of Python distributions have been released that come with more than just the standard library. The most widely used scientific distributions are Canopy and Anaconda, which come with 300 or so of the most popular packages for data analysis and visualization already pre-installed. They also come with a number of development environments (e.g. Anaconda comes with IPython QtConsole, IPython Notebook and Spyder), so you can choose whichever you like best. These distributions are a great alternative to the standard library-only installation from python.org/downloads, particularly when you’re just starting out with Python.
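If you want to see what a distribution actually ships with, you can list every package installed in the current environment. This is a minimal sketch, assuming Python 3.8 or later for `importlib.metadata`; `installed_packages` is a hypothetical helper, and the count it reports will depend on which distribution (and which version of it) you installed. Anaconda users can get a similar listing with `conda list` at the command line.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for every installed package."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with missing metadata
            packages[name] = dist.version
    return sorted(packages.items(), key=lambda item: item[0].lower())


packages = installed_packages()
print(f"{len(packages)} packages installed, for example:")
for name, version in packages[:10]:
    print(f"    {name} {version}")
```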

When installing a Python distribution, you might be asked whether you’d like to install Python version 2 or 3. It’s preferable to choose Python 3 if you can, but this decision will ultimately depend on the packages you want to use for your work. Many existing data analysis packages are not yet Python 3 compatible, which means many people in the PyAOS community still use Python 2.
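If you’re not sure which version a given installation is using, a quick check with the standard library’s `sys` and `platform` modules will tell you. The sketch below is written so that it runs under both Python 2 and 3; `python_major_version` is a hypothetical helper name, and integer division is just one of several behaviours that changed between the two versions.

```python
import platform
import sys


def python_major_version():
    """Return the major version number of the running interpreter."""
    return sys.version_info[0]


# Single-argument print(...) calls behave the same in Python 2 and 3.
print("Running Python " + platform.python_version())

if python_major_version() >= 3:
    print("Python 3: the recommended choice, if your packages support it.")
else:
    print("Python 2: check whether the packages you need now support Python 3.")

# One behaviour that differs between versions: dividing two integers.
# Python 3 gives 3.5 here, whereas Python 2 truncates the result to 3.
print(7 / 2)
```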

See the packages tab for more information on Python distributions and development environments.

Step 2: Learn the basics of Python programming

From online resources to in-person workshops, the training tab has got you covered for learning the basics.

Step 3: Familiarize yourself with the core Python libraries used in the atmospheric and ocean sciences

The default Python library for analyzing large arrays of numeric data (e.g. four-dimensional latitude/longitude/altitude/time data arrays) is numpy. The numpy n-dimensional array (or ndarray for short) knows how to calculate its own statistics (e.g. mean, standard deviation) and is easy to slice/subset, while the numpy library itself contains an extensive collection of functions for performing almost any array manipulation you could think of. The default library for visualizing ndarrays is matplotlib, while the default library for reading and writing ndarrays to and from netCDF files is netCDF4 (reading and writing text files is built into numpy).
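Here is a minimal numpy sketch of those ndarray features: built-in statistics, statistics along one axis, slicing, and a round trip through a plain text file. The array is synthetic random data with a made-up (time, level, latitude, longitude) shape, and the netCDF4 and matplotlib steps are left out so that the example runs with numpy alone.

```python
import os
import tempfile

import numpy as np

# Synthetic 4-D data array with dimensions (time, level, latitude, longitude).
# The shape and values are made up purely for illustration.
rng = np.random.default_rng(seed=0)
temperature = 250.0 + 30.0 * rng.random((12, 5, 73, 144))

# An ndarray knows how to calculate its own statistics...
print("Overall mean:", temperature.mean())
print("Overall standard deviation:", temperature.std())

# ...and statistics can be taken along a single axis, e.g. a time mean.
time_mean = temperature.mean(axis=0)
print("Time-mean shape:", time_mean.shape)  # (5, 73, 144)

# Slicing/subsetting: the first time step at the first level.
surface_first_month = temperature[0, 0, :, :]
print("Subset shape:", surface_first_month.shape)  # (73, 144)

# Reading/writing plain text files is built into numpy.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "surface_first_month.txt")
    np.savetxt(path, surface_first_month)
    reloaded = np.loadtxt(path)

print("Round trip matches:", np.allclose(reloaded, surface_first_month))
```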

While most basic analysis and visualization tasks in the atmospheric and ocean sciences could be achieved with a combination of numpy, matplotlib and netCDF4, all three are generic, all-purpose libraries. This means it usually takes a fair bit of wrangling to get them to do tasks that are common to the atmospheric and ocean sciences. What’s more, most people in the PyAOS community end up doing similar wrangling, so there’s a lot of duplication of effort. Recognizing this as a problem, teams of software developers (usually employed by large AOS organizations) have built on top of numpy, matplotlib and/or netCDF4 to create alternative all-purpose libraries that are specifically designed for the PyAOS community. The three most widely used of these libraries are xray, iris and cdat. Each has its pros and cons (e.g. iris is great for plotting, while xray is built on top of a very popular generic data analysis library called pandas), and you might find yourself switching between them depending on the task at hand.
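To give a flavour of that wrangling, here is a sketch of a common task done with plain numpy: an area-weighted global mean, where each latitude is weighted by the cosine of latitude (a common approximation on a regular latitude/longitude grid) to account for grid cells shrinking towards the poles. The grid, the random data and the `area_weighted_mean` helper are all hypothetical; libraries like xray, iris and cdat aim to take this sort of bookkeeping off your hands, and their own APIs are not shown here.

```python
import numpy as np

# Hypothetical regular 2.5-degree grid and a made-up field with dims (time, lat, lon).
lats = np.linspace(-90.0, 90.0, 73)
lons = np.arange(0.0, 360.0, 2.5)
rng = np.random.default_rng(seed=1)
field = 280.0 + rng.standard_normal((12, lats.size, lons.size))


def area_weighted_mean(data, lat):
    """Global mean of data[..., lat, lon], weighting each latitude by cos(lat)
    so that small grid cells near the poles count for less than those
    near the equator."""
    weights = np.cos(np.deg2rad(lat))
    zonal_mean = data.mean(axis=-1)  # average over longitude first
    return np.average(zonal_mean, axis=-1, weights=weights)


global_mean = area_weighted_mean(field, lats)
print("Global-mean time series shape:", global_mean.shape)  # (12,)
```

Keeping track of which axis is latitude, converting degrees to radians and remembering to weight at all are exactly the details that are easy to get wrong when every user rewrites this by hand.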

Step 4: Find the specialized libraries you need

Now that you’ve got your head around the core, all-purpose Python libraries used for data processing and visualization, you’ll want to hunt around and see whether there are any libraries for the highly specialized aspects of your work. For instance, there are libraries available for dealing with radar data (Py-ART), analyzing and plotting skew-T diagrams (SkewT), performing computations on global wind fields in spherical geometry (windspharm) and so on. A listing of most of the packages out there can be found at the packages tab (please let us know if there are any missing!).

Step 5: Sign up to the PyAOS mailing list

The PyAOS mailing list (sign up at the mailing lists tab) is the place to keep up to date with the latest Python developments relevant to the atmospheric and ocean sciences.