Editor’s note: In this post, Tommy Zhang, a Ph.D. student working on tropical climate, shares his experiences using Python in a WRF modeling and analysis workflow.
In the atmospheric sciences, we already have many computational tools. If you want to do numerical discretization, use a method from Numerical Recipes in Fortran or C. If you want to do some analysis, NCL’s library is superb. If you want to slice and dice the data, GrADS’s 4-D data structure will do the job.
We are not short of tools. Why should we add Python to our already long list?
Let me use my own workflow as a case study to answer this question. Recently, I needed to run the WRF model on a cluster. The process involved: 1) downloading GCM data, 2) editing the configuration file, 3) running WPS to pre-process the GCM data, 4) editing the configuration file again, 5) running part of WRF to produce initial and boundary conditions, and 6) running another part of WRF as the simulation itself.
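In Python, the steps above can be expressed as a sequence of small, named functions rather than one long script. Here is a minimal sketch; the executable names follow the standard WPS/WRF programs, but the directory layout is hypothetical and would need adjusting for a real cluster:

```python
import subprocess
from pathlib import Path

# Hypothetical directory layout -- adjust for your own installation.
STEPS = [
    (["./geogrid.exe"], Path("WPS")),      # WPS pre-processing of GCM data
    (["./ungrib.exe"],  Path("WPS")),
    (["./metgrid.exe"], Path("WPS")),
    (["./real.exe"],    Path("WRF/run")),  # initial and boundary conditions
    (["./wrf.exe"],     Path("WRF/run")),  # the simulation itself
]

def run_step(cmd, cwd):
    """Run one external program; check=True aborts the workflow on failure."""
    subprocess.run(cmd, cwd=cwd, check=True)

def workflow(runner=run_step):
    """Run every step in order; `runner` is injectable for dry runs and tests."""
    for cmd, cwd in STEPS:
        runner(cmd, cwd)
```

Because each step has a name and the step list sits in one place, changing the setup ten days later means editing one entry, not re-reading a monolithic script.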
I could write some shell scripts to do the whole workflow, and it would work well; I could drink some coffee and give myself a break. However, ten days later I analyse the results, find myself dissatisfied, and want to change the setup and redo the work. So I go back to the script. Everything is contained in that one long, long script, and it’s painful to have to read it all again.
That’s one thing Python has saved me from. Python is more modular: you can abstract your functions at several levels (function, module, class), so the code reads more logically. The strict indentation requirements, although peculiar at first glance, greatly increase the readability of the code. Reading Python code is more like reading literature, and it is much easier to get the idea of the code instead of struggling with the syntax.
In terms of functionality, whatever a shell script can do (e.g., execute system commands) can also be done in Python, but with greater cross-platform flexibility. First, the Linux shell relies heavily on the system’s own utilities, and different computers may have different versions of them. If I write a script using the Linux date command and it runs successfully on one computer, it may still fail on another (and this really happened to me). Python bundles equivalent functionality in its own packages, so there is no need to depend on the OS utilities themselves. Second, with some distributions of Python (I use EPD Python), it’s very easy to install the same Python on different computers and different operating systems. Finally, even if you limit yourself to Linux, there are several shells (e.g., csh, sh, bash) with different syntax, and some distributions do not have all of them installed. If it’s your own PC, the fix is easy: go and download. If it’s a cluster, it’s not so easy: you have to persuade a possibly stubborn administrator. In short, Python is more cross-platform because it integrates most of the shell’s utility within itself, and it is easy to maintain the same version of Python across platforms and operating systems.
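The date example illustrates the point well. Instead of shelling out to the Linux date command (whose flags differ between systems), Python’s standard datetime module behaves identically everywhere. A small sketch, using made-up dates of the kind a GCM download script might need:

```python
from datetime import datetime, timedelta

# A hypothetical simulation start time and a 6-hourly interval,
# the typical cadence of GCM boundary data.
start = datetime(2011, 7, 1, 0)
step = timedelta(hours=6)

# Generate the timestamps for one day of 6-hourly files.
times = [(start + i * step).strftime("%Y-%m-%d_%H") for i in range(4)]
print(times)  # ['2011-07-01_00', '2011-07-01_06', '2011-07-01_12', '2011-07-01_18']
```

This runs unchanged on any machine with Python, with no dependence on which version of date the cluster happens to have.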
So Python wins big in helping me make model runs. How about the analysis of my results? A long simulation produces a huge amount of output data. WRF provides some tools (e.g., the Registry) to reduce the volume of output, but I want to do more, say some kind of averaging, interpolation, or integration. These tasks are usually handled by a post-processing system such as NCL. But do I have to use NCL, or can I use Python?
NCL has a better library for atmospheric science analysis than Python. However, as the title of the NCL manual says, NCL is its own “mini” language: interacting with the system from NCL is troublesome, and its string-processing abilities are basic. Python, on the other hand, is good at managing files and has analysis tools in the SciPy/NumPy libraries, though their scope is still not comparable to NCL’s. (I guess the SciPy community doesn’t receive enough support from the atmospheric and climate science communities.)
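For the basic averaging and interpolation mentioned above, NumPy alone goes a long way. A minimal sketch, with a tiny made-up array standing in for a WRF variable that real code would read from a netCDF file:

```python
import numpy as np

# Hypothetical stand-in for WRF output, dimensions (time, level).
levels = np.array([1000.0, 850.0, 700.0, 500.0])   # pressure (hPa)
temp = np.array([[300.0, 290.0, 283.0, 270.0],
                 [301.0, 291.0, 284.0, 271.0]])    # temperature (K)

# Time average: collapse the first axis.
temp_mean = temp.mean(axis=0)

# Interpolate the averaged profile to 925 hPa.
# np.interp needs increasing x, so reverse the pressure axis.
t925 = np.interp(925.0, levels[::-1], temp_mean[::-1])
print(temp_mean, t925)  # [300.5 290.5 283.5 270.5] 295.5
```

More specialised operations (e.g., interpolation to WRF’s staggered grids) are where NCL’s library still has the edge.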
So, based on my experience in utilizing Python in my workflow, I conclude that I should use Python to organise my ideas, direct the structure of the modeling program, and interact with the system. For post-processing calculations, I can try to use Python but may need to have Python execute NCL to do specific calculations Python cannot do natively.
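Having Python execute NCL fits naturally into the subprocess approach used for the model runs. A minimal sketch, assuming ncl is on the PATH and an illustrative script name; NCL accepts var="value" assignments on the command line, which is how the file name is passed here:

```python
import subprocess

def ncl_args(script, **ncl_vars):
    """Build the NCL command line: ncl var="value" ... script.ncl"""
    return ["ncl"] + ['%s="%s"' % (k, v) for k, v in ncl_vars.items()] + [script]

def run_ncl(script, **ncl_vars):
    """Hand a specific calculation off to an NCL script."""
    subprocess.run(ncl_args(script, **ncl_vars), check=True)

# Hypothetical usage: an NCL script that reads the `infile` variable.
# run_ncl("vertical_interp.ncl", infile="wrfout_d01.nc")
```

Python stays in charge of the workflow, and NCL is called only where its analysis library is genuinely needed.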
Python is easy to read, easy to maintain, and easy to use. It’s powerful and could replace the shell completely. With SciPy/NumPy/matplotlib, Python can be used directly in scientific research. My only regret is that Python doesn’t have all the analysis tools for the atmospheric sciences; it would be great if the community could create them. What I want to emphasize, though, is that rather than being a single tool, Python is more like a platform. It lets you use the computer at both a high level and a low level, and since Python code is so easy to read and write, you can do all your computer-related work in a smarter, more concise way. The end result: higher productivity.