The Future of AOS Python: Preparing AOS Students for Tomorrow’s Computational Challenges

By Daniel Rothenberg (Postdoctoral Associate, Center for Global Change Science, MIT)
@danrothenberg

Over the last six years, I served the American Meteorological Society as a member and co-chair of its Student Conference Planning Committee. Each year, just a few weeks after the Annual Meeting, we’d start the long and difficult process of crafting a valuable Conference experience for new and veteran participants alike. But despite our attendees’ diverse interests, some topics always drew broad interest. Chief among those was the application of modern computing tools, techniques, and technologies to today’s (and tomorrow’s) tough problems.

In too many ways to list, progress in the atmospheric and oceanic sciences (AOS) is greatly benefiting from advances in computational science. But these advances are bringing to light new challenges which our community is only slowly acknowledging, let alone embracing and solving [1]. Unfortunately, we’re not equipping our students and early-career scientists with the skills necessary to confront these challenges. And one of the lessons I learned from chairing the Student Conference is that students are both aware of this fact and frustrated by it.

The PyAOS community is uniquely positioned to provide effective solutions to and leadership on this problem, due to the broader open source and data science ecosystems with which we interface. For one thing, the PyData community (with support from NumFOCUS) is producing tools which greatly simplify tackling the sorts of “big data” challenges that are already prevalent in AOS. We should work both to expose young scientists to these tools [2], showing how they’re useful, and to encourage them to contribute back to the open source movement. More importantly, the philosophy of open and reproducible science is a driving motivation for the PyData community, and stands to transform the way we undertake research. PyAOS members should position themselves as the “experts in the room” on these topics, and advocate for them to students and colleagues at our home institutions.

Although it’s crucial that the PyAOS group focuses on solving the real computational challenges of the near future, we must also help train, educate, and prepare the broader AOS community to deal with them. What’s needed to accomplish this is stronger communication and collaboration amongst PyAOS members, especially when it comes to developing training materials. For instance, Damien Irving has created a Software Carpentry module which features a “practical” application that oceanographers might appreciate. PyAOS pulls from a diverse set of AOS researchers, so why not follow Damien’s example and compile additional such materials with prototypical examples from all of our fields? Creating lessons which highlight reproducible science also provides the opportunity to showcase the new tools that we’re developing (or that are being contributed from outside the field). This could in turn lead to faster adoption and more contributions from users, as well as important feedback on where friction points exist in AOS research workflows, which we could then work to improve.

But let’s also remember that today’s Student Conference attendees will one day be the leaders of the AOS community. These students understand that computational proficiency will play a critical role in their career pursuits, and are eager to learn about these skills. They’ll soon be the graduate students, post-docs, and innovators investing the most time and effort in helping to build all of the tools we anticipate will be necessary to solve tomorrow’s computational challenges. Improving the outlook for the future of the PyAOS community hinges on finding ways to better serve our students and early-career scientists, just as much as it depends on building technical solutions to hard problems.

[1] This will become very clear in the climate science community when researchers turn their attention to the huge amount of data anticipated with the CMIP6 model output!
[2] These tools include cluster-computing platforms like Spark or dask, pipeline tools like Snakemake and luigi, and libraries such as pandas which enable a “grammar of data science”.
