Several weeks ago, I wrote about how dictionaries can help make your data analysis scripts simpler. As part of that post, I showed a script that looped through a list of filenames I had typed out, and noted that if we could find a way to get a list of files in a directory through an operating system listing, the script would be even more powerful and flexible. In this post, I will describe two such ways of getting a list of files. Both methods will work on any platform.
The first way uses the os module, along with the methods that come built-in with all string objects. The os module provides a portable, cross-platform way of using operating system functionality. It has a module function listdir that returns a list of the names of the entries (files and subdirectories) in the directory path given as its argument. Thus:
import os
list_of_files = os.listdir("/home/jlin")
will give you a list of all the files (and any subdirectories) in the directory /home/jlin. Note that the above is not a platform-independent way of specifying the path; a better way would be to use functions in the os.path submodule, but that discussion is for another time.
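To see exactly what os.listdir hands back, here is a small self-contained sketch. It builds a throwaway directory with tempfile and fills it with made-up file names (hypothetical, purely for illustration). Two things stand out: the entries are bare names with no directory prefix, and subdirectories show up alongside files. The sketch borrows os.path.join and os.path.isfile only to drop the subdirectory.

```python
import os
import tempfile

def demo_listdir():
    """List a scratch directory holding hypothetical files and one subdirectory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical contents: two data files, a readme, and a subdirectory
        for name in ["temp_2020.nc", "temp_2021.nc", "README.txt"]:
            with open(os.path.join(tmpdir, name), "w"):
                pass
        os.mkdir(os.path.join(tmpdir, "plots"))

        # Bare entry names, in arbitrary order, subdirectories included
        entries = os.listdir(tmpdir)

        # Rebuild full paths to test each entry, keeping only regular files
        files_only = [e for e in entries
                      if os.path.isfile(os.path.join(tmpdir, e))]
        return sorted(entries), sorted(files_only)

entries, files_only = demo_listdir()
print(entries)     # ['README.txt', 'plots', 'temp_2020.nc', 'temp_2021.nc']
print(files_only)  # ['README.txt', 'temp_2020.nc', 'temp_2021.nc']
```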
What if that directory contains both data files and other files (e.g., readme files)? Can you pare down the list so that only the data files are kept? You can, and one way of doing so is to use string methods, since each item in the list returned by os.listdir is a string.
Assume all the data files are in netCDF format and end with the suffix .nc. Strings have a method endswith that returns True if the string ends with the given suffix (and False otherwise). So we can loop through our list of files, test whether each filename ends with .nc, and if so, append the filename to a list of only those files. The code would be:
nc_only_files = []
for ifile in list_of_files:
    if ifile.endswith(".nc"):  # keep only netCDF files
        nc_only_files.append(ifile)
There is no guarantee that the files in nc_only_files are in any kind of order (os.listdir returns entries in arbitrary order), but we now have a list of the files we want!
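The same filter can be written more compactly as a list comprehension (here a generator expression passed to sorted). The sketch below uses a hypothetical hard-coded list in place of real os.listdir output so it runs anywhere, and wraps the result in sorted() to impose an alphabetical order, since os.listdir does not guarantee one.

```python
# Hypothetical listing, standing in for the output of os.listdir
list_of_files = ["temp_2021.nc", "README.txt", "temp_2020.nc", "notes.nc.bak"]

# Keep only names ending in ".nc" ("notes.nc.bak" ends in ".bak", so it is dropped),
# then sort alphabetically
nc_only_files = sorted(f for f in list_of_files if f.endswith(".nc"))
print(nc_only_files)  # ['temp_2020.nc', 'temp_2021.nc']

# str.endswith also accepts a tuple of suffixes and matches if any one fits
mixed = ["a.nc", "b.nc4", "c.txt"]
either_suffix = [f for f in mixed if f.endswith((".nc", ".nc4"))]
print(either_suffix)  # ['a.nc', 'b.nc4']
```

Because endswith checks only the end of the string, a name that merely contains ".nc" somewhere in the middle is not picked up.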
The second way I’ll show to get a list of files is even easier than using os plus string methods. Here we make use of the glob module, which has functions that return directory listings in accordance with patterns specified by Unix-like wildcards. Thus, to get the list nc_only_files for the current directory, we only need to type in the following:
import glob
nc_only_files = glob.glob("*.nc")
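The pattern passed to glob.glob can also include a directory, in which case each returned string carries that directory prefix (unlike the bare names from os.listdir). Here is a minimal sketch, again using a throwaway directory with made-up file names, that builds the pattern with os.path.join and sorts the matches, since glob.glob does not promise any particular order either.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical files: two netCDF files and one text file
    for name in ["temp_2021.nc", "temp_2020.nc", "README.txt"]:
        with open(os.path.join(tmpdir, name), "w"):
            pass

    # Full paths matching the wildcard, each starting with tmpdir
    matches = sorted(glob.glob(os.path.join(tmpdir, "*.nc")))

    # Strip the directory part to get os.listdir-style bare names
    names = [os.path.basename(p) for p in matches]

print(names)  # ['temp_2020.nc', 'temp_2021.nc']
```

One difference from os.listdir worth knowing: with glob, the * wildcard does not match names that begin with a dot, so hidden files are skipped by default.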
Pretty easy, huh? 🙂 And all the modules described in this post are part of Python's standard library, so you don’t have to install anything extra.
Hat tip: Thanks to “Neil” for the tip on using