Getting a List of Files

Several weeks ago, I wrote about how dictionaries can help make your data analysis scripts simpler. As part of that post, I showed a script that looped through a list of filenames I had typed out, and noted that if we could find a way to get a list of files in a directory through an operating system listing, the script would be even more powerful and flexible. In this post, I will describe two such ways of getting a list of files. Both methods will work on any platform.

The first way uses the os module, along with the methods that come built in with all string objects. The os module provides a portable interface to operating system functionality. It has a module function listdir that returns a list of the names of the entries in the directory whose path is given in its single positional input parameter. Thus:

import os
list_of_files = os.listdir("/home/jlin")

will give you a list of the names of everything in the directory /home/jlin (files and subdirectories alike, as bare names without the directory prefix). Note that hard-coding the path as above is not a platform-independent way of specifying it; a better way would be to use functions in the os.path submodule, but that discussion is for another time.
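Without getting into that full discussion, here is a minimal sketch of what a more portable version might look like. The function name list_regular_files and the "data" subdirectory are made up for illustration; os.path.join, os.path.expanduser, os.path.isfile and os.path.isdir are standard os.path functions.

```python
import os


def list_regular_files(directory):
    """Return the names of regular files (not subdirectories) in `directory`."""
    names = os.listdir(directory)
    # os.listdir returns bare names, so rejoin each one with the directory
    # before asking whether it is a regular file.
    return [name for name in names
            if os.path.isfile(os.path.join(directory, name))]


# Build the path from pieces instead of hard-coding the separator.
# "data" is a hypothetical subdirectory of the user's home directory.
data_dir = os.path.join(os.path.expanduser("~"), "data")
if os.path.isdir(data_dir):
    print(list_regular_files(data_dir))
```

The os.path.isfile check is optional; it is only there because os.listdir also returns the names of subdirectories.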

What if that directory contains data files as well as other files (e.g., readme files)? Can you pare down the list so that only data files are kept? You can, and one way of doing so is to use string methods, since each item in the list returned by os.listdir is a string.

Assume all the data files are in netCDF format and end with the suffix .nc. Strings have a method endswith that returns True if the string ends with the substring passed to it. So, we can loop through our list of files, test whether each filename ends with .nc, and if so, append the filename to a list of only those files. The code would be:

nc_only_files = []
for ifile in list_of_files:
    if ifile.endswith(".nc"):
        nc_only_files.append(ifile)

Neither list_of_files nor nc_only_files is in any particular order (os.listdir returns names in arbitrary order), but we now have a list of the files we want!
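If a predictable order matters, the same filtering can be written more compactly and sorted in one step. This is only a sketch: the function name select_by_suffix and the example filenames are invented for illustration.

```python
def select_by_suffix(filenames, suffix=".nc"):
    """Return the names that end in `suffix`, sorted alphabetically."""
    return sorted(name for name in filenames if name.endswith(suffix))


# Hypothetical directory contents, standing in for the output of os.listdir.
example_names = ["run2.nc", "README.txt", "run1.nc", "notes.nc.bak"]
nc_only_files = select_by_suffix(example_names)
print(nc_only_files)  # ['run1.nc', 'run2.nc']
```

Note that "notes.nc.bak" is left out, because endswith only looks at the very end of the string.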

The second way I’ll show to get a list of files is even easier than using os plus string methods. Here we make use of the glob module, which has functions that return directory listings in accordance with patterns specified by Unix-like wildcards. Thus, to get the list nc_only_files for the current directory, we only need to type in the following:

import glob
nc_only_files = glob.glob("*.nc")
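The same idea works for a directory other than the current one if the directory is made part of the pattern. Below is a small sketch under that assumption; the function name find_nc_files is made up for illustration. Unlike os.listdir, glob.glob returns each match with the directory prefix from the pattern included, and like os.listdir it makes no promise about order, so the result is sorted here.

```python
import glob
import os


def find_nc_files(directory):
    """Return sorted paths of the entries in `directory` whose names end in .nc."""
    # Join the directory and the wildcard so the path separator is
    # whatever the current platform uses.
    pattern = os.path.join(directory, "*.nc")
    return sorted(glob.glob(pattern))


# os.curdir is the current directory, as in the example above.
print(find_nc_files(os.curdir))
```

Because the returned strings already include the directory, they can be passed straight to whatever function opens the files.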

Pretty easy, huh? 🙂 And all the modules described in this post are part of Python's standard library, so you don’t have to install anything extra.

Hat tip: Thanks to “Neil” for the tip on using glob!

  • Neil (not N_eil)

    Another way to do this is with fnmatch:

    import os, fnmatch
    list_of_files = os.listdir('/home/jlin/')
    nc_files = fnmatch.filter(list_of_files, '*.nc')

  • http://www.johnny-lin.com Johnny Lin

    Cool! Thanks Neil!