I wasted an hour or two today because I didn't realize that Python imports are relative, not absolute, and that they are relative to two things: the module's own location, and the main script's location. This is horrible, because it is very easy to accidentally have a name that conflicts with a built-in module, and then it becomes difficult to import that module reliably.

To explain how this burned me and what is actually happening, I'll walk you through an example. Let's say we have a Python program called script.py, and two modules in a package: mypackage.mymodule and mypackage.email. The directory structure looks like:
- mypackage/
  - __init__.py
  - email.py
  - mymodule.py
- script.py
(You can download the sample code if you like)
Now script.py is one line: import mypackage.email, and the email module prints a message when imported. If we run script.py, Python finds mypackage/email.py as we expect:
    $ ./script.py
    mypackage.email imported
Great! Now mymodule.py also imports mypackage.email, and we modify script.py to import this module as well:
    $ ./script.py
    mypackage.email imported
    mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
We start working on our program, and at some point we realize that inside mymodule, we want to call Python's built-in email.utils.parseaddr() function. So we add one line, import email.utils, and then we get:
    $ ./script.py
    mypackage.email imported
    Traceback (most recent call last):
      File "./script.py", line 4, in <module>
        import mypackage.mymodule
      File ".../mypackage/mymodule.py", line 8, in <module>
        import email.utils
    ImportError: No module named utils
Python can't find email.utils, even though it is part of the standard library! Why not?
The problem here is that imports are by default relative to the module (in Python 2.7 and older; see below). Thus, mymodule first searches in its own package, and finds our own email.py instead of the built-in module. If we change the import in mymodule to just import email, we get:
    $ python ./script.py
    mypackage.email imported
    mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
    mymodule: email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
So how do we get the built-in module? We need to tell Python that we want absolute imports. This is the default in Python 3; in Python 2 it has to be enabled in each module that wants it. To do this, we need to use: from __future__ import absolute_import. If we add that line, we now get the following output:
    $ ./script.py
    mypackage.email imported
    mymodule imported; mypackage.email = <module 'mypackage.email' from '.../mypackage/email.py'>
    mymodule: email = <module 'email' from '/System/.../python2.7/email/__init__.pyc'>
Victory! We can now access both modules, one as mypackage.email and the other as email. If we explicitly want a relative import, we can use from . import email as local_email, and then we get:
    $ ./script.py
    mypackage.email imported
    mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
    mymodule: email = <module 'email' from '/System/.../python2.7/email/__init__.pyc'>
    mymodule: local_email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
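To see the same distinction on a current interpreter, here is a minimal sketch that rebuilds the example layout in a temporary directory and runs it. It assumes Python 3, where absolute imports are already the default and the from __future__ line is accepted but has no effect. The helper name build_and_run and the file contents are a reconstruction for illustration, not the original sample code.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical file contents that mimic the layout described above.
FILES = {
    "mypackage/__init__.py": "",
    "mypackage/email.py": "print('mypackage.email imported')\n",
    "mypackage/mymodule.py": textwrap.dedent("""\
        from __future__ import absolute_import  # no-op on Python 3

        import email                         # absolute: the standard library package
        from . import email as local_email   # explicit relative: our own module

        print('mymodule: email = %s' % email.__name__)
        print('mymodule: local_email = %s' % local_email.__name__)
        """),
    "script.py": "import mypackage.mymodule\n",
}


def build_and_run():
    """Write FILES into a temp directory, run script.py, return its stdout."""
    with tempfile.TemporaryDirectory() as root:
        for relpath, content in FILES.items():
            path = os.path.join(root, relpath)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write(content)
        result = subprocess.run(
            [sys.executable, os.path.join(root, "script.py")],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


if __name__ == "__main__":
    print(build_and_run())
```

Because script.py sits at the root of the tree here, the plain import email cannot see mypackage/email.py, and only the explicit from . import form reaches it.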
Now let's write a unit test for mypackage.mymodule. We create a file called mypackage/mymodule_test.py that imports mypackage.mymodule:
    $ ./mypackage/mymodule_test.py
    Traceback (most recent call last):
      File "./mypackage/mymodule_test.py", line 4, in <module>
        import mypackage.mymodule
    ImportError: No module named mypackage.mymodule
Ah right, we need to set our PYTHONPATH so it can find the module. Let's try again:
    $ PYTHONPATH=. ./mypackage/mymodule_test.py
    mypackage.email imported
    mypackage.email imported
    mymodule: mypackage.email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
    mymodule: email = <module 'email' from '.../mypackage/email.pyc'>
    mymodule: local_email = <module 'mypackage.email' from '.../mypackage/email.pyc'>
Wait a second, look at that closely: in mymodule, it found our own email.py module for both import email and import mypackage.email, even though we are specifying that we want absolute imports. Didn't we just fix this problem? Why isn't it still fixed?
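The problem now is that Python puts the script's directory at the beginning of the module search path, sys.path, ahead of any directories listed in PYTHONPATH. In Python, the main script is assumed to be at the root of the package tree. Doing anything differently, like trying to put the mymodule_test script inside mypackage, breaks things: mypackage/ itself becomes the first place Python looks, so a plain import email finds mypackage/email.py. That is also why the message is printed twice: the same file is loaded once as email and once as mypackage.email. The first warning sign here is that we needed to specify our own PYTHONPATH. However, even with PYTHONPATH set, the script's directory is still searched first.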
There are two "easy" but unsatisfying solutions: either put all the main Python scripts in the actual root of your package hierarchy, or rename files to avoid name clashes with built-in modules.
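The search-path behavior behind this is independent of absolute imports, so it can still be observed on Python 3. The sketch below is an illustration under that assumption: it writes a stand-in for mypackage/mymodule_test.py into a temporary directory and reports what the script sees. The function name run_script_inside_package and the file contents are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for mypackage/mymodule_test.py: it reports the first
# search-path entry and which file a plain "import email" resolved to.
TEST_SCRIPT = (
    "import sys\n"
    "import email\n"
    "print(sys.path[0])\n"
    "print(email.__file__)\n"
)


def run_script_inside_package():
    """Run a script located inside mypackage/; return (sys.path[0], email file)."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "mypackage")
        os.makedirs(pkg)
        for name, content in [
            ("__init__.py", ""),
            ("email.py", "# our own module, shadowing the standard library\n"),
            ("mymodule_test.py", TEST_SCRIPT),
        ]:
            with open(os.path.join(pkg, name), "w") as f:
                f.write(content)
        result = subprocess.run(
            [sys.executable, os.path.join(pkg, "mymodule_test.py")],
            capture_output=True, text=True, check=True,
        )
        first_entry, email_file = result.stdout.splitlines()
        return first_entry, email_file


if __name__ == "__main__":
    first_entry, email_file = run_script_inside_package()
    print("sys.path[0]    =", first_entry)
    print("email.__file__ =", email_file)
```

Both values should point into the mypackage directory, which is the name clash described above: the script's own directory wins over the standard library.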
I have however found a disgusting hack that fixes this problem: modify the first entry in sys.path. The easy solution is to just remove it (del sys.path[0]). This will require that you manually specify the correct PYTHONPATH. A more complex but "perfect" solution is to modify sys.path[0] to reflect the script's desired location in the package hierarchy, which looks like the following:
    if __name__ == "__main__":
        import os
        import sys
        scriptdir = os.path.abspath(os.path.dirname(sys.argv[0]))
        # Check that this Python version does what we expect
        assert sys.path[0] == scriptdir
        # package root is up one level in the hierarchy
        sys.path[0] = os.path.normpath(os.path.join(scriptdir, ".."))

    import mypackage.mymodule
This script now does what we expect, no matter how we invoke it:
    $ ./mypackage/mymodule_test.py
    mypackage.email imported
    mymodule: mypackage.email = <module 'mypackage.email' from '/Users/ej/example/mypackage/email.pyc'>
    mymodule: email = <module 'email' from '/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/__init__.pyc'>
    mymodule: local_email = <module 'mypackage.email' from '/Users/ej/example/mypackage/email.pyc'>
Unfortunately, this is a lot of crap to include in the header of a script, but it could be made into its own module. For now, I just use the del sys.path[0] hack in my programs, and always explicitly specify the right PYTHONPATH.
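As a rough sketch of what moving this into its own module might look like: the function names package_root and fix_sys_path, and the levels_up parameter, are hypothetical, and unlike the inline version this one returns False instead of asserting when sys.path[0] is not the script's directory.

```python
import os
import sys


def package_root(script_path, levels_up=1):
    """Return the directory `levels_up` levels above the script's directory."""
    scriptdir = os.path.abspath(os.path.dirname(script_path))
    parents = [".."] * levels_up
    return os.path.normpath(os.path.join(scriptdir, *parents))


def fix_sys_path(script_path, levels_up=1):
    """Replace sys.path[0] (the script's directory) with the package root.

    Returns True if sys.path was changed, False if sys.path[0] was not the
    script's directory and was therefore left alone.
    """
    scriptdir = os.path.abspath(os.path.dirname(script_path))
    if sys.path and os.path.abspath(sys.path[0]) == scriptdir:
        sys.path[0] = package_root(script_path, levels_up)
        return True
    return False
```

A script such as mypackage/mymodule_test.py would call fix_sys_path(sys.argv[0]) before importing anything from the package. The helper itself has to be importable before the path is fixed, for example from a directory already on PYTHONPATH.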
By default, Python paths are relative to the main script that is being executed. On Python 2.7 and older, imports are also relative to the module doing the importing. This means if you get weird import errors, check for name clashes.
The higher-level lesson here is that absolute imports are easier to understand. They always do the same thing in all programs, and don't depend on things that can change like the file's location, or the location of the main program. In this respect, Java probably got this right.