Parametrizing Python Tests

This post covers running one test against many cases using @pytest.mark.parametrize(). In Writing Simple Tests we tested the validate_url() function, which validates the links that users add to their profiles. util.py also contains a validate_orcid() function for validating the ORCIDs that users add to their profiles (an ORCID is a unique identifier for a scientific author):

def validate_orcid(orcid):
    """
    Validate an ORCID.

    Verify that an ORCID conforms to the structure described at
    http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier

    Returns the normalized ORCID if successfully parsed or raises a ValueError
    otherwise.
    """
    ...

We want to test that validation succeeds (returning the string unmodified, without raising an exception) for a variety of valid ORCIDs. We could test this by writing a separate test method for each valid ORCID that we want to test, but that would be a lot of duplication - each test method would be exactly the same except for the ORCID used. Another way we could do this is with a for loop in a single test method:

def test_validate_orcid_accepts_valid_ids():
    for orcid_id in ['0000-0002-1825-0097', '0000-0001-5109-3700', '0000-0002-1694-233X']:
        assert validate_orcid(orcid_id) == orcid_id

This is better - there’s no duplication in the test code. You could say that this breaks the arrange, act, assert pattern because it calls the method under test three times, not once. But only one line of code calls the method under test; it just happens to be inside a loop, so I think it’s okay. This approach does have the disadvantage that it clutters up the body of our test method with a for loop. The loop also stops at the first failing ORCID, so the remaining values never get checked. If the function we were testing had more than one parameter and we wanted to test it with different combinations of arguments, the loop and the clutter would get more complicated.

Pytest gives us a better way to write this kind of test - @pytest.mark.parametrize():

@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
    assert validate_orcid(orcid) == orcid

The parametrize() decorator causes pytest to run this function three times, each time passing the next ORCID as the orcid argument to the test.

The first argument to parametrize(), called the argnames argument, is the string 'orcid':

@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',

This tells pytest the name of the test parameter that we’re going to be parametrizing. Pytest finds the test method parameter with the matching name:

@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
    assert validate_orcid(orcid) == orcid

The second argument to parametrize() is the argvalues argument. This is the list of values that pytest will pass to the test function’s orcid parameter:

@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
    assert validate_orcid(orcid) == orcid

Pytest takes the first value from this list and calls test_validate_orcid_accepts_valid_ids('0000-0002-1825-0097'), then it takes the second value and calls test_validate_orcid_accepts_valid_ids('0000-0001-5109-3700'), and so on.

So essentially we’ve got three separate tests in one. This separates test data (in this case, the different ORCIDs being tested) from test logic (the code in the body of the test method), and avoids cluttering up the method body with a for loop.
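
The docstring also says that validate_orcid() raises a ValueError for anything it can’t parse, and the same pattern works for testing that. Here’s a minimal sketch. The validate_orcid() below is a hypothetical stand-in (a format check plus the ISO 7064 MOD 11-2 check digit that ORCID uses) so that the snippet runs on its own. It is not the real code from util.py, and the invalid values are made up for illustration:

```python
import re

import pytest

# Four groups of four characters; only the last character may be 'X'.
ORCID_PATTERN = re.compile(r'\d{4}-\d{4}-\d{4}-\d{3}[\dX]')


def validate_orcid(orcid):
    """Hypothetical stand-in for util.validate_orcid(), not the real code."""
    if not ORCID_PATTERN.fullmatch(orcid):
        raise ValueError('Malformed ORCID: {!r}'.format(orcid))

    digits = orcid.replace('-', '')

    # ISO 7064 MOD 11-2 check digit, computed over the first 15 digits.
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    check = 'X' if result == 10 else str(result)

    if digits[-1] != check:
        raise ValueError('Bad ORCID check digit: {!r}'.format(orcid))
    return orcid


@pytest.mark.parametrize('orcid', [
    '',
    'not-an-orcid',
    '0000-0002-1825-009',   # too short
    '0000-0002-1825-0098',  # wrong check digit (should be 7)
])
def test_validate_orcid_rejects_invalid_ids(orcid):
    # The test passes only if the call raises ValueError.
    with pytest.raises(ValueError):
        validate_orcid(orcid)
```

Each invalid value gets its own run of the test, so one value that is wrongly accepted doesn’t hide problems with the others.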

parametrize() with multiple parameters

In the example above we parametrized just a single parameter - orcid. Where parametrize() really comes into its own is when you have a test function with multiple parameters. You can reduce what might have been many separate tests down to just one.

As an example, let’s look at the tests for the update_web_uri() method. (Here are the real tests for this method.)

When someone annotates a document using Hypothesis we collect many URIs for that document. The URL from the browser’s location bar is one URI (which we call a “self-claim” URI). Some web pages contain canonical links in the HTML, like:

<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">

We collect these and call them “rel-canonical” URIs for the document. There can also be rel="alternate" and rel="shortlink" URIs in an HTML document. And there are many other kinds of document URIs that we collect as well.

When we want to display the domain name of a document (e.g. www.example.com) on a Hypothesis page, or render a link to a document, we need to consider all of the different URIs for that document that we’ve collected into our database and select the “best” one to use.

We call this best URI the document’s web_uri, and the update_web_uri() method is responsible for updating it whenever we receive new URIs for a document:

def update_web_uri(self):
    """
    Update the value of the self.web_uri field.

    Set self.web_uri to the "best" http(s) URL from self.document_uris.

    Set self.web_uri to None if there's no http(s) DocumentURIs.

    """

We want to test the behaviour of update_web_uri() with many different combinations of URIs that a document might have. When we have just one URI for a document it should just choose that URI. For each combination of multiple types of URI it should choose the one that we think is best. Here’s what the tests for update_web_uri() could look like:

class TestDocumentWebURI(object):

    def test_given_a_single_http_or_https_uri_it_returns_it(self, factories):
        document = factories.Document()
        uri = 'http://example.com'
        factories.DocumentURI(uri=uri, type='self-claim', document=document)

        document.update_web_uri()

        assert document.web_uri == uri

    def test_if_there_are_no_http_or_https_uris_it_returns_None(self, factories):
        document = factories.Document()
        factories.DocumentURI(uri='ftp://example.com', type='self-claim',
                              document=document)
        factories.DocumentURI(uri='android-app://example.com',
                              type='rel-canonical', document=document)
        factories.DocumentURI(uri='urn:x-pdf:example',
                              type='rel-alternate', document=document)
        factories.DocumentURI(uri='doi:http://example.com',
                              type='rel-shortlink', document=document)

        document.update_web_uri()

        assert document.web_uri is None

    ...

… and so on. There were several more tests for the expected results given different combinations of URIs. All of these tests have the same test logic:

Create a Document with some DocumentURIs
Call update_web_uri()
Check that web_uri is equal to the expected URI

They differ only in the list of DocumentURIs that’s created and in the value that we expect web_uri to have at the end. With parametrize() we can combine all of these into a single test.

Here’s what the beginning of a parametrize() call for this test might look like:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    document = factories.Document()

    for docuri_tuple in document_uris:
        factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1],
                              document=document)

    document.update_web_uri()

    assert document.web_uri == expected_web_uri

Here we’re parametrizing two arguments to the test method, not just one as before. The argnames argument to parametrize() gives two argument names separated by a comma:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    ...

Notice that the test function also has a third argument, factories. This is the same factories for creating test objects that we saw earlier. Having this argument in there doesn’t do any harm - it doesn’t confuse parametrize. And it doesn’t even matter that the factories argument appears between document_uris and expected_web_uri. The order of the parameters doesn’t matter because pytest looks at their names to figure out what they are.
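
Here’s a small self-contained sketch of that name matching. The separator fixture and test_join() are made up for illustration, with separator playing the role that factories plays in the real test:

```python
import pytest


@pytest.fixture
def separator():
    # Hypothetical fixture, standing in for `factories` in the text.
    return '-'


@pytest.mark.parametrize('parts,expected', [
    (['a', 'b'], 'a-b'),
    (['x'], 'x'),
    ([], ''),
])
def test_join(parts, separator, expected):
    # `separator` sits between the two parametrized arguments, but pytest
    # matches all three by name: `parts` and `expected` come from
    # parametrize(), and `separator` comes from the fixture.
    assert separator.join(parts) == expected
```

Running pytest on a file containing this should collect three tests, one per tuple. argnames can also be given as a list or tuple of strings, such as ['parts', 'expected'], instead of a comma-separated string.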

Since we’re now parametrizing two parameters instead of one, the argvalues argument to parametrize() becomes a list of two-tuples:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    ...

Pytest will first call the test function passing the values from the first two-tuple as arguments:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),

it’ll call:

test_update_web_uri([('http://example.com', 'self-claim')], factories, 'http://example.com')

It’ll then take the second two-tuple:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),

and run the test with those arguments:

test_update_web_uri([('https://example.com', 'self-claim')], factories, 'https://example.com')

and so on.
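
To make that sequence concrete, here’s a rough model of what parametrize() does with argnames and argvalues. This is only a sketch for illustration: run_parametrized() and check_scheme() are made-up names, and real pytest does this work at collection time, generating a separately reported test for each set of values rather than looping inside one test:

```python
def run_parametrized(test_func, argnames, argvalues, **other_args):
    """Simplified model of parametrize(); not pytest's real implementation."""
    names = [name.strip() for name in argnames.split(',')]
    calls = []
    for values in argvalues:
        # With a single argname each entry is a bare value, not a tuple.
        if len(names) == 1:
            values = (values,)
        # Pair each name with its value, e.g. {'uri': ..., 'expected_scheme': ...}.
        kwargs = dict(zip(names, values))
        # Anything else (fixtures such as `factories`) is also passed by name.
        kwargs.update(other_args)
        test_func(**kwargs)
        calls.append(kwargs)
    return calls


def check_scheme(uri, expected_scheme):
    # Hypothetical test body: the part before the first ':' is the scheme.
    assert uri.split(':', 1)[0] == expected_scheme


calls = run_parametrized(check_scheme, 'uri,expected_scheme', [
    ('http://example.com', 'http'),
    ('https://example.com', 'https'),
])
```

Because the values are passed by keyword, the order of the parameters in the test function’s signature makes no difference in this model either.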

Here’s the full, final version of the test with all parametrized cases:

@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ([('http://example.com',  'rel-canonical')], 'http://example.com'),
    ([('https://example.com', 'rel-canonical')], 'https://example.com'),
    ([('http://example.com',  'rel-shortlink')], 'http://example.com'),
    ([('https://example.com', 'rel-shortlink')], 'https://example.com'),

    # Given no http or https URLs it sets web_uri to None.
    ([], None),
    ([
        ('ftp://example.com',              'self-claim'),
        ('android-app://example.com',      'rel-canonical'),
        ('urn:x-pdf:example',              'rel-alternate'),
        ('doi:http://example.com',         'rel-shortlink'),
     ], None),

    # It prefers self-claim URLs over all other URLs.
    ([
        ('https://example.com/shortlink',  'rel-shortlink'),
        ('https://example.com/canonical',  'rel-canonical'),
        ('https://example.com/self-claim', 'self-claim'),
     ], 'https://example.com/self-claim'),

    # It prefers canonical URLs over all other non-self-claim URLs.
    ([
        ('https://example.com/shortlink',  'rel-shortlink'),
        ('https://example.com/canonical',  'rel-canonical'),
     ], 'https://example.com/canonical'),


    # If there's no http(s) self-claim or canonical URL it will return an
    # https URL of a different type.
    ([
        ('ftp://example.com',              'self-claim'),
        ('urn:x-pdf:example',              'rel-alternate'),

        # This is the one that should be returned.
        ('https://example.com/alternate',  'rel-alternate'),

        ('android-app://example.com',      'rel-canonical'),
        ('doi:http://example.com',         'rel-shortlink'),
     ], 'https://example.com/alternate'),

    # If there's no http(s) self-claim or canonical URL it will return an
    # http URL of a different type.
    ([
        ('ftp://example.com',              'self-claim'),
        ('urn:x-pdf:example',              'rel-alternate'),

        # This is the one that should be returned.
        ('http://example.com/alternate',   'rel-alternate'),

        ('android-app://example.com',      'rel-canonical'),
        ('doi:http://example.com',         'rel-shortlink'),
     ], 'http://example.com/alternate'),
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    document = factories.Document()

    for docuri_tuple in document_uris:
        factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1],
                              document=document)

    document.update_web_uri()

    assert document.web_uri == expected_web_uri
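
Read together, the cases above amount to a small specification: ignore anything that isn’t http(s), prefer self-claim, then rel-canonical, then fall back to whatever http(s) URI is left. As a sanity check on that reading, here’s a sketch of a function that would satisfy the table. pick_web_uri() and PREFERRED_TYPES are hypothetical names, and this is not the real update_web_uri() implementation; it works on plain (uri, type) tuples like the ones in the test data:

```python
# Assumed order of preference, inferred from the comments in the test table.
PREFERRED_TYPES = ['self-claim', 'rel-canonical']


def pick_web_uri(document_uris):
    """Return the "best" http(s) URI from (uri, type) tuples, or None."""
    http_uris = [
        (uri, type_) for uri, type_ in document_uris
        if uri.startswith(('http://', 'https://'))
    ]

    # Try each preferred type in turn.
    for preferred in PREFERRED_TYPES:
        for uri, type_ in http_uris:
            if type_ == preferred:
                return uri

    # Assumption: otherwise fall back to the first remaining http(s) URI.
    return http_uris[0][0] if http_uris else None
```

Writing the expectations as a table of data is what makes this kind of cross-check easy: the same tuples can be fed to any candidate implementation.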

Parametrize is a great feature of pytest that can really help to test many cases while minimising the amount of test code. You should always be on the lookout for groups of test methods that could be reduced to a single method by using parametrize, and use it as often as possible.