Title: Parametrizing Python Tests Tags: Python Unit Tests at Hypothesis Alias: /post/parametrize/ This post covers testing multiple test cases at once using `@pytest.mark.parametrize()`. In [Writing Simple Tests](/posts/writing-tests) we tested the `validate_url()` function that validates the links that users add to their profiles. `util.py` also contains a [validate_orcid()](https://github.com/hypothesis/h/blob/8d11e918005581f35f97268e9470eb3c34a6b416/h/accounts/util.py#L36) function for validating the [ORCID](https://orcid.org/) ID's that users add to their profiles (an ORCID is a unique identifier for a scientific author): ```python def validate_orcid(orcid): """ Validate an ORCID. Verify that an ORCID conforms to the structure described at http://support.orcid.org/knowledgebase/articles/116780-structure-of-the-orcid-identifier Returns the normalized ORCID if successfully parsed or raises a ValueError otherwise. """ ... ``` We want to test that validation succeeds (returning the string unmodified, without raising an exception) for a variety of valid ORCIDs. We could test this by writing a separate test method for each valid ORCID that we want to test, but that would be a lot of duplication - each test method would be exactly the same except for the ORCID used. Another way we could do this is with a `for` loop in a single test method: ```python def test_validate_orcid_accepts_valid_ids(): for orcid_id in ['0000-0002-1825-0097', '0000-0001-5109-3700', '0000-0002-1694-233X']:, assert validate_orcid(orcid_id) == orcid_id ``` This is better - there's no duplication in the test code. You could say that this breaks the [arrange, act, assert](/posts/arrange-act-assert) pattern because it calls the method under test three times, not once. But there is only one line of code that calls the method under test, it just happens to be a line inside a loop, so I think it's okay. This approach does have the disadvantage that it clutters up the body of our test method with a `for` loop. If the function we were testing had more than one parameter and we wanted to test it with different combinations of multiple parameters then the loop and the clutter would be more complicated. Pytest gives us a better way to write this kind of test - [@pytest.mark.parametrize()](http://doc.pytest.org/en/latest/parametrize.html): ```python @pytest.mark.parametrize('orcid', [ '0000-0002-1825-0097', '0000-0001-5109-3700', '0000-0002-1694-233X', ]) def test_validate_orcid_accepts_valid_ids(orcid): assert validate_orcid(orcid) == orcid ``` The `parametrize()` causes pytest to run this function three times, each time passing the next one of the ORCIDs as the `orcid` argument to the test. The first argument to `parametrize()`, called the `argnames` argument, is the string `'orcid'`:
@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
This tells pytest the name of the test parameter that we're going to be parametrizing. Pytest finds the test method parameter with the matching name:
@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
    assert validate_orcid(orcid) == orcid
The second argument to `parametrize()` is the `argvalues` argument, this is the list of values that pytest will pass to the test function's `orcid` parameter:
@pytest.mark.parametrize('orcid', [
    '0000-0002-1825-0097',
    '0000-0001-5109-3700',
    '0000-0002-1694-233X',
])
def test_validate_orcid_accepts_valid_ids(orcid):
    assert validate_orcid(orcid) == orcid
Pytest takes the first argument from this list and calls `test_validate_orcid_accepts_valid_ids('0000-0002-1825-0097')`, then it takes the second argument and calls `test_validate_orcid_accepts_valid_ids('0000-0001-5109-3700')`, and so on. So essentially we've got three separate tests in one. This separates test data (in this case, the different ORCIDs being tested) from test logic (the code in the body of the test method), and avoids cluttering up the method body with a `for` loop. ### parametrize() with multiple parameters In the example above we parametrized just a single parameter - `orcid`. Where `parametrize()` really comes into its own is when you have a test function with multiple parameters. You can reduce what might have been many separate tests down to just one. As an example, let's look at the tests for the [update_web_uri()](https://github.com/hypothesis/h/blob/a62e378eb45a8bdef2dff17c2ed4d16d8310b64d/src/memex/models/document.py#L52) method. (Here's [the real tests for this method](https://github.com/hypothesis/h/blob/a62e378eb45a8bdef2dff17c2ed4d16d8310b64d/tests/memex/models/document_test.py#L131).) When someone annotates a document using Hypothesis we collect many URIs for that document. The URL from the browser's location bar is one URI (which we call a "self-claim" URI). Some web pages contain canonical links in the HTML, like: ```html ``` We collect these and call them "rel-canonical" URIs for the document. There can also be `rel="alternate"` and `rel="shortlink"` URIs in an HTML document. And there are many other kinds of document URIs that we collect as well. When we want to display the domain name of a document (e.g. `www.example.com`) on a Hypothesis page, or render a link to a document, we need to consider all of the different URIs for that document that we've collected into our database and select the "best" one to use. We call this best URI the document's `web_uri`, and the `update_web_uri()` method is responsible for updating it whenever we receive new URIs for a document: ```python def update_web_uri(self): """ Update the value of the self.web_uri field. Set self.web_uri to the "best" http(s) URL from self.document_uris. Set self.web_uri to None if there's no http(s) DocumentURIs. """ ``` We want to test the behaviour of `update_web_uri()` with many different combinations of URIs that a document might have - when we have just one URI for a document it should just choose that one URI, for different possible combinations of multiple types of URI it should choose the one that we think is best in each case. Here's what the tests for `update_web_uri()` could look like: ```python class TestDocumentWebURI(object): def test_given_a_single_http_or_https_uri_it_returns_it(self, factories): document = factories.Document() uri = 'http://example.com' factories.DocumentURI(uri=uri, type='self-claim', document=document) document.update_web_uri() assert document.web_uri == uri def test_if_there_are_no_http_or_https_uris_it_returns_None(self, factories): document = factories.Document() factories.DocumentURI(uri='ftp://example.com', type='self-claim', document=document) factories.DocumentURI(uri='android-app://example.com', type='rel-canonical', document=document) factories.DocumentURI(uri='urn:x-pdf:example', type='rel-alternate', document=document) factories.DocumentURI(uri='doi:http://example.com', type='rel-shortlink', document=document) document.update_web_uri() assert document.web_uri is None ... ``` ... and so on, there were several more tests for the expected results given different combinations of URIs. All of these tests have the same test logic: 1. Create a `Document` with some `DocumentURI`s 2. Call `update_web_uri()` 3. Check that `web_uri` is equal to the expected URI They differ only in the list of `DocumentURI`s that's created and in the value that we expect `web_uri` to have at the end. With `parametrize()` we can combine all of these into a single test. Here's what the beginning of a `parametrize()` call for this test might look like: ```python @pytest.mark.parametrize('document_uris,expected_web_uri', [ # Given a single http or https URL it just uses it. ([('http://example.com', 'self-claim')], 'http://example.com'), ([('https://example.com', 'self-claim')], 'https://example.com'), ... ]) def test_update_web_uri(self, document_uris, factories, expected_web_uri): document = factories.Document() for docuri_tuple in document_uris: factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1], document=document) document.update_web_uri() assert document.web_uri == expected_web_uri ``` Here we're parametrizing **two** arguments to the test method, not just one as before. The `argnames` argument to `parametrize()` gives two argument names separated by a comma:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    ...
Notice that the test function also has a third argument, `factories`. This is the same factories for creating test objects that [we saw earlier](/posts/factories). Having this argument in there doesn't do any harm - it doesn't confuse parametrize. And it doesn't even matter that the `factories` argument appears in-between `document_uris` and `expected_web_uri` - the order of the parameters doesn't matter, pytest looks at their names to figure out what they are. Since we're now parametrizing two parameters instead of one, the `argvalues` argument to `parametrize()` becomes a list of two-tuples:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
    ...
])
def test_update_web_uri(self, document_uris, factories, expected_web_uri):
    ...
Pytest will first call the test function passing the values from the first two-tuple as arguments:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
it'll call:
test_update_web_uri([('http://example.com',  'self-claim')], factories, 'http://example.com'):
It'll then take the second two-tuple:
@pytest.mark.parametrize('document_uris,expected_web_uri', [
    # Given a single http or https URL it just uses it.
    ([('http://example.com',  'self-claim')],    'http://example.com'),
    ([('https://example.com', 'self-claim')],    'https://example.com'),
and run the test with those arguments:
test_update_web_uri([('https://example.com',  'self-claim')], factories, 'https://example.com'):
and so on. Here's the full, final version of the test with all parametrized cases: ```python @pytest.mark.parametrize('document_uris,expected_web_uri', [ # Given a single http or https URL it just uses it. ([('http://example.com', 'self-claim')], 'http://example.com'), ([('https://example.com', 'self-claim')], 'https://example.com'), ([('http://example.com', 'rel-canonical')], 'http://example.com'), ([('https://example.com', 'rel-canonical')], 'https://example.com'), ([('http://example.com', 'rel-shortlink')], 'http://example.com'), ([('https://example.com', 'rel-shortlink')], 'https://example.com'), # Given no http or https URLs it sets web_uri to None. ([], None), ([ ('ftp://example.com', 'self-claim'), ('android-app://example.com', 'rel-canonical'), ('urn:x-pdf:example', 'rel-alternate'), ('doi:http://example.com', 'rel-shortlink'), ], None), # It prefers self-claim URLs over all other URLs. ([ ('https://example.com/shortlink', 'rel-shortlink'), ('https://example.com/canonical', 'rel-canonical'), ('https://example.com/self-claim', 'self-claim'), ], 'https://example.com/self-claim'), # It prefers canonical URLs over all other non-self-claim URLs. ([ ('https://example.com/shortlink', 'rel-shortlink'), ('https://example.com/canonical', 'rel-canonical'), ], 'https://example.com/canonical'), # If there's no self-claim or canonical URL it will return an https # URL of a different type. ([ ('ftp://example.com', 'self-claim'), ('urn:x-pdf:example', 'rel-alternate'), # This is the one that should be returned. ('https://example.com/alternate', 'rel-alternate'), ('android-app://example.com', 'rel-canonical'), ('doi:http://example.com', 'rel-shortlink'), ], 'https://example.com/alternate'), # If there's no self-claim or canonical URL it will return an http # URL of a different type. ([ ('ftp://example.com', 'self-claim'), ('urn:x-pdf:example', 'rel-alternate'), # This is the one that should be returned. ('http://example.com/alternate', 'rel-alternate'), ('android-app://example.com', 'rel-canonical'), ('doi:http://example.com', 'rel-shortlink'), ], 'http://example.com/alternate'), ]) def test_update_web_uri(self, document_uris, factories, expected_web_uri): document = factories.Document() for docuri_tuple in document_uris: factories.DocumentURI(uri=docuri_tuple[0], type=docuri_tuple[1], document=document) document.update_web_uri() assert document.web_uri == expected_web_uri ``` Parametrize is a great feature of pytest that can really help to test many cases while minimising the amount of test code. You should always be on the look out for groups of test methods that could be reduced to a single method by using parametrize, and use if as often as possible.