What exactly does unit testing mean? In short, unit testing is a method for verifying the functionality of individual functions or small pieces of code. Each function is tested in isolation to ensure that the implemented logic behaves as expected. For example, what happens if the input is different from what is expected? Or what happens if the input value is empty?
Ideally, unit testing should be conducted during the development phase. As soon as you have built a function, you can begin writing unit tests. But how does this process work? To understand the importance of unit testing, let's outline the basics.
Unit testing involves independently testing the logic of each function you write. For every function, you create a corresponding unit test designed to validate its behavior. These unit tests consist of one or more checks that ensure the function performs as expected. For instance, you might use a variety of inputs and compare the actual output to the expected results to confirm that the function works correctly. Additionally, you can check whether the input meets specific criteria, such as verifying that it is a list.
When you start thinking about different scenarios while writing unit tests, you often end up not only creating the tests themselves but also adding assert statements to your functions. These assert statements help validate the logic within your code.
Each test will either pass or fail. Running unit tests before integrating your new function or code into the main script helps catch issues early. A popular library for unit testing in Python is pytest, which is simple to use and can be integrated into your CI/CD pipeline. You can, for example, configure the pipeline to automatically run all tests when certain actions occur, such as merging branches in Git. This significantly reduces the risk of problems in production.
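The basic workflow described above (one function, an input check, and one or more tests that pass or fail) can be sketched as follows. This is a minimal illustration, not a prescribed layout: the function double_values and the file name test_double_values.py are hypothetical and chosen only for this example.

```python
def double_values(values: list) -> list:
    """Return a new list in which every element is doubled (hypothetical example function)."""
    # Input check of the kind described above: the argument must be a list.
    assert isinstance(values, list), "The input of double_values is not a list"
    return [value * 2 for value in values]


def test_double_values_normal_input():
    # Compare the actual output with the expected output.
    assert double_values([1, 2, 3]) == [2, 4, 6]


def test_double_values_empty_list():
    # An empty list is valid input here and should yield an empty list.
    assert double_values([]) == []
```

If this code is saved as test_double_values.py, running `pytest test_double_values.py` from a terminal collects both functions whose names start with test_ and reports each one as passed or failed. Keep in mind that Python skips assert statements when it is started with the -O option, so asserts inside a function are a development aid rather than a guaranteed runtime check.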
Unit testing a function helps ensure that it won’t create problems within the overall system. This is especially important when the logic is incorrect but the function does not raise an error. When an error message is present, it’s clear where the issue lies, making it easier to pinpoint and fix. A more difficult scenario occurs when a function runs without errors but causes a failure later in your script due to faulty logic. In such cases, you might end up searching for the problem where the error appears rather than where it originates, leading to a time-consuming hunt for the root cause.
For instance, consider a function that adds two parameters. It runs without errors for floats, integers, and even strings, but the results differ: numbers are added, whereas strings are concatenated. Depending on the intended purpose, you may not want to allow all these data types. By thoroughly testing your functions in advance, you can catch these issues early, ultimately saving substantial time and resources.
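A short sketch makes the difference concrete. Both functions below are hypothetical illustrations: add accepts any types that support the + operator, while add_numbers uses assert statements to accept only integers and floats.

```python
def add(a, b):
    """Add two values without restricting their types (hypothetical example)."""
    return a + b


def add_numbers(a, b):
    """Add two values, but only accept integers and floats (hypothetical example)."""
    assert isinstance(a, (int, float)), "a is not numerical"
    assert isinstance(b, (int, float)), "b is not numerical"
    return a + b


print(add(2, 3))      # 5
print(add(2.5, 3))    # 5.5
print(add("2", "3"))  # 23 (string concatenation, no error raised)
```

Calling add_numbers("2", "3") raises an AssertionError with the message "a is not numerical", so the problem surfaces at its source instead of somewhere later in the script.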
Functions are often tested informally during development, without creating dedicated unit tests. Typically, this involves using the current input data available, which may not cover all possible scenarios. However, as your project evolves, new and unexpected values—like negative or empty inputs—could be introduced. Even if your function worked correctly with the initial data, it might fail with these new inputs. Unit testing addresses this issue by thoroughly testing your function against a wide range of input values, ensuring the reliability and robustness of your code.
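As an illustration of testing beyond the data currently at hand, the sketch below uses pytest's parametrize marker to run one test against several inputs, including negative values, and adds a separate test for empty input. The function mean_value and its error messages are hypothetical and exist only for this example.

```python
import pytest


def mean_value(values: list) -> float:
    """Return the arithmetic mean of a list of numbers (hypothetical example function)."""
    assert isinstance(values, list), "The input of mean_value is not a list"
    assert len(values) > 0, "The input of mean_value is empty"
    return sum(values) / len(values)


# Each tuple is (input, expected output). The negative and mixed cases are the
# kind of values that informal testing on current data can easily miss.
cases = [
    ([2, 4, 6], 4),
    ([-2, -4, -6], -4),
    ([-1, 1], 0),
    ([2.5], 2.5),
]


@pytest.mark.parametrize("values, expected", cases)
def test_mean_value(values, expected):
    # pytest.approx avoids false failures caused by floating-point rounding.
    assert mean_value(values) == pytest.approx(expected)


def test_mean_value_empty_list():
    # Without the assert, an empty list would raise ZeroDivisionError instead.
    with pytest.raises(AssertionError, match="empty"):
        mean_value([])
```

Adding a newly discovered scenario later only requires appending one more tuple to cases, which keeps the cost of extending the test low.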
Effective unit testing requires an important prerequisite: keeping your functions small and focused. Testing a function with over 100 lines of code becomes complex and difficult to manage. By ensuring that each function serves a single purpose, you not only make unit testing easier but also improve the overall structure and readability of your code. This approach results in more concise and modular functions, which come with several benefits. First, your code becomes clearer and easier to understand. Second, the risk of duplicating code unnecessarily is reduced, as smaller functions are often reusable in multiple places. Additionally, breaking down functions minimizes the likelihood of errors. When too much functionality is packed into one large function, it’s hard to ensure everything is working correctly. Moreover, this practice naturally enhances your documentation. Providing explanations for each small, well-defined function creates a more comprehensive and easily navigable script, simplifying code maintenance and handovers. In short, keeping functions small not only facilitates unit testing but also significantly boosts the quality and maintainability of your code.
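The following sketch shows what this can look like in practice. Instead of one large function that cleans and sums a list, the work is split into small single-purpose functions, each with its own test. All function names here are hypothetical and serve only as an illustration.

```python
def remove_missing(values: list) -> list:
    """Drop None entries from the list."""
    return [value for value in values if value is not None]


def to_float(values: list) -> list:
    """Convert every element to a float."""
    return [float(value) for value in values]


def total(values: list) -> float:
    """Combine the two steps above and sum the result."""
    return sum(to_float(remove_missing(values)))


def test_remove_missing():
    assert remove_missing([1, None, 2]) == [1, 2]


def test_to_float():
    assert to_float(["1.5", 2]) == [1.5, 2.0]


def test_total():
    assert total(["1.5", None, 2]) == 3.5
```

Because each step is tested separately, a failing test points directly at the step that is wrong, and remove_missing or to_float can be reused elsewhere without duplicating code.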
In summary, unit testing enhances the readability, structure, and reliability of your script—crucial factors for building a more data-driven organization.
The example below demonstrates this concept in practice. The sample function, sum_elements, is designed with a single purpose: to sum all the elements in a list. The function begins with the first element and adds each subsequent element in turn. We expect the logic to perform a mathematical summation, meaning all provided numbers should be correctly added together. However, without input checks, a list consisting only of strings would not generate an error; the function would concatenate the strings instead. This behavior could cause issues later in your script if you’re expecting a numeric result.
A unit test can help catch this kind of problem, because writing it forces you to think through the input scenarios the function may encounter. In many cases this leads to adding assert statements to your function, which also makes it possible to return clear error messages to the user. It’s important to anticipate and handle such cases in advance, as unhandled errors can stop your entire script.
To complete the example, we’ve included a unit test for the sample function sum_elements. This unit test demonstrates several different checks to validate the function’s behavior. Note that these tests are just a selection of possible scenarios and do not cover all potential edge cases or inputs.
import pytest


def sum_elements(list_of_elements: list):
    """
    Goal:
        This function adds all numerical values in the given list
    Input:
        list_of_elements (list): A list with numerical elements to sum
    Output:
        summation (int or float): The summation of all numerical elements in the list list_of_elements
    """
    # Validate the input before applying the logic
    assert isinstance(list_of_elements, list), "The input of sum_elements is not a list"
    assert all(isinstance(element, (int, float)) for element in list_of_elements), "The elements of list_of_elements in sum_elements are not numerical"
    # Start with the first element and add each subsequent element in turn
    summation = list_of_elements[0]
    for element in list_of_elements[1:]:
        summation = summation + element
    return summation


class TestSumElements:
    """
    All tests for sum_elements function in testfile.py
    bad argument:
        ## input is not a list
        ## list elements are not numeric
    special argument:
        ## none
    normal argument:
        ## numerical elements must be added
    """
    test = [
        ([1, 2, 3], 6),
        ([1.1, 2, 3.003], 6.103)
    ]

    @pytest.mark.parametrize("list_of_elements, expected", test)
    def test_sum_elements(self, list_of_elements, expected):
        result = sum_elements(list_of_elements)
        # pytest.approx guards against floating-point rounding differences
        assert result == pytest.approx(expected)

    def test_list_elements_not_list(self, list_of_elements=5):
        # A non-list input must trigger the first assert
        with pytest.raises(AssertionError, match="The input of sum_elements is not a list"):
            sum_elements(list_of_elements)

    def test_list_elements_numeric_elements(self, list_of_elements=['a', 5, 'b']):
        # Non-numerical elements must trigger the second assert
        with pytest.raises(AssertionError, match="The elements of list_of_elements in sum_elements are not numerical"):
            sum_elements(list_of_elements)