In the previous post, I added UPC-A check digit verification to my ISBN verification code. This was a departure from the “theme” of the previous work, which was focused on ISBN-10 and ISBN-13 codes alone. As I did this work, I made a few observations:
- Most of the check digit calculations were VERY similar, but not quite identical
- The group of methods/functions was no longer solely ISBN-specific
- Duplicate code was starting to pop up in the class
Based on this, I decided to do a quick refactoring exercise and see if I could simplify things a bit.
The first thing I did was rename my class to ChecksumCalculator. Truth in advertising!
The next observation I had was that each of the check digit calculation mechanisms was a variation of a weighted sum, where each digit is multiplied by a weight and then added to a running checksum. The check digit is then whatever must be added to that total to make it an exact multiple of some modulus (usually 10). UPC-A, ISBN-13, and ISBN-10 codes all seemed to use this mechanism. Based on this, I decided to write a more generic, reusable function to calculate a weighted sum check digit. I had to give this function a bit more information (namely, which weights to apply and which modulus to use in the calculation). But once I did this, I discovered that it dramatically simplified the other methods in my class and also reduced the potential for future errors. Because this function would be reused by many other functions and contained a couple of non-obvious parameters, I decided to write a Python docstring for it so that future consumers could understand what was going on.
This is what I ended up with for my method:
@staticmethod
def calc_weighted_sum_check_digit(code_string: str, weight_array: [int], modulus: int = 10) -> int:
    """
    Returns the proper check digit for a codestring, calculated using a weighted sum mechanism

    Parameters:
        code_string: a string of integer digits for which a check digit is desired
        weight_array: an array of integers containing the "weight" by which each digit should be
            multiplied. This array is used in a circular manner, so for the array [1, 3], the
            weights applied would be 1, 3, 1, 3, 1, 3... until there are no more numbers in
            the code_string.
        modulus: the modulus to use in the calculation. 0 <= value_returned < modulus

    Returns:
        An integer between 0 and modulus, representing the "checkdigit."
        Note: If the modulus is > 10, this checkdigit could actually consist of two or more
        integer digits -- with modulus = 11, 10 is a valid "checkdigit"
    """
    num_weights = len(weight_array)
    checksum = 0
    for (count, digit) in enumerate(code_string):
        weight = weight_array[count % num_weights]
        checksum += (int(digit) * weight)
    check_digit = (modulus - (checksum % modulus)) % modulus
    return check_digit
Now that I have this method, I can reduce my check digit calculation functions to the following:
def calculate_isbn_13_checkdigit(isbn13_first12_numbers: str) -> str:
    if len(isbn13_first12_numbers) != 12 or not isbn13_first12_numbers.isnumeric():
        raise ChecksumCalculator.FormatException("Improper format in first 12 numbers of ISBN13")
    check_digit = ChecksumCalculator.calc_weighted_sum_check_digit(code_string=isbn13_first12_numbers,
                                                                   weight_array=[1, 3],
                                                                   modulus=10)
    return str(check_digit)

def calculate_upc_checkdigit(first_11_numbers: str) -> str:
    if len(first_11_numbers) != 11 or not first_11_numbers.isnumeric():
        raise ChecksumCalculator.FormatException("Improper format in first 11 numbers of UPC")
    check_digit = ChecksumCalculator.calc_weighted_sum_check_digit(code_string=first_11_numbers,
                                                                   weight_array=[3, 1],
                                                                   modulus=10)
    return str(check_digit)
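ISBN-10 fits the same pattern, just with different parameters: the first nine digits are weighted 10 down to 2, the modulus is 11, and a result of 10 is written as "X". The following is only a rough sketch of how that wrapper might look, not the code from the class itself. The function name calculate_isbn_10_checkdigit is hypothetical, the helper is repeated as a plain function so the example runs on its own, and ValueError stands in for the class's FormatException.

```python
def calc_weighted_sum_check_digit(code_string: str, weight_array: list, modulus: int = 10) -> int:
    """Standalone copy of the weighted-sum helper so this sketch is runnable by itself."""
    num_weights = len(weight_array)
    checksum = 0
    for count, digit in enumerate(code_string):
        # Weights are applied in a circular manner, as in the original helper.
        checksum += int(digit) * weight_array[count % num_weights]
    return (modulus - (checksum % modulus)) % modulus


def calculate_isbn_10_checkdigit(isbn10_first9_numbers: str) -> str:
    """Hypothetical ISBN-10 wrapper (name and exception type are assumptions)."""
    # isdecimal() only accepts characters that int() can parse.
    if len(isbn10_first9_numbers) != 9 or not isbn10_first9_numbers.isdecimal():
        raise ValueError("Improper format in first 9 numbers of ISBN10")
    check_digit = calc_weighted_sum_check_digit(
        code_string=isbn10_first9_numbers,
        weight_array=[10, 9, 8, 7, 6, 5, 4, 3, 2],
        modulus=11,
    )
    # With modulus 11 the result can be 10, which ISBN-10 writes as "X".
    return "X" if check_digit == 10 else str(check_digit)


print(calculate_isbn_10_checkdigit("030640615"))  # 2  (ISBN 0-306-40615-2)
print(calculate_isbn_10_checkdigit("080442957"))  # X  (ISBN 0-8044-2957-X)
```

The only ISBN-10-specific logic left in the wrapper is the length check and the "X" substitution; everything else is a matter of passing the right weights and modulus.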
It’s during a refactoring exercise like this that I really appreciate unit tests! As I made my changes, I was able to make sure that I had not broken any functionality by simply running the unit tests I developed. At one point, they caught a mistake that I made and allowed me to fix it before I went too far down the wrong path. Whew!
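To give a flavor of what such tests can look like, here is a minimal sketch using Python's built-in unittest module. These are not the actual tests from the project: the test class name and the chosen sample codes are assumptions for illustration, and the helper is repeated as a plain function so the file runs on its own.

```python
import unittest


def calc_weighted_sum_check_digit(code_string, weight_array, modulus=10):
    """Standalone copy of the weighted-sum helper so the tests below are self-contained."""
    checksum = sum(
        int(digit) * weight_array[count % len(weight_array)]
        for count, digit in enumerate(code_string)
    )
    return (modulus - checksum % modulus) % modulus


class WeightedSumCheckDigitTest(unittest.TestCase):
    def test_isbn_13(self):
        # ISBN 978-0-306-40615-7: weighted sum is 93, so the check digit is 7.
        self.assertEqual(calc_weighted_sum_check_digit("978030640615", [1, 3]), 7)

    def test_upc_a(self):
        # UPC-A 0 36000 29145 2: weighted sum is 58, so the check digit is 2.
        self.assertEqual(calc_weighted_sum_check_digit("03600029145", [3, 1]), 2)

    def test_sum_already_a_multiple_of_modulus(self):
        # 1 + 9 = 10, already a multiple of 10, so the result must be 0 rather than 10.
        self.assertEqual(calc_weighted_sum_check_digit("19", [1, 1]), 0)
```

Saved as a file such as test_checksum.py (a hypothetical name), this can be run with python -m unittest. The last test guards the final "% modulus" step, which is the kind of detail that is easy to drop during a refactoring.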
So, there it is. I’ve now got an appreciation for the various checksums that exist in the barcodes that I commonly see on books, or boxes of cereal. And, in getting that understanding, I’ve also had a chance to sharpen my Python programming skills a bit. I think that I’m going to try to dig into something that involves file processing next.