Coding Walkthrough 004 - Supplanting Functions

These notes are a companion piece for the video:

Coding Walkthrough 004 - Supplanting Functions

Topic: Supplanting Functions

These are the show notes for a video demonstration of changing an existing Python program to add new functions that will supplant some existing function calls.

The purpose of the video is to show an example of doing this in the context of a real and non-trivial program. In this case it happens during an edit of "Foldatry" which is a multi-purpose file and folder tool that I am writing in Python.

Note: all the code on this page is actually "air code", written in a mere text editor as planned code which is then clipped in during the video. I've deliberately left intact the errors that quickly became apparent as development continued.

The Example Situation

After some runs of Foldatry I found myself wishing I could see how much data storage was involved. While there are several places where I'd want to make that change, for this run-through I'll just look at one of its parts - Congruentry.

Note: if you want to inspect the code (as it was when these notes were written), use this specific link:

congruentry.py as at 15 Nov 2022

The Congruentry module is all about comparing two folder locations and determining whether they and all their sub-folders are the same. The main use for this is confirming that a copy process - done by some other tool - was completed perfectly. While the simplest answer to that is either "Yes, the same" or "No, not the same", in practice when the answer is "No" we want to know something useful about the differences.

Congruentry is also capable of indicating when one of the two folders is a full subset of the other, in which case it can assemble a list of the additional items in the superset folder.

But it does not report how much file size those differences amount to. One reason is that it uses three functions from the stock filecmp library to perform the comparisons - and none of those return any information about file sizes.

As it happens, for other reasons, I am also writing my own replacement for the stock filecmp library, as another module inside Foldatry - but it is not ready yet.

So our target here is that we want to replace use of the stock filecmp library with calls to a "mock" set of functions, so that:

for now they will just call the filecmp library anyway;
then later they can be changed to call the new module when it has been written (and proven).

Actually we can go one better than that plan, and add an extra change in the sequence:

initially they will just call the filecmp library anyway;
then extensions can be made, so that after the filecmp calls, the list of differences can be traversed to collect file size information;
then later again they can be changed to call the new module when it has been written (and proven).

What To Replace

Ok, so where are the calls that we will supplant?

We can easily trace them, because the top of the module congruentry.py has this import:

import filecmp as im_filecmp

Due to the alias there, the calls will all be like im_filecmp.something

And here they are (also showing the function nesting of their locations):

first, we have a call that does most of the work:

def congruency_check_trees( cct_p_Path_A, cct_p_Path_B, cct_p_Stringent, cct_p_badwinfname, cct_p_badwinchars, \

    def trees_pass( tp_p_Stringent ):

        def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):

            dcmp = im_filecmp.dircmp( ccs_p_Path_A, ccs_p_Path_B ) # ? shallow=True

wherein the function called was dircmp which returned an object dcmp that gets further interactions afterwards - we'll discuss the details of that later.

second, we have a call that is used to perform the "stringent" file comparisons of file contents:

    def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):

            (fils_match, fils_mismatch, fils_errors) = im_filecmp.cmpfiles(
                ccs_p_Path_A, ccs_p_Path_B, dcmp.common_files, shallow=False)

wherein the function called was cmpfiles which returned three lists of filenames.

third, we have a call that just compares two specific files:

        def congruency_check_subtrees( ccs_p_Path_A, ccs_p_Path_B, ccs_p_depth, ccs_p_still_congruent, ccs_p_extra_side ):

                        chckd_XAB_congruent = im_filecmp.cmp( pathfile_a, pathfile_b, shallow=False)

wherein the function called was cmp which returned a boolean.

Note: that part gets done as a follow-up comparison that copes with filenames that are non-identical - e.g. because of different character encodings in the file systems. For this exercise we can ignore the why of all that.

fourth, because the Congruentry module provides a direct file vs file comparison, we have another call that just compares two specific files:

def congruentry_files_command( pathfile_a, pathfile_b, p_multi_log):

    chckd_congruent = im_filecmp.cmp(pathfile_a, pathfile_b, shallow=False)

wherein the function called was again cmp which returned a boolean.

So, that was usage of:

dircmp = compare directories, returning an object - note that some comparison actions don't happen until parts of the object are called
cmpfiles = compare the files in two directories
cmp = compare two specified files

which are therefore the features we need to make supplanting functions. We may as well do these inside the Congruentry module.
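To make the three return shapes concrete, here is a minimal sketch that runs the stock filecmp calls against two throwaway folders. The folder contents and the helper name make_file are invented for illustration and are not part of Congruentry.

```python
import filecmp
import os
import tempfile

def make_file(folder, name, text):
    """Hypothetical helper: write a small text file and return its full path."""
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path

with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    make_file(dir_a, "same.txt", "hello")
    make_file(dir_b, "same.txt", "hello")
    make_file(dir_a, "differs.txt", "left")
    make_file(dir_b, "differs.txt", "right!")
    make_file(dir_a, "only_in_a.txt", "extra")

    # dircmp returns an object; its attributes are worked out when first read
    dcmp = filecmp.dircmp(dir_a, dir_b)
    common_files = sorted(dcmp.common_files)
    left_only = list(dcmp.left_only)

    # cmpfiles returns three lists of names: match, mismatch, errors
    match, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, common_files, shallow=False)

    # cmp returns a single boolean for one pair of files
    pair_same = filecmp.cmp(
        os.path.join(dir_a, "same.txt"),
        os.path.join(dir_b, "same.txt"),
        shallow=False)
```

None of the three results carries a file size, which is the gap the supplanting functions are meant to fill.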

The Mocking

Mock function names

We'll need three new functions. Here is my plan:

fcmp_for_two_dirs_compare_get_object = to replace dircmp
fcmp_for_dirs_compare_files_get_lists = to replace cmpfiles
fcmp_for_two_files_compare_contents_get_bool = to replace cmp

Now it should be said that it took a few rounds of thought to get those names. Quite a bit of the following was first written with quite different names. Eventually, I settled on a prefix fcmp_ for them all, with each name being a sequence of "for this" then "do this" then "get this".

Naive mock up

Time to use the names and start framing them into being Python functions.

def fcmp_for_two_dirs_compare_get_object()
def fcmp_for_dirs_compare_files_get_lists()
def fcmp_for_two_files_compare_contents_get_bool( )

Parameters

Now let's add the parameters. As my first target is to simply pass through to the filecmp library, I'll set out the same parameter sets.

def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True )
def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False )
def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False )

I am cheating however, in only bothering about the parameters I'm currently using in my own code.

Usable Mocks

Let's now add enough code to pass through to the existing calls. I don't even need to bother with local variables and instead just put the calls in the return lines.

def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True)
    return im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )

def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False )
    return im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)

def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False )
    return im_filecmp.cmp(pathfile_a, pathfile_b, shallow=False)
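For comparison, here is a runnable sketch of the same pass-through idea. It differs from the air code above in three ways: the def lines end with colons, the two-file function uses its own parameter names, and p_shallow is not forwarded to dircmp. That last point is an assumption about intent: in the stock library the third positional parameter of dircmp is ignore, and as far as I know a shallow option for dircmp only arrived (as a keyword) in Python 3.13.

```python
import filecmp as im_filecmp
import os
import tempfile

def fcmp_for_two_dirs_compare_get_object(p_path_a, p_path_b, p_shallow=True):
    # p_shallow is accepted but not forwarded: dircmp's third positional
    # parameter is `ignore`, so a bool passed there would not mean "shallow"
    return im_filecmp.dircmp(p_path_a, p_path_b)

def fcmp_for_dirs_compare_files_get_lists(p_path_a, p_path_b, p_lst_common_files, p_shallow=False):
    return im_filecmp.cmpfiles(p_path_a, p_path_b, p_lst_common_files, shallow=p_shallow)

def fcmp_for_two_files_compare_contents_get_bool(p_pathfile_a, p_pathfile_b, p_shallow=False):
    return im_filecmp.cmp(p_pathfile_a, p_pathfile_b, shallow=p_shallow)

# small demonstration on throwaway folders (not part of Congruentry)
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    for folder in (dir_a, dir_b):
        with open(os.path.join(folder, "note.txt"), "w", encoding="utf-8") as handle:
            handle.write("same text")
    dcmp = fcmp_for_two_dirs_compare_get_object(dir_a, dir_b)
    demo_common = list(dcmp.common_files)
    demo_lists = fcmp_for_dirs_compare_files_get_lists(dir_a, dir_b, demo_common)
    demo_same = fcmp_for_two_files_compare_contents_get_bool(
        os.path.join(dir_a, "note.txt"), os.path.join(dir_b, "note.txt"))
```

Because each wrapper returns exactly what the stock call returns, the existing call sites can switch to the new names without any other change.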

Customised Round One

But, the whole point of this exercise was to enable some changes, so let's do the first parts of how that might work.

For the simple comparison of two files (fcmp_for_two_files_compare_contents_get_bool), let's make it return whether or not a deep/content comparison was required and done.

For the comparison of just the files in the two directories (fcmp_for_dirs_compare_files_get_lists), let's return the total file size of those. For this, we'll put two extra functions inside it (pathfile_filesize and pathfile_filesizes_sum). And, one of those will, for now, only be a mock function as it will just always return None - we can work out what the valid Python for doing that is later.

def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True):
    r_dcmp = im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )
    return r_dcmp

def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False ):
    def pathfile_filesize( p_path, p_file):
        return None
    def pathfile_filesizes_sum( p_path, p_lst_files):
        klang = False
        r_sum = 0
        for i_file in p_lst_files :
            i_sum = pathfile_filesize( p_path, i_file)
            if not i_sum is None :
                r_sum = r_sum + i_sum
            else:
                # a single failed filesize means the sum is invalid
                klang = klang or True
        if klang :
                r_sum = -1
        return r_sum
    (r_files_match, r_files_mismatch, r_files_errors) = im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)
    r_match_size_sum = pathfile_filesizes_sum( p_path_a, r_files_match)
    return r_files_match, r_files_mismatch, r_files_errors, r_match_size_sum

def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False ):
    r_same_shallow = im_filecmp.cmp(pathfile_a, pathfile_b, True)
    if r_same_shallow and p_shallow :
        r_same_content = im_filecmp.cmp(pathfile_a, pathfile_b, False)
    else:
        r_same_content = False
    return r_same_shallow, r_same_content
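The two-file function above is worth a closer look, since it refers to pathfile_a rather than p_pathfile_a and only runs the content comparison when p_shallow is true. Here is one possible reading of the intent as a runnable sketch. Using None to mean "content comparison not done" is my assumption, not something the air code does.

```python
import filecmp as im_filecmp
import os
import tempfile

def fcmp_for_two_files_compare_contents_get_bool(p_pathfile_a, p_pathfile_b, p_shallow=False):
    r_same_shallow = im_filecmp.cmp(p_pathfile_a, p_pathfile_b, shallow=True)
    if not r_same_shallow:
        # a False shallow result already means the files differ
        r_same_content = False
    elif p_shallow:
        # caller asked for shallow only, so contents were not read (assumed convention)
        r_same_content = None
    else:
        r_same_content = im_filecmp.cmp(p_pathfile_a, p_pathfile_b, shallow=False)
    return r_same_shallow, r_same_content

# demonstration: same size and timestamp, different contents
with tempfile.TemporaryDirectory() as folder:
    file_a = os.path.join(folder, "a.bin")
    file_b = os.path.join(folder, "b.bin")
    with open(file_a, "wb") as handle:
        handle.write(b"aaaa")
    with open(file_b, "wb") as handle:
        handle.write(b"bbbb")
    # force identical access and modification times on both files
    os.utime(file_a, (1_000_000_000, 1_000_000_000))
    os.utime(file_b, (1_000_000_000, 1_000_000_000))
    result_deep = fcmp_for_two_files_compare_contents_get_bool(file_a, file_b)
    result_shallow_only = fcmp_for_two_files_compare_contents_get_bool(
        file_a, file_b, p_shallow=True)
```

Either way, the function now returns two values, so any call site that used to receive a single boolean from cmp has to unpack the pair.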

Customised Round Two

In which we extend the functionality of the object method so that it can return the collective sum of the files found to be in common.

As part of this we can lean on the two functions we've already built inside the function fcmp_for_dirs_compare_files_get_lists but to have those available to both that and the "object" function, we'll bring them to the outside. For clarity we'll add our prefix fcmp_ to their names.

# support functions

def fcmp_pathfile_filesize( p_path, p_file):
    return None

def fcmp_pathfile_filesizes_sum( p_path, p_lst_files):
    klang = False
    r_sum = 0
    for i_file in p_lst_files :
        i_sum = fcmp_pathfile_filesize( p_path, i_file)
        if not i_sum is None :
            r_sum = r_sum + i_sum
        else:
            # a single failed filesize means the sum is invalid
            klang = klang or True
    if klang :
            r_sum = -1
    return r_sum

# the replacement functions

def fcmp_for_two_dirs_compare_get_object( p_path_a, p_path_b, p_shallow=True):
    r_dcmp = im_filecmp.dircmp( p_path_a, p_path_b, p_shallow )
    i_files_match = r_dcmp.same_files
    r_match_size_sum = fcmp_pathfile_filesizes_sum( p_path_a, i_files_match)
    return r_dcmp, r_match_size_sum

def fcmp_for_dirs_compare_files_get_lists( p_path_a, p_path_b, p_lst_common_files, p_shallow=False ):
    (r_files_match, r_files_mismatch, r_files_errors) = im_filecmp.cmpfiles( p_path_a, p_path_b, p_lst_common_files, p_shallow)
    r_match_size_sum = fcmp_pathfile_filesizes_sum( p_path_a, r_files_match)
    return r_files_match, r_files_mismatch, r_files_errors, r_match_size_sum

def fcmp_for_two_files_compare_contents_get_bool( p_pathfile_a, p_pathfile_b, p_shallow=False ):
    r_same_shallow = im_filecmp.cmp(pathfile_a, pathfile_b, True)
    if r_same_shallow and p_shallow :
        r_same_content = im_filecmp.cmp(pathfile_a, pathfile_b, False)
    else:
        r_same_content = False
    return r_same_shallow, r_same_content
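One detail to keep in mind for the "object" function: the same_files attribute of a dircmp object only covers the one directory level, with sub-folders reachable through its subdirs attribute. As a hedged sketch, a collective sum across the whole tree could be gathered like this. The function name sum_same_file_sizes and the helper write_text are hypothetical, and error handling is left out for brevity.

```python
import filecmp
import os
import tempfile

def sum_same_file_sizes(p_dcmp):
    """Add up sizes of the files dircmp reports as the same, at this level and below."""
    total = 0
    for name in p_dcmp.same_files:
        total += os.path.getsize(os.path.join(p_dcmp.left, name))
    # subdirs maps each common sub-folder name to its own dircmp object
    for sub_dcmp in p_dcmp.subdirs.values():
        total += sum_same_file_sizes(sub_dcmp)
    return total

def write_text(path, text):
    """Hypothetical helper for the demonstration only."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    for folder in (dir_a, dir_b):
        os.makedirs(os.path.join(folder, "sub"))
        write_text(os.path.join(folder, "x.txt"), "12345")
        write_text(os.path.join(folder, "sub", "y.txt"), "123")
    # same name on both sides, but different size and content
    write_text(os.path.join(dir_a, "z.txt"), "left")
    write_text(os.path.join(dir_b, "z.txt"), "right side")

    demo_dcmp = filecmp.dircmp(dir_a, dir_b)
    top_level_same = list(demo_dcmp.same_files)
    total_same_size = sum_same_file_sizes(demo_dcmp)
```

Here the total counts x.txt (5 bytes) and sub/y.txt (3 bytes) but not z.txt.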

Ready for Implanting

The above has been a "paper exercise" - just written in my programming notes tool rather than in the IDE where I actually work on my Python program. So from here, the exercise will shift into one of pasting this code into there and seeing if it will actually work.

Enacting the Size

So far, we've left the function for getting the file sizes quite unable to actually do that.

def fcmp_pathfile_filesize( p_path, p_file):
    return None

As it happens, I already have some code for doing this, as it was needed in another module.

In the Matchsubtry module we have a line:

    size = im_os.path.getsize(fpath)

A quick check confirms that Congruentry has the same library imported:

import os as im_os

As our intended function fcmp_pathfile_filesize is currently taking two parameters: p_path and p_file we will need something to combine them.

A quick read finds the place in Congruentry where I already do that:

                    new_subpath_a = im_osp.join(ccs_p_Path_A, subdir)

So this gives me enough pieces to make the function operative.

def fcmp_pathfile_filesize( p_path, p_file):
    i_pathfile = im_osp.join( p_path, p_file)
    r_size = im_os.path.getsize( i_pathfile)
    return r_size

Note that I didn't go straight to putting that all in the return line. I like giving myself the option of putting in some print statements (or some other "debug" calls) as I go to implement this for the first time.
e.g.

def fcmp_pathfile_filesize( p_path, p_file):
    print( p_path, p_file)
    i_pathfile = im_osp.join( p_path, p_file)
    print( i_pathfile)
    r_size = im_os.path.getsize( i_pathfile)
    print( r_size)
    return r_size

As we're dealing with the file system here, it is wise to assume that things can go wrong at run time.

The simplest thing is to wrap each file system interaction with a try except pair.

def fcmp_pathfile_filesize( p_path, p_file):
    try:
        i_pathfile = im_osp.join( p_path, p_file)
        path_ok = True
    except:
        path_ok = False
    try:
        r_size = im_os.path.getsize( i_pathfile)
        size_ok = True
    except:
        size_ok = False
    if not size_ok :
        r_size = None
    return r_size

That's actually slightly ambiguous, as it attempts to call getsize regardless of whether the path+file construction gave an error. I like to have the possible errors kept separate.

def fcmp_pathfile_filesize( p_path, p_file):
    # first form the path
    try:
        i_pathfile = im_osp.join( p_path, p_file)
        path_ok = True
    except:
        path_ok = False
    # then get the size
    size_ok = False
    if path_ok :
        try:
            r_size = im_os.path.getsize( i_pathfile)
            size_ok = True
        except:
            pass
    if not size_ok :
        r_size = None
    return r_size

Note that there are many ways to code that logic - for various degrees of optimisation and/or being Pythonic. For now, just getting the logic valid and bulletproof will do.
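As one example of a more compact form, here is a sketch that catches only the exceptions I would expect from these two calls: TypeError from os.path.join when given something that isn't a path, and OSError from getsize when the file can't be read. That choice of exceptions is my assumption and is narrower than the bare except used above. The sum function keeps the same "-1 means invalid" convention as before.

```python
import os
import tempfile

def fcmp_pathfile_filesize(p_path, p_file):
    try:
        return os.path.getsize(os.path.join(p_path, p_file))
    except (OSError, TypeError):
        # missing or unreadable file, or arguments that are not paths
        return None

def fcmp_pathfile_filesizes_sum(p_path, p_lst_files):
    sizes = [fcmp_pathfile_filesize(p_path, i_file) for i_file in p_lst_files]
    # a single failed filesize means the sum is invalid
    if any(size is None for size in sizes):
        return -1
    return sum(sizes)

# demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "a.txt"), "wb") as handle:
        handle.write(b"12345")
    with open(os.path.join(folder, "b.txt"), "wb") as handle:
        handle.write(b"123")
    size_ok = fcmp_pathfile_filesize(folder, "a.txt")
    size_missing = fcmp_pathfile_filesize(folder, "no_such_file.txt")
    size_bad_name = fcmp_pathfile_filesize(folder, None)
    sum_ok = fcmp_pathfile_filesizes_sum(folder, ["a.txt", "b.txt"])
    sum_invalid = fcmp_pathfile_filesizes_sum(folder, ["a.txt", "no_such_file.txt"])
```

The trade-off is that anything outside those two exception types will now surface as a traceback rather than a quiet None, which may or may not be what is wanted while the code is still being proven.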