Fixing a fifteen-year-old curve fit bug
Paul Cochrane
Posted on August 22, 2024
Originally published on peateasea.de.
The backwards compatibility of Perl software is wonderful. That's why it's all the more jarring when you find a package that doesn't work. This is the story of a 15-year-old bug that I managed to track down and, fortunately, resolve.
Y'know, I hate it when software doesn't "just work", both as a user and as a developer. Usually, something like this distracts me and I want to work out what the problem is before I can continue with whatever I was doing beforehand. Sometimes, diving down the rabbit hole doesn't lead anywhere and I have to find a workaround and move on. Other times I get lucky and I work out what the bug was and can fix it. This is one of those lucky times.
There's something magical about diving into solving a bug. I'm able to find real focus, time flies and I come out the other end tired yet invigorated. I really enjoy these kinds of bug-hunting puzzles and I seem to be fairly good at it.[1] But maybe I'm just overly stubborn (in a good way!).
I like to understand why something doesn't work and then see if I can find a solution. Also, if I find a solution, I like to understand why a solution works. After all, sometimes while poking around, I stumble on something that suddenly works. Yet I find in such cases that it's always good to understand why it works. That way I can be more sure that the solution really does solve the underlying problem.
As I mentioned above, one of the great things about Perl is that often code from a decade or so ago "just works". The Perl people put a lot of effort into ensuring backwards compatibility. That's why it's so surprising when something doesn't work, as with the situation I recently found myself in.
The backstory
So how did I get into this situation? Well, I wanted to run some old code of mine to process Arctic sea-ice extent data and see what the results look like now. To do that, I needed to install the project's dependencies. Unfortunately, running
$ cpanm --installdeps .
failed. Specifically, the distribution[2] Algorithm::CurveFit (which I use to fit a curve to the sea-ice extent data) failed its tests and hence wouldn't install. Weird. That used to work without a problem. Hrm.
I checked the dist's info on meta::cpan, but it hadn't been updated since 2010. In other words, there hadn't been any recent changes to cause the failure. Since I'd written my program in 2014 and hadn't changed it since 2018, I'd expected everything to still run as normal. Also weird.
I browsed through the code on meta::cpan and found that it was plain Perl: it didn't use any XS or anything which looked special. In other words, this should still "just work".[3] More weird.
Then I found out that this has been a known problem since 2017 and that there was a ticket for the issue on RT. The gist of the ticket was that the dist had stopped working on Debian stretch, whereas it used to work on Debian jessie.
Hrm, interesting. I wonder why?
I was starting to get pulled into the bug's curiosity vortex. I thought, "I'll have a quick look at this, maybe I can spot something". Famous last words. Down the rabbit hole I plunged…
Rebuilding the past
So where to start? My first instinct was to try and recreate the past. After all, the dist used to pass its test suite on Debian jessie with Perl 5.18 (the last Perl version I'd used). Thus, if I could see it working in the old environment, maybe I could spot where things had changed. This would give a working reference system and a baseline so I could see what "correct" looked like.
To recreate this past state, I spun up a Vagrant virtual machine with Debian jessie. Here's the configuration I used in my Vagrantfile:
config.vm.define "bare-test-jessie" do |bare_test_jessie|
  bare_test_jessie.vm.box = "debian/jessie64"
  bare_test_jessie.vm.network "private_network", ip: "192.168.23.107"
  bare_test_jessie.vm.network "forwarded_port", guest: 22, host: 2207, id: "ssh"
end
Why did I use 107 at the end of the private network address and 2207 for the forwarded port? Well, this is simply the seventh entry in my Vagrantfile. There's nothing special about the number other than to keep my Vagrant VMs separate. Note that it's not necessary to specify the forwarded port; I do so to avoid any port conflicts when running multiple VMs.
Spinning up the new VM was a simple matter of:
$ vagrant up bare-test-jessie
Upon startup, I got this error:
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
apt-get -yqq update
apt-get -yqq install rsync
Stdout from the command:
WARNING: The following packages cannot be authenticated!
rsync
Stderr from the command:
W: Failed to fetch http://security.debian.org/dists/jessie/updates/main/source/Sources 404 Not Found [IP: 151.101.194.132 80]
W: Failed to fetch http://security.debian.org/dists/jessie/updates/main/binary-amd64/Packages 404 Not Found [IP: 151.101.194.132 80]
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/main/source/Sources 404 Not Found [IP: 199.232.190.132 80]
W: Failed to fetch http://httpredir.debian.org/debian/dists/jessie/main/binary-amd64/Packages 404 Not Found [IP: 199.232.190.132 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.
E: There are problems and -y was used without --force-yes
This is often the first stumbling block when working with an old system: the APT repository is no longer available from the original address as listed in the VM. I.e. in this case, there's no entry for jessie at the URLs http://httpredir.debian.org/debian or http://security.debian.org/. There's also the related problem that APT repositories no longer use plain HTTP (they have to use HTTPS these days), but that's not the underlying issue here.
Fortunately, the APT repository is still available as part of the Debian distribution archives. This means I only needed to update the URLs in /etc/apt/sources.list to point to the archive.[4]
Although the error looked bad (with a wall of red text), the VM was up; rsync had failed to install, nothing more. Other than that, everything was running. Thus, I could ssh into the VM and set things up further:
$ vagrant ssh bare-test-jessie
Opening up vi[5] on /etc/apt/sources.list:
$ sudo vi /etc/apt/sources.list
I changed the content to look like this:
deb http://archive.debian.org/debian/ jessie main non-free contrib
deb-src http://archive.debian.org/debian/ jessie main non-free contrib
deb http://archive.debian.org/debian-security/ jessie/updates main non-free contrib
deb-src http://archive.debian.org/debian-security/ jessie/updates main non-free contrib
Now it was possible to update the package list and install some software:
$ sudo apt update
$ sudo apt -y upgrade
$ sudo apt -y --force-yes install vim curl build-essential
Note that the --force-yes option was necessary because some of the packages can't be authenticated due to their age. This is another one of those things one has to deal with when working with legacy systems.
I installed the vim package because, although vim was installed, it only ran in vi compatibility mode. I needed the vim command because that's what my muscle memory expects. Old habits die hard.
I also needed curl so that I could do the curl <url> | bash dance to install perlbrew. The build-essential package was, erm, essential so that I could build Perl from source.
Installing perlbrew and setting it up looked like this:
$ curl -L https://install.perlbrew.pl | bash
$ source ~/perl5/perlbrew/etc/bashrc
$ echo 'source ~/perl5/perlbrew/etc/bashrc' >> ~/.bashrc
Installing Perl looked like this:
$ perlbrew --notest install perl-5.18.4
$ perlbrew switch 5.18.4
$ perlbrew install-cpanm
I chose to install Perl 5.18.4 because that was the most recent version I'd used in my program way back in 2018. The reasoning was that if Algorithm::CurveFit had worked with that version back then, it should work with that version in a Debian jessie VM now.
Unfortunately, the cpan/Time-HiRes/t/nanosleep.t test fails (the only one to fail in the entire Perl test suite); hence it's necessary to use the --notest option for the installation to complete successfully.
After switching to the newly-installed Perl version, and installing cpanm in that environment, I was ready to try installing Algorithm::CurveFit to see if it works in my rebuild of the past:
$ cpanm Algorithm::CurveFit
Which worked! I.e. it built, tested and installed as expected. This was my baseline test: now I had a system configuration in which Algorithm::CurveFit worked (meaning, its test suite passed).
Now I needed to be able to dig around in the code and change things. To do that I needed to get the source tarball and unpack it:
$ wget https://cpan.metacpan.org/authors/id/S/SM/SMUELLER/Algorithm-CurveFit-1.05.tar.gz
$ tar -xvzf Algorithm-CurveFit-1.05.tar.gz
Changing into the directory created by tar, building the Makefile, and running the test suite from the source tarball, I was able to make sure that this code also worked as I expected:
$ cd Algorithm-CurveFit-1.05
$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Algorithm::CurveFit
Writing MYMETA.yml and MYMETA.json
$ make test
cp lib/Algorithm/CurveFit.pm blib/lib/Algorithm/CurveFit.pm
PERL_DL_NONLAZY=1 "/home/vagrant/perl5/perlbrew/perls/perl-5.18.4/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00pod.t ........ skipped: Test::Pod 1.00 required for testing POD
t/00podcover.t ... skipped: Test::Pod::Coverage 1.00 required for testing POD coverage
t/01basic.t ...... ok
t/02bad_deriv.t .. ok
All tests successful.
Files=4, Tests=20, 2 wallclock secs ( 0.04 usr 0.03 sys + 1.30 cusr 0.09 csys = 1.46 CPU)
Result: PASS
It wasn't necessary to install the dependencies in this step, because the cpanm Algorithm::CurveFit call from earlier had already installed them.
To make absolutely sure that the test which fails "in the future" still worked on Debian jessie (and to ensure that I was using the local library, and not the one I'd installed globally via cpanm earlier),[6] I ran the test via perl directly:
$ perl -Ilib t/02bad_deriv.t
1..13
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
This was all the detail I could get at this point. I.e. the test suite could only tell me that the tests had passed.
Even so, now with a solid, working baseline reference system I was able to move forward and try to work out why the code didn't work in the present day.
A quick peek before digging in
So, with a working reference system, what to do now? Simply staring at the code and mulling over what it does wasn't a good strategy at this stage because I'd not yet interacted with the code and seen how information flows through the system. Still, having a quick skim read of the code was a good idea, just to get a feel for the code's structure and generally to get a bird's eye view before looking for places to dig deeper.
Looking through the code, the dist consists of a single Perl module with about 250 lines of code excluding docs and comments. The module calculates a best fit of a given function to an input set of data based upon an initial guess of the function's coefficients. The code seemed quite intricate, yet it was well commented and 250 lines of code seemed manageable to understand.
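For orientation, here's roughly how the module is driven. This is a minimal sketch along the lines of the synopsis in the module's POD; the formula and data here are invented for illustration:

use Algorithm::CurveFit;

# each parameter is a triple: [ name, initial guess, target accuracy ]
my @parameters = (
    ['a', 0.9, 0.00001],
    ['b', 20,  0.00001],
);
my @xdata = (1 .. 10);
my @ydata = map { 2 * $_**2 + 13 } @xdata;

my $square_residual = Algorithm::CurveFit->curve_fit(
    formula            => 'b + a * x^2',
    params             => \@parameters,
    variable           => 'x',
    xdata              => \@xdata,
    ydata              => \@ydata,
    maximum_iterations => 100,
);

# $square_residual now holds the square residual of the fit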
Poking the system to gain a feel for the software
Ok, so what does the test failure look like on "later-than-jessie" systems?
$ perl -Ilib t/02bad_deriv.t
1..13
ok 1
ok 2
not ok 3
# Failed test at t/02bad_deriv.t line 50.
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
# Looks like you failed 1 test of 13.
Unfortunately, that's not very informative. Looking in the RT ticket, SREZIC mentions that it's possible to get a bit more information about why the tests in 02bad_deriv.t fail by using cmp_ok. Taking his advice, I appended these tests:
cmp_ok($v + $eps[$par], ">", $val[$par]);
cmp_ok($v - $eps[$par], "<", $val[$par]);
and ran the test file again (perl -Ilib t/02bad_deriv.t), getting this output:
not ok 5
# Failed test at t/02bad_deriv.t line 52.
# '5'
# >
# '5.15'
As my Master's supervisor used to ask me quite a lot: "Ya good, but what does it mean?". Good question. Dunno. Yet.
Looking at the test code more closely, we have this snippet:
my @val = (5.15, 0.01, 0.100, 1.045);
my @eps = (1.00, 0.10, 1.000, 0.100);
foreach my $par (0..$#parameters) {
    my $v = $parameters[$par][1];
    ok(defined $v);
    ok($v + $eps[$par] > $val[$par]);
    ok($v - $eps[$par] < $val[$par]);
    cmp_ok($v + $eps[$par], ">", $val[$par]);
    cmp_ok($v - $eps[$par], "<", $val[$par]);
}
which includes the newly added cmp_ok calls.
This loops over a set of parameters and compares their calculated values (in the @parameters array defined earlier in the test) with their expected values (in the @val array) to within a given tolerance for each (specified in the @eps array). What the test failure output is telling us is that the calculated value plus its tolerance (the first element of the @eps array) should be greater than the expected value, 5.15 (the first element of the @val array), but is actually only 5. Since the first value of the @eps array is 1.00, the calculated value must be 4. Checking the initial guess of what the parameters should be (from earlier in the test)
my @parameters = (
['s', 4.0, 0.01],
['c', 0.01, 0.001],
['y0', 0.1, 0.001],
['x0', 1.0535, 0.01],
);
we see that the s parameter is initially set to 4.0, which suggests that this is the parameter not achieving its expected value. Was I 100% sure? Well, no. I like to be very sure that I've understood everything properly before moving onwards, so it's best to have a look at the parameters. Fortunately, the author had left some debugging code in the test:
#use Data::Dumper;
#print Dumper \@parameters;
Reactivating this (i.e. uncommenting it) and running the tests again, I got:
$ perl -Ilib t/02bad_deriv.t
1..13
$VAR1 = [
          [
            's',
            4,
            '0.01'
          ],
          [
            'c',
            '0.0471889540880574',
            '0.001'
          ],
          [
            'y0',
            '0.1',
            '0.001'
          ],
          [
            'x0',
            '1.0535',
            '0.01'
          ]
        ];
which shows the parameter values after having calculated the fit.
So there ya go: the first parameter is called s, its final value is 4 (as I calculated above) and its accuracy is 0.01.[7]
Staring at the test code I realised that the @parameters array must be updated in-place. It didn't make any sense to set the @parameters to an initial guess and then expect (as part of the test assertions) that its values had changed after fitting the curve. That was a good thing to know and a piece of the puzzle to set aside for later. I might not need it, but ya never know.
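A tiny toy example (mine, not the module's code) of why passing a reference to the array makes this possible: a sub that receives a reference can write straight back into the caller's array:

sub fake_fit {
    my ($params) = @_;
    $params->[0][1] = 5.15;    # overwrite the caller's initial guess
}

my @parameters = ( ['s', 4.0, 0.01] );
fake_fit(\@parameters);
print $parameters[0][1], "\n";    # prints 5.15: @parameters was updated in place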
So what was this output telling me? The main point was that s wasn't changing at all: it had the value of 4 both before and after running the curve fitting algorithm.[8] That's weird because in the Debian jessie code, we know that the test passes, which means that its final value should be close to 5.15. Why doesn't this value change anymore?
A numerical stability problem?
I'd like to say that I used a very focused and analytical approach to how I continued, but I can't: I turned to plain old print debugging.
Somehow I feel ashamed to admit that I'd simply used print. The thing is, it's easy to use, it's flexible, and it allows me to see how the values I'm interested in evolve at a single glance. What can I say? Data::Dumper is your friend!
Staring at the code again, I realised that it iteratively updates the parameters of the function to fit within a main while loop. This seemed to be the best place to look for anything funny going on. Thus, I started sprinkling print Dumper $<variable_name>; statements in various places, while comparing the output from the jessie system and from my bullseye system (where the tests were failing).
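For the record, the "sprinkling" amounted to nothing fancier than temporary lines like this (my own debug lines, not anything shipped with the dist):

use Data::Dumper;

# just before the matrix is built from @cols in the main while loop:
print STDERR 'cols: ', Dumper \@cols;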
I got lucky: one thing stuck out. The @cols variable used as input to Math::MatrixReal->new_from_cols(), which creates a matrix of datapoints × parameters, had a very large number turn up in its output. Like, HUGE. Where other numbers in the array were sitting between roughly -2.0 and 2.0, one number was 9.22229939811846e+20.[9] Having done a fair bit of numerical computational work as part of my PhD and later while working on simulation systems and scientific computing, I smelled a numerical precision issue floating[10] around somewhere. I noticed that part of the code used derivatives, hence I guessed that two numbers in a denominator somewhere were getting too close to one another, causing the value in the @cols array to explode.
Ok, so if we're calculating derivatives, what are we calculating derivatives of? The test tries to fit the following formula to the input data:
formula => 'sqrt( s*(x-x0)^2 + c) + y0'
Formatting this a bit more nicely, we have:

f(x) = sqrt(s*(x - x0)^2 + c) + y0

What does this function look like? Seeing a function's form when plotted can give a feel for how it behaves. I defined the function in the Sage mathematics software system and plotted it:
# initial guess parameters
s = 4
y0 = 0.1
x0 = 1.0535
c = 0.01
# the fit function
f = sqrt(s*(x-x0)^2 + c) + y0
# plot the function and save it to file
p = plot(f, (x,-2, 2), axes_labels=['$x$', '$y$'])
p.save('algorithm-curvefit-bug-fit-function.png')
which produced this image:

[Plot of the fit function f(x) for x in (-2, 2): a V-like curve with a rounded minimum of 0.2 at x = x0 ≈ 1.05.]
Hrm, this function is "interesting" just past x = 1. As the test filename suggests, this could be a function with a "bad derivative".
Of course, now the question is: what's the derivative? Again, using Sage, we can work this out:
# calculate the derivative of the fit function
f.derivative(x)
=> s*(x - x0)/sqrt(s*(x - x0)^2 + c)
which, written more nicely, looks like this:

f'(x) = s*(x - x0) / sqrt(s*(x - x0)^2 + c)

Plotting this function over the same range, we have:

[Plot of the derivative f'(x) for x in (-2, 2): close to -2 below x0 and close to +2 above it, with a steep transition through zero at x = x0.]
Yup, this definitely has a funky derivative, especially just above x = 1. Now I was more sure that this is why the test file has the name it does. This indicates that around this value, we might be getting numerical instabilities. I decided to look closer.
Letting x = x0 in the derivative, we get

f'(x0) = s*(x0 - x0) / sqrt(s*(x0 - x0)^2 + c) = 0 / sqrt(c) = 0

In other words, this is where the derivative crosses the x-axis. The only numerically interesting thing is happening in the denominator, which is

sqrt(s*(x - x0)^2 + c)

but setting x = x0 here only gives

sqrt(c) = sqrt(0.01) = 0.1

which is nonzero (and not particularly small), so I didn't think this would be causing any nasty numerical problems. Sure, the derivative changes a lot around x = x0, but it doesn't seem like anything which could be causing a numerical instability.
Not being satisfied with the numerical instability explanation, I decided to look at the code more closely. There is a chunk of code close to the beginning of the main while loop which finds the value of the derivative for each x value of the function to fit and eventually adds these values to the @cols array. This chunk of code is split into two blocks: the first calculates the derivative if the $deriv variable is a CODE reference, and the second otherwise. Both blocks have a section where the code falls back to a numerical method if the derivative isn't a defined value.
When running the test t/02bad_deriv.t, only the second block is executed. I found that there's one case where the fallback is triggered: when the x value from the input data is equal to x0 (i.e. where the derivative changes the most). To calculate a value when the derivative is undefined, the code uses a closure called $formula_sub which (as far as I could tell) returns the value of the function to fit at this x value,[11] thus providing a substitute for the undefined derivative value.
When the $formula_sub closure calculates the function to fit at x0, we can simplify the function

f(x0) = sqrt(s*(x0 - x0)^2 + c) + y0

to

f(x0) = sqrt(c) + y0

which is non-zero and seems well-behaved. Again, this doesn't look like a numerical issue.
I then investigated the values that $formula_sub returns as well as its input parameter values. I found that the enormous value appears exactly when x is equal to x0. Given the input parameter values and the known value for x, I calculated the value $formula_sub should return. Using the above expression for f(x0) and the values of c and y0, I got:

f(x0) = sqrt(0.01) + 0.1 = 0.2
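A throwaway one-liner confirms the arithmetic:

$ perl -e 'print sqrt(0.01) + 0.1, "\n"'
0.2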
This shouldn't be hard for a computer to calculate. Hence I think I can stomp on the numerical instability hypothesis and will have to find a different cause for the anomalous large numbers.
Structure in data and code points the way
Wild goose chases aren't exactly uncommon when debugging code. But then, serendipity also plays a role. So it was in this case for me.
While "print debugging" the parameter values passed into $formula_sub, I noticed a difference in structure between separate calls. Sometimes the structure looked like this:
$VAR1 = 'x';
$VAR2 = '1.0535';
$VAR3 = 's';
$VAR4 = [
          's',
          4,
          '0.01'
        ];
$VAR5 = 'c';
$VAR6 = [
          'c',
          '0.0467904042605691',
          '0.001'
        ];
$VAR7 = 'y0';
$VAR8 = [
          'y0',
          '0.1',
          '0.001'
        ];
$VAR9 = 'x0';
$VAR10 = [
           'x0',
           '1.0535',
           '0.01'
         ];
and sometimes like this:
$VAR1 = 'x';
$VAR2 = '0.7199';
$VAR3 = 's';
$VAR4 = 4;
$VAR5 = 'c';
$VAR6 = '0.0467904042605691';
$VAR7 = 'y0';
$VAR8 = '0.1';
$VAR9 = 'x0';
$VAR10 = '1.0535';
Looking back, this should have made some warning bells go off. Perhaps I was tired. All I thought at the time was, "hrm, that's odd". However, the thought was not lost. It got stored somewhere in the back of my mind, ready to make pennies drop later.
Digging through the rest of the code, I happened to notice that $formula_sub was called in other locations, this time not involved in derivative calculations. In this other situation, it assisted in the calculation of the residuals.
Here's the code for comparison, once embedded in the derivative calculation:
$value = $formula_sub->($xv, @parameters)
and again in the residuals calculation:
my @beta =
    map {
        $ydata[$_] - $formula_sub->(
            $xdata[$_],
            map { $_->[1] } @parameters
        )
    } 0 .. $#xdata;
The second instance isn't that clear, so removing some of the surrounding code and reformatting it to be on one line, we get
$formula_sub->($xdata[$_], map { $_->[1] } @parameters)
This passes the current value of x and the function's parameter values as a list of arguments into $formula_sub. The parameter values are stored in the second element of the sub-array within each element of the @parameters array, hence the $_->[1] bit.
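In other words, the map strips away the metadata and keeps only the current parameter values. A quick sketch using the test's own data shows what it produces:

my @parameters = (
    ['s',  4.0,    0.01 ],
    ['c',  0.01,   0.001],
    ['y0', 0.1,    0.001],
    ['x0', 1.0535, 0.01 ],
);

# keep only the middle element (the current value) of each triple
my @values = map { $_->[1] } @parameters;
# @values is now (4.0, 0.01, 0.1, 1.0535)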
I noticed that this had the same rough form as the derivative calculation code
$value = $formula_sub->($xv, @parameters)
Hang on! What was that map doing in there?
And then something clicked. The difference in how $formula_sub was called was the reason for the difference in parameter structure within $formula_sub that I'd noticed earlier.
I then looked again at the parameter values appearing in $formula_sub. Instead of a value being associated with each parameter, there was an array containing the parameter's metadata being associated with each parameter. Not only is this doubled up, it's fairly obviously wrong.
With this new perspective, I realised that this was the underlying problem: the parameters weren't being passed to $formula_sub correctly when calculating the fallback derivative. What was happening was that the @parameters array was being concatenated with the scalar value of the current x value. That's not correct: $formula_sub expects only a list of argument values, not an array-of-arrays concatenated onto a scalar.
What I guessed was probably happening (and now I'm getting into uncharted territory) is that a memory reference value was being passed to $formula_sub for one of the parameter values instead of the actual parameter value. A memory reference value could be any weird value and it wouldn't surprise me if this was the source of the huge number I was seeing.
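This guess is easy to make plausible in isolation: when @parameters is flattened into the argument list, $formula_sub receives array references, and a Perl reference used in numeric context yields (roughly) its memory address. A small sketch, separate from the module's code:

my @parameters = ( ['s', 4, 0.01], ['c', 0.01, 0.001] );

# effectively what the buggy call passed along: x followed by array refs
my @args = ( 1.0535, @parameters );

# an array ref in numeric context numifies to a huge, arbitrary number
# (it varies from run to run, just like the outlier in the test output)
printf "%s numifies to %.6g\n", $args[1], 0 + $args[1];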
Striving for a deeper explanation
Wait a minute. If this is the cause of the approximately 1e+20 number, and the code is the same now as it was on the jessie system, that means this issue should also exist on the jessie system. Hrm.
Looking at the $formula_sub value at x == x0 on my rebuilt jessie system, there it was: a huge number. This time it was 216930334118.067,[9] which is of the order of 1e+11. This means the issue has been lurking since (at least) 2010.
So what's going on here? Why does the code work on the older system but not on the newer one?
My attempt at an explanation is this: on the jessie system, 1e+11 wasn't large enough to stop the fitting algorithm from converging to a sensible solution. However, when this number is as large as 1e+20, it dominates so much (as one would expect: it's 9 orders of magnitude larger than that on the jessie system) that the fitted solution never deviates from the initial guess. I also found that increasing the maximum number of iterations doesn't help convergence to the correct value, because the solution "converges" after only 3-4 iterations.
Of course, now the question becomes: why is the outlier function value on jessie approximately 1e+11 but on later systems it has jumped to approximately 1e+20?
My best guess is that it might be due to a change of integer precision in something, somewhere between jessie and stretch. For instance, consider that the maximum integer values for 32-bit and 64-bit are:
MAX_INT 32 bit: 2,147,483,647 => 2e+09
MAX_INT 64 bit: 9,223,372,036,854,775,807 => 9e+18
Note that both of these numbers are two orders of magnitude smaller than the large numbers I was seeing on jessie and later systems, respectively. Could we just be seeing MAX_INT*100 being converted into a floating point value? I honestly don't know, and this is where the trail becomes hard to follow.
There are now more questions than I think I can definitively answer:
- Does this point to a libc change? The main C library, libc, changed from 2.19 to 2.24 between jessie and stretch.
- Is this due to the GCC version change? This went from 4.9.x in jessie to 6.x in stretch.
- In the change from GCC 4 to GCC 5 (which was on the way to GCC 6 in stretch), the default mode for C was changed from -std=gnu89 to -std=gnu11, which is a pretty large jump! Maybe this is a cause?
In each case, I couldn't find any evidence for something like an integer precision change. Of course, that doesn't mean it doesn't exist; it merely means I didn't find anything. At the end of the day, I couldn't find a reason and can only speculate that where once a 32-bit integer was used, now a 64-bit integer is the default, hence causing the large change between the outlier values I observed.
What is interesting, I find, is that the outlier numbers are very similar to the respective MAX_INT value on each system. By this, I mean that on my jessie test system, the huge outlier number (as a string of numbers) looked like approximately MAX_INT32*100 each time I ran it. The same occurred on stretch and later systems: the outlier consistently looked like MAX_INT64*100. It's a hint that I might be on the right track with my guess about an integer precision change, but it's only that: a hint.
Avoiding further speculation on the exact cause of this change, I decided to put it into the "too hard basket", accept it, and move on.
Bug fix
What's great about the analysis I made earlier is that I'd also stumbled upon the correct way to pass parameters into $formula_sub, and thus the fix is rather simple. Changing the calls to $formula_sub where the derivatives are calculated from

$value = $formula_sub->($xv, @parameters)

to

$value = $formula_sub->($xv, map { $_->[1] } @parameters)

resolves the problem, and the test suite now passes. Yay!
Also, $formula_sub now returns the value 0.2 when x == x0, as we calculated at the end of the "A numerical stability problem?" section. Also nice.
That's it. That's the bug fix. Quite an anti-climax in the end…
First contact
Since the dist's author hadn't been active on this project since roughly 2010, I'd guessed that he'd moved on and the email address probably wasn't active anymore. I mean, this is understandable: it's open-source software and maintainers have lives and end up doing other things. On the off chance that emails do still get through, I sent him a message … and he replied! I was totally stoked!
After having a thorough look around, unfortunately he was unable to find any old source code repositories on his computers at home. The original source code repository used to be http://svn.ali.as/cpan/trunk/Algorithm-CurveFit/, which no longer exists. Thus we thought that the only code available now would be the tarball on CPAN. Consequently, he created a GitHub repo for Algorithm::CurveFit, making commits corresponding to each version number. This was the best solution to create a repo for what looked like lost information.
Later, he found out that the original repository was buried within the historical archive of Adam Kennedy's Open Repository, which is hosted on GitHub. Thus, we've been able to find the original code with commits. This is brilliant as then I can do a bit of repository archaeology and try to work out what the author was thinking when he implemented various things.
As yet, he's not had time to extract the commits from the Algorithm-CurveFit subdirectory of the TheOpenRepository repository into the new p5-Algorithm-CurveFit repo. Doing so would be nice as it would recreate the state of the original repository. Interestingly enough, I blogged about this process a few years ago. The short answer about how to do this is to use git subtree split.
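Assuming the code sits in an Algorithm-CurveFit subdirectory of a clone of TheOpenRepository, that would look something like this (the branch name here is my own invention):

$ git subtree split --prefix=Algorithm-CurveFit -b algorithm-curvefit-history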
Also, he's very kindly given me COMAINT on the project. So, as soon as the bug fix is reviewed, I'll roll out a bug-fix release.
Then I can get my old project back up and running again.
Open question: the five-point stencil
Although I'm pretty sure I've stomped on the bug that sent me on this journey, I now have more questions than when I started. I'd hoped the original repo's commit messages would answer some, but it looks like the information has been lost to the winds of time.
One thing in particular that interests me is the five-point stencil calculation mentioned in the code. Was this ever implemented? From the code, it looks like it was started but never finished. Yet the module was released as if it had been implemented, because this concept is mentioned in the CHANGELOG. Asking the author about this, he too thinks that, from looking at the code, it wasn't implemented. Of course, details are scarce because it's been a long time since he wrote the module.
The author mentioned a five-point stencil fix for Algorithm::CurveFit in a bug report in the Math::Symbolic dist, so it seems that the implementation was intended. Given that the $h and $t variables are assigned but never used around where one would expect a five-point stencil calculation, it could be that this was implemented locally but somehow never made it into the release that mentions it.
It seems that the function value returned from $formula_sub when calculating the derivative should actually be turned into a derivative via the five-point stencil method. I guess my worry is that I've solved one bug only to have uncovered another, potentially deeper issue. In other words, in certain corner cases, the module calculates incorrect values. That would be bad.
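For reference, the five-point stencil approximates a derivative from four nearby function values. Here's a sketch of what such a calculation could look like (my own illustration, not code from the module):

# five-point stencil:
# f'(x) ~ ( -f(x+2h) + 8*f(x+h) - 8*f(x-h) + f(x-2h) ) / (12*h)
sub five_point_derivative {
    my ($f, $x, $h) = @_;
    return ( -$f->($x + 2*$h) + 8 * $f->($x + $h)
             - 8 * $f->($x - $h) + $f->($x - 2*$h) ) / (12 * $h);
}

# for example, the test's fit function at its initial guess parameters:
my $f = sub { my $x = shift; sqrt(4.0 * ($x - 1.0535)**2 + 0.01) + 0.1 };
printf "%.6f\n", five_point_derivative($f, 0.5, 1e-4);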
It's also entirely possible that this was merely a simple oversight and now I've got the opportunity to fix it. That would be nice.
W(h)ither software?
While working on this I had all kinds of philosophical thoughts. For instance: how to maintain such projects? Should we just let them wither on the vine and die? Should we fork them? At the end of the day, who takes care of our ageing software?
I don't have answers to these questions, but I think that software that people are using should at least be able to be maintained and kept up to date. Fortunately, open-source software allows anyone to help out, meaning that it's at least possible to keep small projects like the one discussed here alive and kicking.
Wrapping up
So that's it! Another bug found, another bug squashed. Thanks for coming along for the ride!
Footnotes

1. I'm available for freelance Python/Perl backend development and maintenance work. Contact me at paul@peateasea.de and let's discuss how I can help solve your business' hairiest problems.
2. Often, the words "module", "distribution" and "dist" (short for "distribution") are used interchangeably. Yet, a "distribution" is the term for one or more modules bundled together. Often, a distribution contains a single module, hence why the terms get used interchangeably.
3. I'm beginning to sound like a cracked record now, aren't I?
4. I originally spotted this tip in the Debian forums.
5. Although the vi command exists and actually runs vim, this is only vim-tiny, which runs vim in vi compatibility mode. Thus the vim command doesn't exist and to be able to use it one has to install the vim package after first fixing the sources.list and doing the update/upgrade dance.
6. I did mention that I'm stubborn, didn't I?
7. The meaning of these elements I gleaned from the module's POD.
8. I realised much later that almost none of the parameters had changed from their initial guesses: only c had been updated slightly.
9. The exact number changes with each run of the test suite.
10. Pun not intended.
11. Note that this is the value of the function rather than the value of the derivative. This is odd because in the context where this is called, the code expects a derivative.