JavaScript, Ruby and C are not call by reference
Derk-Jan Karrenbeld
Posted on June 27, 2019
🛑 This article is a response to various articles in the wild which state that JavaScript and Ruby are "Call/Pass by reference" for objects and "Call/Pass by value" for primitives.
Many of these articles provide a lot of valuable information, and this article does not claim that those articles should not have been written or are useless. Instead, this article attempts to explore the semantic, yet pedantic, meanings and definitions of
- call by reference
- pass a reference
- reference type
- reference
First, I would like to make a few statements, after which I'll try to explore what these statements actually mean and why I've made them, contrary to various articles in the wild.
☕ When you see this emoji (☕), I try to give a non-code analogy to help you better understand what's going on. These abstractions are pretty leaky and might not hold up, but they're only meant in the context of the paragraphs that surround them. Take them with a grain of salt.
Statements
- JavaScript is always call by value.
- Ruby is always call by value.
- C is always call by value.
- The terminology is confusing and perhaps even flawed.
- The terminology only applies to function (procedure) parameters.
- Pointers are an implementation detail and their presence doesn't say anything about the evaluation of function parameters.
History and Definitions
I've tried to look up the origins of the terms as mentioned above, and there is quite a bit of literature out there from the earlier programming languages.
The Main Features of CPL (D. W. Barron et al., 1963):
Three modes of parameter call are possible; call by value (which is equivalent to the ALGOL call by value), call by substitution (equivalent to ALGOL call by name), and call by reference. In the latter case, the LH value of the actual parameter is handed over; this corresponds to the "call by simple name" suggested by Strachey and Wilkes (1961).
It is important to note that here the literature talks about mode of parameter call. It further distinguishes three modes: call by value, call by name and call by reference.
Further literature gives a good, yet technical, definition of these three and a fourth strategy (namely copy restore), as published in Semantic Models of Parameter Passing (Richard E. Fairley, 1973). I've quoted 2 of the 4 definitions below, after which I'll break them down and explain what they mean in more visual terms.
Call by Value
[...] Call by Value parameter requires that the actual parameter be evaluated at the time of the procedure call. The memory register associated with the formal parameter is then initialised to this value, and references to the formal parameter in the procedure body are treated as references to the local memory register in which the initial value of the actual parameter was stored. Due to the fact that a copy of the value associated with the actual parameter is copied into the local memory register, transformations on the parameter value within the procedure body are isolated from the actual parameter value. Because of this isolation of values, Call by value can not be used to communicate calculated values back to the calling program.
Roughly, this means that a parameter is completely evaluated before the function (procedure) is called. The resulting value (from that evaluation) is then assigned to the identifier inside the function (formal parameter). In many programming languages this is done by copying the value to a second memory address, which isolates the changes made inside the function (procedure body) to that function.
In other words: the contents of the original memory address (the one used to store the evaluated expression before passing it into the function) cannot be changed by code inside the function, and changes to the value inside the function are not propagated to the caller.
☕ When you order a coffee and someone asks for your name, they might write it down incorrectly. This doesn't affect your actual name and the change is only propagated to the cup.
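To make the coffee cup a bit more concrete, here is a minimal JavaScript sketch of the definition above. The function writeOnCup and the variable names are made up for this illustration. It only shows the observable behaviour: the argument is evaluated first, and re-assigning the parameter stays local to the function.

```javascript
// Hypothetical helper: it re-assigns its own parameter.
function writeOnCup(name) {
  // Only the local identifier `name` is re-assigned here.
  name = name + '?'
  return name
}

const customer = 'Ada'
const cup = writeOnCup(customer)

// The argument expression is evaluated before the call is made,
// so the function only ever receives the resulting value.
const otherCup = writeOnCup('A' + 'da')

console.log(customer) // => 'Ada'
console.log(cup)      // => 'Ada?'
console.log(otherCup) // => 'Ada?'
```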
Call by Reference
[...] In Call by Reference, the address (name) of the actual parameter at the time of the procedure call is passed to the procedure as the value to be associated with the corresponding formal parameter. References to the formal parameter in the procedure body result in indirect addressing references through the formal parameter register to the memory register associated with the actual parameter in the calling procedure. Thus, transformations of formal parameter values are immediately transmitted to the calling procedure, because both the actual parameter and the formal parameter refer to the same register.
Roughly, this means that, just like before, the parameter is evaluated, but, unlike before, the memory address (address / name) is passed to the function (procedure). Changes made to the parameter inside the function (formal parameter) are actually made on the memory address and therefore propagate back to the caller.
☕ When you go to a support store for one of your hardware devices and ask for it to be fixed, they might give you a replacement device. This replacement device is still yours, you own it just like before, but it might not be the exact same one you gave to be fixed.
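A common litmus test for call by reference is a swap function: if the language were call by reference, re-assigning the parameters would swap the caller's variables. The sketch below (with a made-up swap helper) suggests that JavaScript does not behave that way, neither for primitives nor for objects.

```javascript
// Hypothetical swap helper; under call by reference this would
// swap the caller's variables.
function swap(a, b) {
  const temp = a
  a = b
  b = temp
}

let x = 1
let y = 2
swap(x, y)
console.log(x, y) // => 1 2

const left = { side: 'left' }
const right = { side: 'right' }
let p = left
let q = right
swap(p, q)
console.log(p === left, q === right) // => true true
```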
Reference (and value) types
This is not the complete picture. There is one vital part left that causes most of the confusion. Right now I'll explain what a reference type is, which has nothing to do with arguments/parameters or function calls.
Reference types and value types are usually explained in the context of how a programming language stores values inside the memory, which also explains why some languages choose to have both, but this entire concept is worthy of (a series of) articles on its own. The Wikipedia page is, in my opinion, not very informative, but it does refer to various language specs that do go into technical detail.
A data type is a value type if it holds a data value within its own memory space. It means variables of these data types directly contain their values.
Unlike value types, a reference type doesn't store its value directly. Instead, it stores the address where the value is being stored.
In short, a reference type is a type that points to a value somewhere in memory, whereas a value type is a type that directly holds its value.
☕ When you make a payment online and enter your card details, for example your card number, the card itself cannot be changed. However, the bank account's balance will be affected. You can see your card as a reference to your balance (and multiple cards can all reference the same balance).
☕ When you pay offline, that is with cash, the money leaves your wallet. Your wallet holds its own value, just like the cash inside your wallet. The value is directly where the wallet/cash is.
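The two analogies above can be sketched in JavaScript. JavaScript doesn't expose memory addresses, so this only shows the behaviour you can observe, not how an engine actually stores things. The names cash, wallet, account, cardA and cardB are made up for this sketch.

```javascript
// Value-like behaviour: assigning copies the value itself.
let cash = 50
let wallet = cash
wallet = wallet - 20
console.log(cash, wallet) // => 50 30

// Reference-like behaviour: both cards refer to the same account.
const account = { balance: 50 }
const cardA = account
const cardB = account
cardA.balance = cardA.balance - 20
console.log(cardB.balance)   // => 30
console.log(account.balance) // => 30
console.log(cardA === cardB) // => true
```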
Show me the code proof
function reference_assignment(myRefMaybe) {
  // re-assigns the local parameter only
  myRefMaybe = { key: 42 }
}
var primitiveValue = 1
var someObject = { is: 'changed?' }
reference_assignment(primitiveValue)
primitiveValue
// => 1
reference_assignment(someObject)
someObject
// => { is: 'changed?' }
As shown above, someObject has not been changed, because myRefMaybe was not a reference to someObject. In terms of the definitions before: it was not the memory address of someObject that was passed, but a copy of its value.
A language that does support pass by reference is PHP, but it requires special syntax to change from the default of passing by value:
function change_reference_value(&$actually_a_reference)
{
    $actually_a_reference = $actually_a_reference + 1;
}
$value = 41;
change_reference_value($value);
// => $value equals 42
I tried to keep the same sort of semantic as the JS code.
As you can see, the PHP example actually changes the value the input argument refers to. This is because the memory address of $value can be accessed by the parameter $actually_a_reference.
What's wrong with the nomenclature?
Reference types and "boxed values" make this more confusing, and they are also why I believe that the nomenclature is perhaps flawed.
The term call-by-value is problematic. In JavaScript and Ruby, the value that is passed is a reference. That means that, indeed, the reference to the boxed primitive is copied, and therefore changing a primitive inside a function doesn't affect the primitive on the outside. That also means that, indeed, the reference to a reference type, such as an Array or Object, is copied and passed as the value.
Because reference types refer to their value, copying a reference type makes the copy still refer to that value. This is also what you experience as shallow copy instead of deep copy/clone.
Whoa. Okay. Here is an example that explores both these concepts:
function appendOne(list) {
  list.push(1)
}
function replaceWithFive(list) {
  list = [5]
}
const first = []
const second = []
appendOne(first)
first
// => [1]
replaceWithFive(second)
second
// => []
In the first example it outputs [1], because the push method modifies the object on which it is called (the object is referenced from the name list). This propagates because the list argument still refers to the original object first: its reference was copied and passed as a value. list holds that copy, which still points to the same data in memory, because Object is a reference type.
In the second example it outputs [] because the re-assignment doesn't propagate to the caller. In the end it is not re-assigning the original reference but only a copy.
Here is another way to write this down. 👉🏽 indicates a reference to a different location in memory.
first_array = []
second_array = []
first = 👉🏽 first_array
list = copy(first) = 👉🏽 first_array
list.push = (👉🏽 first_array).push(...)
// => (👉🏽 first_array) was changed
second = 👉🏽 second_array
list = copy(second) = 👉🏽 second_array
replace_array = []
list = 👉🏽 replace_array
// => (👉🏽 second_array) was not changed
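The shallow copy versus deep copy remark from earlier follows the same rule: copying a reference still leaves you pointing at the same data. Here is a rough sketch. The JSON round-trip is only a stand-in for a deep clone and is an assumption of this example (it only works for JSON-compatible data). The variable names are made up.

```javascript
const original = { tags: ['a'] }

// Shallow copy: a new outer object, but `tags` is the same array.
const shallow = { ...original }

// Stand-in for a deep clone (assumption: JSON-compatible data only).
const deep = JSON.parse(JSON.stringify(original))

shallow.tags.push('b')
deep.tags.push('c')

console.log(original.tags)                  // => ['a', 'b']
console.log(deep.tags)                      // => ['a', 'c']
console.log(shallow === original)           // => false
console.log(shallow.tags === original.tags) // => true
```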
What about pointers?
C is also always pass by value / call by value, but it allows you to pass a pointer, which can simulate pass by reference. Pointers are implementation details and are, for example, used in C# to enable pass by reference.
In C, however, pointers are reference types! The syntax *pointer allows you to follow the pointer to the value it refers to. In the comments in this code I've tried to explain what is going on under the hood.
void modifyParameters(int value, int* pointerA, int* pointerB) {
    // passed by value: only the local parameter is modified
    value = 42;
    // passed by value or "reference", check call site to determine which
    *pointerA = 42;
    // passed by value or "reference", check call site to determine which
    *pointerB = 42;
}
int main() {
    int first = 1;
    int second = 2;
    int random = 100;
    int* third = &random;
    // "first" is passed by value, which is the default
    // "second" is passed by reference by creating a pointer,
    // the pointer is passed by value, but it is followed when
    // using *pointerA, and thus this is like passing a reference.
    // "third" is passed by value. However, it's a pointer and that pointer
    // is followed when using *pointerB, and thus this is like
    // passing a reference.
    modifyParameters(first, &second, third);
    // "first" is still 1
    // "second" is now 42
    // "random" is now 42
    // "third" is still a pointer to "random" (unchanged)
    return 0;
}
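JavaScript has no pointer syntax, but you can get a similar effect by wrapping a value in an object and changing a property of that object. This is a rough analogue of the C code above, not an equivalent. The { value: ... } wrapper and the name secondBox are made up for this sketch.

```javascript
// Rough JavaScript analogue of the C example; `box` plays the
// role of the pointer and `box.value` the role of `*pointer`.
function modifyParameters(value, box) {
  // only the local parameter is re-assigned
  value = 42
  // the object that `box` refers to is modified
  box.value = 42
}

let first = 1
const secondBox = { value: 2 }

modifyParameters(first, secondBox)

console.log(first)           // => 1
console.log(secondBox.value) // => 42
```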
Call by sharing?
The lesser-known and lesser-used term that was coined is Call by sharing, which applies to Ruby, JavaScript, Python, Java and so forth. It implies that all values are objects, all values are boxed, and a reference is copied when it is passed as a value. Unfortunately, the usage of this concept in the literature is not consistent, which is probably also why it's less known or used.
For the purpose of this article, call-by-sharing is call by value, but the value is always a reference.
Conclusion
In short: it's always pass by value, but the value of the variable is a reference. All methods on primitives return a new value, and thus one cannot modify a primitive; objects and arrays can have methods that modify their value, and thus one can modify them.
You cannot affect the memory address of the parameter directly in the languages that use call-by-value, but you may affect what the parameter refers to. That is, you may affect the memory the parameter points to.
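As a small sketch of that difference: a method on a string returns a new string, whereas a method such as reverse on an array changes the array in place. The function tryToChange and the variable names are made up for this example.

```javascript
// Hypothetical helper that calls a method on each parameter.
function tryToChange(text, list) {
  // returns a new string; the result is discarded here
  text.toUpperCase()
  // reverses the array that `list` refers to, in place
  list.reverse()
}

const greeting = 'hello'
const numbers = [1, 2, 3]

tryToChange(greeting, numbers)

console.log(greeting) // => 'hello'
console.log(numbers)  // => [3, 2, 1]
```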
The statement "Primitive Data Types are passed By Value and Objects are passed By Reference" is incorrect.