Let’s Understand Chrome V8 — Chapter 13: String Object Analysis
灰豆
Posted on August 29, 2022
Original source: https://medium.com/@huidou/lets-understand-chrome-v8-chapter-13-string-ojbect-analysis-f594b1cdfd48
Welcome to other chapters of Let’s Understand Chrome V8
A JavaScript object is a collection of properties, some of which are methods, and the "." operator is used to access them. A string, however, is a primitive rather than an object, so why does it have properties? The usual answer is that the string is converted into an object before its properties are accessed. This conversion is invisible to developers, but by debugging V8 we can see how it is actually handled and read the source code behind String objects.
In this article, we examine the conversion from a string to a String object.
1. String variable
1.var s = "hello world!";
2.var word = s.substring(s.indexOf(" ")+1,s.length);
3.console.log(word)
In the code above, the string s is not an object, but when we call .indexOf(), V8 converts s into a String object. We will examine this conversion through two questions:
(1) In line 1, what is the type of s?
(2) In line 2, when we call .indexOf(), how does V8 convert s into an object?
The code below is the bytecode generated for our case.
1. [generated bytecode for function: (0x03d78d381bd1 <SharedFunctionInfo>)]
2. Parameter count 1
3. Register count 6
4. Frame size 48
5. 000003D78D381D06 @ 0 : 12 00 LdaConstant [0]
6. 000003D78D381D08 @ 2 : 26 fa Star r1
7. 000003D78D381D0A @ 4 : 0b LdaZero
8. 000003D78D381D0B @ 5 : 26 f9 Star r2
9. 000003D78D381D0D @ 7 : 27 fe f8 Mov <closure>, r3
10. 000003D78D381D10 @ 10 : 61 2d 01 fa 03 CallRuntime [DeclareGlobals], r1-r3
11. 0 E> 000003D78D381D15 @ 15 : a7 StackCheck
12. 8 S> 000003D78D381D16 @ 16 : 12 01 LdaConstant [1]
13. 8 E> 000003D78D381D18 @ 18 : 15 02 04 StaGlobal [2], [4]
14. 36 S> 000003D78D381D1B @ 21 : 13 02 00 LdaGlobal [2], [0]
15. 000003D78D381D1E @ 24 : 26 f9 Star r2
16. 38 E> 000003D78D381D20 @ 26 : 29 f9 03 LdaNamedPropertyNoFeedback r2, [3]
17. 000003D78D381D23 @ 29 : 26 fa Star r1
18. 48 E> 000003D78D381D25 @ 31 : 13 02 00 LdaGlobal [2], [0]
19. 000003D78D381D28 @ 34 : 26 f7 Star r4
20. 50 E> 000003D78D381D2A @ 36 : 29 f7 04 LdaNamedPropertyNoFeedback r4, [4]
21. 000003D78D381D2D @ 39 : 26 f8 Star r3
22. 000003D78D381D2F @ 41 : 12 05 LdaConstant [5]
23. 000003D78D381D31 @ 43 : 26 f6 Star r5
24. 50 E> 000003D78D381D33 @ 45 : 5f f8 f7 02 CallNoFeedback r3, r4-r5
25. 62 E> 000003D78D381D37 @ 49 : 40 01 06 AddSmi [1], [6]
26. 000003D78D381D3A @ 52 : 26 f8 Star r3
27. 65 E> 000003D78D381D3C @ 54 : 13 02 00 LdaGlobal [2], [0]
28. 000003D78D381D3F @ 57 : 26 f7 Star r4
29. 67 E> 000003D78D381D41 @ 59 : 29 f7 06 LdaNamedPropertyNoFeedback r4, [6]
30. 000003D78D381D44 @ 62 : 26 f7 Star r4
31. //....omit......
32. Constant pool (size = 10)
33. 000003D78D381C71: [FixedArray] in OldSpace
34. - map: 0x03ac45880169 <Map>
35. - length: 10
36. 0: 0x03d78d381c11 <FixedArray[8]>
37. 1: 0x03d78d381b69 <String[#12]: hello world!>
38. 2: 0x03d78d381b51 <String[#1]: s>
39. 3: 0x02ed266ea9e9 <String[#9]: substring>
40. 4: 0x02ed266e8121 <String[#7]: indexOf>
41. 5: 0x006d3e183c21 <String[#1]: >
42. 6: 0x03ac45884819 <String[#6]: length>
43. 7: 0x03ac45885301 <String[#4]: word>
44. 8: 0x02ed266f2441 <String[#7]: console>
45. 9: 0x02ed266f1a81 <String[#3]: log>
46. Handler Table (size = 0)
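Before diving in, let's look at one concept: the constant pool (lines 32 to 45). It is an array (a FixedArray) generated when the JavaScript is compiled to bytecode, and it stores the constants that the bytecode refers to by index. Most entries here are strings. Entry 0 is a FixedArray that LdaConstant [0] loads and passes to DeclareGlobals (lines 5 to 10). Line 37 is the string hello world! and line 38 is the string s. Together, these two entries make up the variable s we are discussing.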
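To make the idea concrete, here is a minimal C++ sketch of a constant pool that stores each constant once and hands out its index. It is a simplified model under stated assumptions. The class name ConstantPool and its methods are hypothetical, and only string constants are modeled, whereas V8's pool is a FixedArray of arbitrary heap values. The dump above fits this design: every access to s refers to the same entry, [2].

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified constant pool. V8's real pool is a FixedArray of
// arbitrary heap values; this model only stores strings.
class ConstantPool {
 public:
  // Returns the index of `value`, appending it only on first use, so every
  // bytecode that mentions the same constant gets the same index.
  std::size_t Insert(const std::string& value) {
    auto it = index_of_.find(value);
    if (it != index_of_.end()) return it->second;  // reuse existing entry
    entries_.push_back(value);
    index_of_.emplace(value, entries_.size() - 1);
    return entries_.size() - 1;
  }

  // What an operand such as [2] means: "the entry at index 2".
  const std::string& At(std::size_t index) const { return entries_.at(index); }
  std::size_t size() const { return entries_.size(); }

 private:
  std::vector<std::string> entries_;
  std::unordered_map<std::string, std::size_t> index_of_;
};
```

For example, inserting "hello world!", "s", "substring" and then "s" again yields indices 0, 1, 2 and 1. In the dump above, entry 0 is already taken by the declarations array, so hello world! and s sit at indices 1 and 2 instead.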
Now look at lines 12 and 13. LdaConstant loads constant-pool[1] into the accumulator, and StaGlobal stores the accumulator into the global variable whose name is constant-pool[2]; the second operand, [4], is a feedback slot. Together, these two lines perform the assignment in our declaration var s = "hello world!". The declaration itself is handled earlier by DeclareGlobals.
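The accumulator semantics of these two bytecodes can be sketched with a toy interpreter. This is a hypothetical model: the Op, Instr and ToyInterpreter names are made up, the feedback-slot operand is ignored, and globals live in a plain map. Treat it as an illustration of the data flow, not of Ignition itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy interpreter that knows only LdaConstant and StaGlobal.
enum class Op { kLdaConstant, kStaGlobal };

struct Instr {
  Op op;
  std::size_t operand;  // constant-pool index; the feedback slot is omitted
};

struct ToyInterpreter {
  std::vector<std::string> constant_pool;
  std::map<std::string, std::string> globals;  // global name -> value
  std::string accumulator;                     // Ignition-style accumulator

  void Run(const std::vector<Instr>& code) {
    for (const Instr& instr : code) {
      switch (instr.op) {
        case Op::kLdaConstant:
          // accumulator <- constant_pool[operand]
          accumulator = constant_pool.at(instr.operand);
          break;
        case Op::kStaGlobal:
          // global named constant_pool[operand] <- accumulator
          globals[constant_pool.at(instr.operand)] = accumulator;
          break;
      }
    }
  }
};
```

Running LdaConstant [1] followed by StaGlobal [2] against the pool {"<declarations>", "hello world!", "s"} leaves globals["s"] equal to "hello world!". That is exactly the effect of the assignment.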
Let's debug line 13 to see in detail how the value of s is stored. Seeing how s is stored also reveals its type.
Debugging shows that the following code is called.
RUNTIME_FUNCTION(Runtime_StoreGlobalICNoFeedback_Miss) {
  HandleScope scope(isolate);
  DCHECK_EQ(2, args.length());
  // Runtime functions don't follow the IC's calling convention.
  Handle<Object> value = args.at(0);
  Handle<Name> key = args.at<Name>(1);
  // TODO(mythria): Replace StoreGlobalStrict/Sloppy with StoreNamed.
  StoreGlobalIC ic(isolate, Handle<FeedbackVector>(), FeedbackSlot(),
                   FeedbackSlotKind::kStoreGlobalStrict);
  RETURN_RESULT_OR_FAILURE(isolate, ic.Store(key, value));
}
//.................separation...................................
#define RETURN_RESULT_OR_FAILURE(isolate, call)      \
  do {                                               \
    Handle<Object> __result__;                       \
    Isolate* __isolate__ = (isolate);                \
    if (!(call).ToHandle(&__result__)) {             \
      DCHECK(__isolate__->has_pending_exception());  \
      return ReadOnlyRoots(__isolate__).exception(); \
    }                                                \
    DCHECK(!__isolate__->has_pending_exception());   \
    return *__result__;                              \
  } while (false)
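The RETURN_RESULT_OR_FAILURE macro relies on a simple convention. The call returns a value that may be empty, and an empty result means an exception is pending on the isolate. The following C++ sketch reproduces that shape with hypothetical stand-ins: Isolate, MaybeResult and RETURN_RESULT_OR_SENTINEL are not V8's types. The asserts play the role of the two DCHECKs, and the do { ... } while (false) wrapper makes the macro behave like a single statement.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for V8's Isolate: only the flag we need.
struct Isolate {
  bool has_pending_exception = false;
};

// Hypothetical stand-in for MaybeHandle<T>: either a value or "empty",
// where empty means the callee failed and left an exception pending.
template <typename T>
class MaybeResult {
 public:
  MaybeResult() = default;
  explicit MaybeResult(T value) : value_(std::move(value)) {}
  // Same idiom as ToHandle(): copy the value out and report success.
  bool ToHandle(T* out) const {
    if (!value_) return false;
    *out = *value_;
    return true;
  }

 private:
  std::optional<T> value_;
};

// Same shape as RETURN_RESULT_OR_FAILURE; the asserts mirror its DCHECKs.
#define RETURN_RESULT_OR_SENTINEL(isolate, call, sentinel) \
  do {                                                      \
    decltype(sentinel) macro_result_;                       \
    if (!(call).ToHandle(&macro_result_)) {                 \
      assert((isolate)->has_pending_exception);             \
      return sentinel;                                      \
    }                                                       \
    assert(!(isolate)->has_pending_exception);              \
    return macro_result_;                                   \
  } while (false)

// A fallible operation: on failure it flags a pending exception.
MaybeResult<std::string> StoreModel(Isolate* isolate, bool succeed) {
  if (!succeed) {
    isolate->has_pending_exception = true;
    return MaybeResult<std::string>();
  }
  return MaybeResult<std::string>(std::string("stored"));
}

// A runtime-function-like wrapper that forwards the result or the sentinel.
std::string RuntimeStoreModel(Isolate* isolate, bool succeed) {
  RETURN_RESULT_OR_SENTINEL(isolate, StoreModel(isolate, succeed),
                            std::string("<exception>"));
}
```

In V8, the sentinel is ReadOnlyRoots(isolate).exception(), which tells the caller to look for the pending exception instead of treating the return value as a real result.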
In Runtime_StoreGlobalICNoFeedback_Miss, RETURN_RESULT_OR_FAILURE is a macro. If ic.Store() fails, it returns the exception sentinel; otherwise, it returns the stored result. Continuing to debug, we step into WriteDataValue(), shown below.
void LookupIterator::WriteDataValue(Handle<Object> value,
                                    bool initializing_store) {
  DCHECK_EQ(DATA, state_);
  Handle<JSReceiver> holder = GetHolder<JSReceiver>();
  if (IsElement()) {
    Handle<JSObject> object = Handle<JSObject>::cast(holder);
    ElementsAccessor* accessor = object->GetElementsAccessor(isolate_);
    accessor->Set(object, number_, *value);
  } else if (holder->HasFastProperties(isolate_)) {
    if (property_details_.location() == kField) {
      //omit.........................
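WriteDataValue checks what kind of holder the property lives on. In our case, the holder is the global object: s is declared with var at the top level, so it becomes a property of the global object. Execution therefore takes the global-object branch, which meets the condition shown in Figure 1.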
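Why would a global variable need a branch of its own? V8 keeps the global object's properties in a dictionary of property cells. Each global gets its own cell, so a store only updates the value inside that cell, and any code that holds on to the cell sees the new value. The sketch below is a simplified, hypothetical model of that idea; PropertyCellModel and GlobalObjectModel are invented names, and this is not the code behind Figure 1.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical model: one cell per global variable.
struct PropertyCellModel {
  std::string value;
};

class GlobalObjectModel {
 public:
  // Like DeclareGlobals: create the cell once, initialized to undefined.
  std::shared_ptr<PropertyCellModel> Declare(const std::string& name) {
    auto& cell = cells_[name];
    if (!cell) {
      cell = std::make_shared<PropertyCellModel>();
      cell->value = "undefined";
    }
    return cell;
  }

  // Like the global-object store path: overwrite the value inside the cell,
  // keeping the cell itself (and every pointer to it) unchanged.
  void Store(const std::string& name, const std::string& value) {
    Declare(name)->value = value;
  }

  // Simplification: a missing name reads as "undefined" here, whereas
  // JavaScript would throw a ReferenceError for an undeclared global.
  std::string Load(const std::string& name) const {
    auto it = cells_.find(name);
    return it == cells_.end() ? "undefined" : it->second->value;
  }

 private:
  std::map<std::string, std::shared_ptr<PropertyCellModel>> cells_;
};
```

A reference to the cell obtained before the store observes "hello world!" afterwards. That is the point of storing globals in cells rather than directly in the dictionary.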
Look at the green marks in Figure 1. The value is our hello world!, and its type is ONE_BYTE_INTERNALIZED_STRING. This answers question 1.
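The two parts of that type name can be modeled separately. ONE_BYTE means every character fits in 8 bits (Latin-1). INTERNALIZED means the string is the canonical copy kept in a string table, so equal strings share one object. The C++ sketch below illustrates both properties; StringModel, IsOneByte and StringTableModel are invented for illustration and are not V8's classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical model of a string's representation flags.
struct StringModel {
  std::u16string chars;
  bool one_byte;      // every code unit fits in 8 bits (<= 0xFF)
  bool internalized;  // the unique copy stored in the string table
};

// A string can use the one-byte representation if no UTF-16 code unit
// exceeds 0xFF.
bool IsOneByte(const std::u16string& s) {
  for (char16_t c : s) {
    if (c > 0xFF) return false;
  }
  return true;
}

class StringTableModel {
 public:
  // Returns the canonical string for `s`, creating it on first use, so
  // equal contents always map to the same object.
  std::shared_ptr<StringModel> Internalize(const std::u16string& s) {
    auto it = table_.find(s);
    if (it != table_.end()) return it->second;
    auto str = std::make_shared<StringModel>(StringModel{s, IsOneByte(s), true});
    table_.emplace(s, str);
    return str;
  }

 private:
  std::unordered_map<std::u16string, std::shared_ptr<StringModel>> table_;
};
```

Internalizing "hello world!" twice returns the same object, and because all of its characters are ASCII it is also one-byte. Interning lets equal names be compared by identity rather than character by character.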
2. String object
Now for question 2: how is a string variable converted to an object?
Our case calls .substring(). In the bytecode array, lines 14 and 15 load the variable s and store it in register r2, and line 16 loads the substring property of r2. Property loads are handled by the bytecode LdaNamedPropertyNoFeedback. Its handler is shown below, together with the GetProperty builtin it calls.
1. IGNITION_HANDLER(LdaNamedPropertyNoFeedback, InterpreterAssembler) {
2. TNode<Object> object = LoadRegisterAtOperandIndex(0);
3. TNode<Name> name = CAST(LoadConstantPoolEntryAtOperandIndex(1));
4. TNode<Context> context = GetContext();
5. TNode<Object> result =
6. CallBuiltin(Builtins::kGetProperty, context, object, name);
7. SetAccumulator(result);
8. CallRuntime(Runtime::kDebugPrint, context, object, name,result);//This is the debug command I wrote,@huidou
9. Dispatch();
10. }
11. //.........................separation.........................................
12. TF_BUILTIN(GetProperty, CodeStubAssembler) {
13. Node* object = Parameter(Descriptor::kObject);
14. Node* key = Parameter(Descriptor::kKey);
15. Node* context = Parameter(Descriptor::kContext);
16. // TODO(duongn): consider tailcalling to GetPropertyWithReceiver(object,
17. // object, key, OnNonExistent::kReturnUndefined).
18. Label if_notfound(this), if_proxy(this, Label::kDeferred),
19. if_slow(this, Label::kDeferred);
20. CodeStubAssembler::LookupInHolder lookup_property_in_holder =
21. [=](Node* receiver, Node* holder, Node* holder_map,
22. Node* holder_instance_type, Node* unique_name, Label* next_holder,
23. Label* if_bailout) {
24. VARIABLE(var_value, MachineRepresentation::kTagged);
25. Label if_found(this);
26. TryGetOwnProperty(context, receiver, holder, holder_map,
27. holder_instance_type, unique_name, &if_found,
28. &var_value, next_holder, if_bailout);
29. BIND(&if_found);
30. Return(var_value.value());
31. };
32. CodeStubAssembler::LookupInHolder lookup_element_in_holder =
33. [=](Node* receiver, Node* holder, Node* holder_map,
34. Node* holder_instance_type, Node* index, Label* next_holder,
35. Label* if_bailout) {
36. // Not supported yet.
37. Use(next_holder);
38. Goto(if_bailout);
39. };
40. TryPrototypeChainLookup(object, object, key, lookup_property_in_holder,
41. lookup_element_in_holder, &if_notfound, &if_slow,
42. &if_proxy);
43. BIND(&if_notfound);
44. Return(UndefinedConstant());
45. BIND(&if_slow);
46. TailCallRuntime(Runtime::kGetProperty, context, object, key);
47. BIND(&if_proxy);
48. {
49. // Convert the {key} to a Name first.
50. TNode<Object> name = CallBuiltin(Builtins::kToName, context, key);
51. // The {object} is a JSProxy instance, look up the {name} on it, passing
52. // {object} both as receiver and holder. If {name} is absent we can safely
53. // return undefined from here.
54. TailCallBuiltin(Builtins::kProxyGetProperty, context, object, name, object,
55. SmiConstant(OnNonExistent::kReturnUndefined));
56. }
57. }
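GetProperty delegates the walk to TryPrototypeChainLookup. It supplies a per-holder callback (lookup_property_in_holder) that is tried first on the receiver and then on each prototype. If the chain ends without a hit, the if_notfound path returns undefined. Below is a simplified C++ sketch of that control flow. The names are hypothetical, and the proxy, element, and slow-path branches are left out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical simplified heap object: own properties plus a prototype link.
struct ObjectModel {
  std::map<std::string, std::string> own_properties;
  const ObjectModel* prototype = nullptr;
};

// Called once per holder, like the lookup_property_in_holder lambda.
using LookupInHolder = std::function<std::optional<std::string>(
    const ObjectModel& holder, const std::string& name)>;

// Walks receiver -> prototype -> ... and returns the first hit, or
// "undefined" when the chain ends (the if_notfound path).
std::string GetPropertyModel(const ObjectModel& receiver,
                             const std::string& name,
                             const LookupInHolder& lookup_in_holder) {
  for (const ObjectModel* holder = &receiver; holder != nullptr;
       holder = holder->prototype) {
    if (auto value = lookup_in_holder(*holder, name)) return *value;  // if_found
  }
  return "undefined";  // if_notfound
}

// The per-holder step: an own-property lookup (cf. TryGetOwnProperty).
std::optional<std::string> OwnPropertyLookup(const ObjectModel& holder,
                                             const std::string& name) {
  auto it = holder.own_properties.find(name);
  if (it == holder.own_properties.end()) return std::nullopt;
  return it->second;
}
```

Keeping the per-holder step as a callback separates the chain walk from the details of how a single object stores its properties, which is the same split GetProperty makes with its two lambdas.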
While debugging LdaNamedPropertyNoFeedback, we can see only assembly code, not the C++ source. The bytecode handlers are generated at build time and stored in the snapshot_blob file, which V8 deserializes at startup. The deserialized code has no symbol table, so the debugger cannot map it back to C++.
Line 8 is a CallRuntime I added myself. It calls back into the C++ runtime so that variables can be inspected.
Figure 2 shows that line 8 takes us back into the C++ environment. There we can only inspect variables, not debug: a single step returns us to the assembly code. Figure 3 shows the call stack.
To sum up, in Figure 2, args[0] is ‘hello world!’, the global variable s declared earlier. Note that its type is still ONE_BYTE_INTERNALIZED_STRING. args[2] has type JS_FUNCTION: it is the substring() method that was just loaded.
At this point, the "." operation has completed, and s is still a string. This does not prevent V8 from calling substring and returning its result. It is simply how V8 implements the operation: the engineering implementation can differ from the conceptual principle that a string is converted into an object.
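The gap between the conceptual conversion and what the debugging session showed can be illustrated with a toy model. GetViaWrapper follows the specification's description: ToObject creates a temporary String wrapper whose prototype is String.prototype, and the property is looked up on that wrapper. GetWithoutWrapper starts the lookup at String.prototype directly and never allocates a wrapper, which matches the observation that s is still a string after the "." operation. All names here are hypothetical, and special-casing length is a simplification for illustration, not V8's code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <variant>

// Hypothetical object model: own properties plus a prototype link.
struct JSObjectModel {
  std::map<std::string, std::string> own;
  const JSObjectModel* prototype = nullptr;
};

// A JS value in this toy model is either a string primitive or an object.
using Value = std::variant<std::string, const JSObjectModel*>;

// Plain prototype-chain walk starting at `holder`.
std::string LookupFrom(const JSObjectModel* holder, const std::string& name) {
  for (; holder != nullptr; holder = holder->prototype) {
    auto it = holder->own.find(name);
    if (it != holder->own.end()) return it->second;
  }
  return "undefined";
}

// Spec-style: ToObject(primitive) creates a wrapper, then the lookup runs on it.
std::string GetViaWrapper(const Value& v, const std::string& name,
                          const JSObjectModel& string_prototype) {
  if (auto* obj = std::get_if<const JSObjectModel*>(&v)) return LookupFrom(*obj, name);
  auto wrapper = std::make_unique<JSObjectModel>();  // temporary String object
  wrapper->own["length"] = std::to_string(std::get<std::string>(v).size());
  wrapper->prototype = &string_prototype;
  return LookupFrom(wrapper.get(), name);
}

// Shortcut: for a string primitive, begin the lookup at String.prototype and
// never allocate a wrapper; the receiver stays a primitive string.
std::string GetWithoutWrapper(const Value& v, const std::string& name,
                              const JSObjectModel& string_prototype) {
  if (auto* obj = std::get_if<const JSObjectModel*>(&v)) return LookupFrom(*obj, name);
  if (name == "length") return std::to_string(std::get<std::string>(v).size());
  return LookupFrom(&string_prototype, name);
}
```

Both functions find the same substring function for "hello world!", but only the first one allocates an object along the way. Avoiding that allocation on every method call is the practical reason an engine would skip the wrapper.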
3. Summary
- In V8, strings are subdivided into many subtypes, listed below.
switch (map.instance_type()) {
  case CONS_STRING_TYPE:
  case CONS_ONE_BYTE_STRING_TYPE:
  case THIN_STRING_TYPE:
  case THIN_ONE_BYTE_STRING_TYPE:
  case SLICED_STRING_TYPE:
  case SLICED_ONE_BYTE_STRING_TYPE:
  case EXTERNAL_STRING_TYPE:
  case EXTERNAL_ONE_BYTE_STRING_TYPE:
  case UNCACHED_EXTERNAL_STRING_TYPE:
  case UNCACHED_EXTERNAL_ONE_BYTE_STRING_TYPE:
  case STRING_TYPE:
  case ONE_BYTE_STRING_TYPE:
  //omit........................
When .substring() is called, the variable s is still a string. In V8, a string is a heap-managed object, but as mentioned above, its instance type is a string type rather than an object type.
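To see why V8 distinguishes so many subtypes, it helps to look at two of them. A cons string represents a concatenation lazily, as a pair of halves. A sliced string is a window into a parent string. Both avoid copying characters until a flat, contiguous copy is actually needed. The C++ sketch below is a hypothetical toy: SeqRep, ConsRep, SlicedRep and Flatten are invented names, and it ignores details such as V8 only creating sliced strings above a minimum length.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for toy string representations.
struct StringRep {
  virtual ~StringRep() = default;
  virtual std::size_t length() const = 0;
  virtual void WriteTo(std::string& out) const = 0;  // append characters
};

// Like a sequential string: characters stored inline.
struct SeqRep : StringRep {
  std::string chars;
  explicit SeqRep(std::string c) : chars(std::move(c)) {}
  std::size_t length() const override { return chars.size(); }
  void WriteTo(std::string& out) const override { out += chars; }
};

// Like a cons string: a concatenation kept as two halves, nothing copied.
struct ConsRep : StringRep {
  std::shared_ptr<const StringRep> first, second;
  ConsRep(std::shared_ptr<const StringRep> a, std::shared_ptr<const StringRep> b)
      : first(std::move(a)), second(std::move(b)) {}
  std::size_t length() const override { return first->length() + second->length(); }
  void WriteTo(std::string& out) const override {
    first->WriteTo(out);
    second->WriteTo(out);
  }
};

// Like a sliced string: the window [offset, offset + len) of a parent.
struct SlicedRep : StringRep {
  std::shared_ptr<const StringRep> parent;
  std::size_t offset, len;
  SlicedRep(std::shared_ptr<const StringRep> p, std::size_t off, std::size_t n)
      : parent(std::move(p)), offset(off), len(n) {}
  std::size_t length() const override { return len; }
  void WriteTo(std::string& out) const override {
    std::string whole;
    parent->WriteTo(whole);
    out += whole.substr(offset, len);
  }
};

// "Flattening": materialize any representation into contiguous characters.
std::string Flatten(const StringRep& s) {
  std::string out;
  out.reserve(s.length());
  s.WriteTo(out);
  return out;
}
```

Concatenating "hello " and "world!" as a ConsRep costs no character copies, and a SlicedRep over it at offset 6 with length 6 reads back as "world!" only when it is flattened.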
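Below is the runtime function I use to debug bytecode handlers. Call it as CallRuntime(Runtime::kDebugPrint, context, arg0, arg1, ...), as in line 8 of the LdaNamedPropertyNoFeedback handler above. Note that the DCHECK_EQ(1, args.length()) check is commented out so that several arguments can be passed. Only args[0] is printed; the remaining arguments can be inspected in the debugger.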
RUNTIME_FUNCTION(Runtime_DebugPrint) {
  SealHandleScope shs(isolate);
  //DCHECK_EQ(1, args.length());
  MaybeObject maybe_object(*args.address_of_arg_at(0));
  StdoutStream os;
  if (maybe_object->IsCleared()) {
    os << "[weak cleared]";
  } else {
    Object object = maybe_object.GetHeapObjectOrSmi();
    bool weak = maybe_object.IsWeak();
#ifdef DEBUG
    if (object.IsString() && !isolate->context().is_null()) {
      DCHECK(!weak);
      // If we have a string, assume it's a code "marker"
      // and print some interesting cpu debugging info.
      object.Print(os);
      JavaScriptFrameIterator it(isolate);
      JavaScriptFrame* frame = it.frame();
      os << "fp = " << reinterpret_cast<void*>(frame->fp())
         << ", sp = " << reinterpret_cast<void*>(frame->sp())
         << ", caller_sp = " << reinterpret_cast<void*>(frame->caller_sp())
         << ": ";
    } else {
      os << "DebugPrint: ";
      if (weak) {
        os << "[weak] ";
      }
      object.Print(os);
    }
    if (object.IsHeapObject()) {
      HeapObject::cast(object).map().Print(os);
    }
#else
    if (weak) {
      os << "[weak] ";
    }
    // ShortPrint is available in release mode. Print is not.
    os << Brief(object);
#endif
  }
  os << std::endl;
  return args[0];  // return TOS
}
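Runtime_DebugPrint reads its argument as a MaybeObject, which can be a Smi, a strong reference, or a weak reference, distinguished by the low bits of the pointer. The sketch below models that tagging under simplified assumptions. The tag values are assumed to follow V8's scheme: Smi with low bit 0, strong heap object with low bits 01, weak heap object with low bits 11. The one-bit Smi shift, the encoding of a cleared reference, and the MaybeObjectModel name are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tag layout (simplified): low bit 0 = Smi, low bits 01 = strong
// heap object, low bits 11 = weak heap object.
constexpr std::uintptr_t kSmiTagMask = 1;
constexpr std::uintptr_t kHeapObjectTagMask = 3;
constexpr std::uintptr_t kHeapObjectTag = 1;
constexpr std::uintptr_t kWeakHeapObjectTag = 3;
// Assumption: a cleared weak reference is modeled as "null address + weak tag".
constexpr std::uintptr_t kClearedWeak = kWeakHeapObjectTag;

struct MaybeObjectModel {
  std::uintptr_t raw;

  bool IsSmi() const { return (raw & kSmiTagMask) == 0; }
  bool IsCleared() const { return raw == kClearedWeak; }
  bool IsStrong() const { return (raw & kHeapObjectTagMask) == kHeapObjectTag; }
  bool IsWeak() const {
    return (raw & kHeapObjectTagMask) == kWeakHeapObjectTag && !IsCleared();
  }

  // Like GetHeapObjectOrSmi(): clear the weak bit (bit 1) so a weak
  // reference reads as the corresponding strong pointer.
  std::uintptr_t GetHeapObjectOrSmi() const {
    return IsWeak() ? (raw & ~std::uintptr_t{2}) : raw;
  }

  // Simplified Smi encoding: value shifted left by one, low bit 0.
  static MaybeObjectModel FromSmi(std::intptr_t value) {
    return {static_cast<std::uintptr_t>(value) << 1};
  }
};
```

This mirrors the order of checks in Runtime_DebugPrint: a cleared reference is handled first, then the weak bit is stripped with GetHeapObjectOrSmi() while IsWeak() records whether to print the "[weak]" prefix.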
By the way, the same runtime function can also be called from JavaScript as %DebugPrint(value) when d8 is started with the --allow-natives-syntax flag.
Okay, that wraps it up for this share. I’ll see you guys next time, take care!
Please reach out to me if you have any issues. WeChat: qq9123013 Email: v8blink@outlook.com