Enhancing Task Scheduling Reliability: Integrating Arthas for API Monitoring in DolphinScheduler
Chen Debra
Posted on November 5, 2024
This article details the integration of Arthas into Apache DolphinScheduler to enable real-time monitoring of API calls. Arthas is a Java diagnostic tool that helps developers inspect a JVM's runtime state, identify performance bottlenecks, and trace method calls. Attaching Arthas to DolphinScheduler captures key call information during task scheduling, so issues can be detected and resolved promptly and the system stays stable. This article shows how to start Arthas in a DolphinScheduler environment, monitor specific API calls, and analyze the collected performance data to make scheduling more reliable and maintainable.
Manual Installation
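Download the Arthas package from https://arthas.aliyun.com/download/latest_version?mirror=aliyun. This article uses arthas-packaging-3.7.2-bin.zip. Unpack it and start arthas-boot:
mkdir -p /opt/arthas
cp arthas-packaging-3.7.2-bin.zip /opt/arthas
cd /opt/arthas
unzip arthas-packaging-3.7.2-bin.zip
java -jar arthas-boot.jar
arthas-boot lists the running Java processes. Enter the serial number shown next to the DolphinScheduler process you want to diagnose (in this article, the API server).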
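With several Java processes on the host, it can be easier to look up the API server's PID first; arthas-boot also accepts the PID directly, as in java -jar arthas-boot.jar <pid>. The Java sketch below is illustrative only: the class FindApiServerPid and its marker string are not part of DolphinScheduler or Arthas, and it assumes the API server's command line contains its main class name, ApiApplicationServer. Processes owned by other users may not expose their command line.

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Sketch: find candidate PIDs for the DolphinScheduler API server.
 * Assumption: the server's command line contains "ApiApplicationServer".
 */
class FindApiServerPid {

    // Assumed marker; adjust it if your deployment starts the API server differently.
    static final String MARKER = "ApiApplicationServer";

    /** True if the given command line is non-null and mentions the marker. */
    static boolean matches(String commandLine, String marker) {
        return commandLine != null && commandLine.contains(marker);
    }

    /** PIDs of visible processes whose full command line contains the marker. */
    static List<Long> findPids(String marker) {
        return ProcessHandle.allProcesses()
                .filter(p -> matches(p.info().commandLine().orElse(null), marker))
                .map(ProcessHandle::pid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> pids = findPids(MARKER);
        if (pids.isEmpty()) {
            System.out.println("No process whose command line contains " + MARKER);
        }
        for (long pid : pids) {
            // Print a ready-to-run attach command for each match.
            System.out.println("java -jar arthas-boot.jar " + pid);
        }
    }
}
```

Run it on the same host as the API server and copy one of the printed commands.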
Error Troubleshooting
Error 1
[ERROR] Start arthas failed, exception stack trace:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
at com.taobao.arthas.core.Arthas.main(Arthas.java:161)
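Solution:
The attach socket file of the target JVM could not be opened. Having the JVM start its attach listener at startup avoids this. Add the following line to ${DOLPHINSCHEDULER_HOME}/api-server/bin/jvm_args_env.sh, then restart the API server so the option takes effect:
-XX:+StartAttachListener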
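Because a JVM option only takes effect when the JVM starts, it is worth confirming that the restarted JVM actually received it. The sketch below (the class AttachListenerCheck is illustrative) inspects the JVM it runs in through the standard management API; to mirror the API server, launch it with the same options, for example java -XX:+StartAttachListener AttachListenerCheck. It assumes a HotSpot-based JDK, where StartAttachListener is a known flag.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Sketch: report whether StartAttachListener was requested and what the JVM actually uses. */
class AttachListenerCheck {

    /** True if the flag appears verbatim among the JVM's startup arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** Effective value of a HotSpot flag, or null if this JVM has no flag by that name. */
    static String effectiveValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hotspot.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return null; // getVMOption rejects unknown flag names
        }
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-XX:+StartAttachListener on command line: "
                + hasFlag(jvmArgs, "-XX:+StartAttachListener"));
        System.out.println("StartAttachListener effective value: "
                + effectiveValue("StartAttachListener"));
    }
}
```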
Error 2
Picked up JAVA_TOOL_OPTIONS:
java.io.IOException: well-known file /tmp/.java_pid731688 is not secure: file should be owned by the current user (which is 0) but is owned by 989
at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:117)
at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
at com.taobao.arthas.core.Arthas.main(Arthas.java:161)
[ERROR] Start arthas failed, exception stack trace:
[ERROR] attach fail, targetPid: 731688
Solution:
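The attach socket file /tmp/.java_pid731688 belongs to the user running DolphinScheduler (uid 989), while Arthas was started as root (uid 0). Run Arthas as the same user that runs the DolphinScheduler process.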
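The ownership check can be done before attaching. This Java sketch (the class AttachOwnerCheck is illustrative) compares the current user with the owner of the target process and of its /tmp/.java_pid<pid> socket file, the file named in the error above. It assumes a Linux host where the socket lives directly under /tmp, as in the error message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Sketch: compare the current user with the owners of the target JVM and its attach socket. */
class AttachOwnerCheck {

    /** Owner name of /tmp/.java_pid<pid>, or empty if the file does not exist. */
    static Optional<String> socketOwner(long pid) throws IOException {
        Path socket = Paths.get("/tmp", ".java_pid" + pid);
        return Files.exists(socket)
                ? Optional.of(Files.getOwner(socket).getName())
                : Optional.empty();
    }

    /** User name of the target process, if the OS reports it. */
    static Optional<String> processUser(long pid) {
        return ProcessHandle.of(pid).flatMap(p -> p.info().user());
    }

    /** An unknown owner is not treated as a mismatch. */
    static boolean ownerMatches(String currentUser, Optional<String> owner) {
        return owner.map(currentUser::equals).orElse(true);
    }

    public static void main(String[] args) throws IOException {
        long pid = Long.parseLong(args[0]); // PID of the DolphinScheduler process
        String me = System.getProperty("user.name");
        Optional<String> procUser = processUser(pid);
        Optional<String> sockUser = socketOwner(pid);
        System.out.println("current user: " + me
                + ", process owner: " + procUser.orElse("?")
                + ", socket owner: " + sockUser.orElse("(no socket file)"));
        if (!ownerMatches(me, procUser) || !ownerMatches(me, sockUser)) {
            System.out.println("User mismatch: start Arthas as "
                    + procUser.orElse(sockUser.orElse("?")));
        }
    }
}
```

Pass the target PID as the only argument. If it reports a mismatch, start Arthas as the reported user, for example with sudo -u <user> java -jar arthas-boot.jar.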
Watch
watch observes the execution details of a method, such as its parameters and return value. The following command prints the return value (returnObj) of UsersController.queryUserList each time the method returns:
[arthas@731688]$ watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 126 ms, listenerId: 2
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:01; [cost=4.918943ms] result=@Result[
...
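Arthas collects these values by instrumenting the loaded bytecode, so the application code is not changed. Purely to illustrate what is recorded on each call (method, params, returnObj and cost), the Java sketch below uses a different technique, a JDK dynamic proxy around an interface. The UserQuery interface and its signature are hypothetical stand-ins, not DolphinScheduler's actual controller signature.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WatchSketch {

    /** Hypothetical interface standing in for the watched method. */
    interface UserQuery {
        List<String> queryUserList(String searchVal, int pageNo, int pageSize);
    }

    /** One observed call: method name, arguments, return value and cost. */
    static final class Call {
        final String method;
        final Object[] params;
        final Object returnObj;
        final double costMs;

        Call(String method, Object[] params, Object returnObj, double costMs) {
            this.method = method;
            this.params = params;
            this.returnObj = returnObj;
            this.costMs = costMs;
        }

        @Override
        public String toString() {
            return String.format("method=%s params=%s returnObj=%s cost=%.3fms",
                    method, Arrays.toString(params), returnObj, costMs);
        }
    }

    /** Proxy that delegates to target and records every successful call made through it. */
    static <T> T watch(Class<T> type, T target, List<Call> sink) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                Object result = method.invoke(target, args);
                sink.add(new Call(method.getName(), args, result,
                        (System.nanoTime() - start) / 1_000_000.0));
                return result;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception unchanged
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        List<Call> calls = new ArrayList<>();
        UserQuery real = (searchVal, pageNo, pageSize) -> Arrays.asList("admin", "alice");
        UserQuery watched = watch(UserQuery.class, real, calls);
        watched.queryUserList("", 1, 10);
        calls.forEach(System.out::println);
    }
}
```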
Trace
trace lists the calls made inside a method together with the time spent in each, which shows where a slow method spends its time:
[arthas@973263]$ trace org.apache.dolphinscheduler.api.controller.UsersController queryUserList
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 319 ms, listenerId: 1
`---ts=2024-08-27 10:33:08;thread_name=qtp1836984213-26;id=26;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@439f5b3d
`---[13.962731ms] org.apache.dolphinscheduler.api.controller.UsersController:queryUserList()
+---[0.18% 0.025123ms ] org.apache.dolphinscheduler.api.controller.UsersController:checkPageParams() #130
+---[0.09% 0.012549ms ] org.apache.dolphinscheduler.plugin.task.api.utils.ParameterUtils:handleEscapes() #131
`---[96.47% 13.469876ms ] org.apache.dolphinscheduler.api.service.UsersService:queryUserList() #132
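In this trace, queryUserList took 13.962731 ms in total and UsersService.queryUserList accounts for 96.47% of it, so the service layer is where further tracing should focus. Each percentage is the child call's cost divided by the parent's cost. The Java sketch below (TraceMath is an illustrative helper, not part of Arthas) recomputes those percentages from the trace lines above, plus the time not attributed to any listed call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraceMath {

    // Timing bracket of a trace line: "[13.962731ms]" or "[96.47% 13.469876ms ]".
    private static final Pattern COST = Pattern.compile("\\[(?:[0-9.]+% )?([0-9.]+)ms ?\\]");

    /** Cost in milliseconds parsed from one trace line, or -1 if the line has none. */
    static double costMs(String line) {
        Matcher m = COST.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1;
    }

    /** Share of the parent's time spent in a child call, in percent. */
    static double share(double childMs, double parentMs) {
        return childMs / parentMs * 100.0;
    }

    public static void main(String[] args) {
        String root = "`---[13.962731ms] org.apache.dolphinscheduler.api.controller.UsersController:queryUserList()";
        List<String> children = Arrays.asList(
                "+---[0.18% 0.025123ms ] org.apache.dolphinscheduler.api.controller.UsersController:checkPageParams() #130",
                "+---[0.09% 0.012549ms ] org.apache.dolphinscheduler.plugin.task.api.utils.ParameterUtils:handleEscapes() #131",
                "`---[96.47% 13.469876ms ] org.apache.dolphinscheduler.api.service.UsersService:queryUserList() #132");
        double total = costMs(root);
        double childSum = 0;
        for (String child : children) {
            double ms = costMs(child);
            childSum += ms;
            // Text after "] " is the called method and its source line number.
            System.out.printf(Locale.ROOT, "%6.2f%%  %s%n",
                    share(ms, total), child.substring(child.indexOf(']') + 2));
        }
        System.out.printf(Locale.ROOT, "not attributed to listed calls: %.6f ms%n", total - childSum);
    }
}
```

For the trace above this reproduces 0.18%, 0.09% and 96.47%, and leaves about 0.455 ms spent in queryUserList outside the three listed calls.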
Dump
To generate a heap dump file, use:
[arthas@973263]$ heapdump arthas-output/dump.hprof
Dumping heap to arthas-output/dump.hprof ...
Heap dump file created
Open the dump file in a heap analyzer such as Eclipse Memory Analyzer (MAT) to diagnose memory leaks.
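When attaching Arthas is not possible, a HotSpot JVM can also write an HPROF dump of itself through the HotSpotDiagnosticMXBean management interface. A minimal sketch, assuming a HotSpot-based JDK; the class name and temporary output location are illustrative. Passing live = true keeps only reachable objects, which makes the file smaller, and the call fails if the target file already exists.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumpSketch {

    /**
     * Writes an HPROF heap dump of the current JVM to file (use a .hprof name).
     * live = true keeps only reachable objects; throws IOException if file exists.
     */
    static Path dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        hotspot.dumpHeap(file.toAbsolutePath().toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("dump.hprof"), true);
        System.out.println("Heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```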
Viewing JVM Memory Changes
Use memory to inspect JVM memory usage:
[arthas@973263]$ memory
Memory           used    total    max     usage
heap             485M    900M     900M    53.91%
ps_eden_space    277M    327M     358M    77.61%
...
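The same figures are exposed by the standard java.lang.management API, which is useful for logging memory from inside the process. In this sketch (class name illustrative), committed memory is assumed to correspond to the total column and usage is computed as used / max. Pool names depend on the garbage collector; the pool shown as ps_eden_space above appears to be the Parallel collector's "PS Eden Space".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Locale;

class MemorySketch {

    private static final long MB = 1024L * 1024L;

    /** used / max as a percentage, or -1 when no maximum is defined (max == -1). */
    static double usagePercent(long used, long max) {
        return max > 0 ? used * 100.0 / max : -1;
    }

    /** One row: name, used, committed and max in MB, then usage (pools without a max show 0M and -1.00%). */
    static String row(String name, MemoryUsage u) {
        return String.format(Locale.ROOT, "%-32s %6dM %6dM %6dM %7.2f%%",
                name, u.getUsed() / MB, u.getCommitted() / MB,
                u.getMax() / MB, usagePercent(u.getUsed(), u.getMax()));
    }

    public static void main(String[] args) {
        System.out.println(row("heap", ManagementFactory.getMemoryMXBean().getHeapMemoryUsage()));
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.println(row(pool.getName(), usage));
            }
        }
    }
}
```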
Viewing CPU Usage
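Use dashboard to view CPU usage per thread. To inspect further, thread -n 3 prints the stack traces of the three busiest threads, and thread <id> prints the stack trace of the thread with the ID shown in the dashboard.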
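Arthas ranks busy threads by the CPU time they consume over a short sampling interval. A rough equivalent with ThreadMXBean is sketched below; the class name, the 200 ms interval in main, and ranking by raw CPU-time delta are assumptions of this sketch rather than Arthas's exact algorithm.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BusyThreads {

    /** Ids of up to n threads that used the most CPU time during intervalMs. */
    static List<Long> topThreads(int n, long intervalMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return new ArrayList<>();
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread is not alive
        }
        Thread.sleep(intervalMs);
        Map<Long, Long> delta = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start != null && start >= 0 && end >= 0) {
                delta.put(id, end - start); // CPU nanoseconds used during the interval
            }
        }
        List<Long> ids = new ArrayList<>(delta.keySet());
        ids.sort((a, b) -> Long.compare(delta.get(b), delta.get(a)));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (long id : topThreads(3, 200)) {
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has exited
            if (info != null) {
                System.out.println(id + " " + info.getThreadName() + " " + info.getThreadState());
            }
        }
    }
}
```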