Enhancing Task Scheduling Reliability: Integrating Arthas for API Monitoring in DolphinScheduler


Chen Debra

Posted on November 5, 2024


This article details the integration of Arthas into Apache DolphinScheduler to enable real-time monitoring of API calls. Arthas, a powerful Java diagnostic tool, assists developers in inspecting the runtime status, identifying performance bottlenecks, and tracking method calls. Embedding Arthas in DolphinScheduler allows for the capture of key call information during task scheduling, enabling timely issue detection and resolution for improved system stability. Here, we outline the steps to start Arthas within the DolphinScheduler environment, monitor specific API calls, and analyze the collected performance data to enhance scheduling reliability and maintainability.

Manual Installation

Download the Arthas package from https://arthas.aliyun.com/download/latest_version?mirror=aliyun (this article uses arthas-packaging-3.7.2-bin.zip), then unpack and start it:

mkdir -p /opt/arthas
cp arthas-packaging-3.7.2-bin.zip /opt/arthas
cd /opt/arthas
unzip arthas-packaging-3.7.2-bin.zip

java -jar arthas-boot.jar

Arthas lists the running Java processes. Enter the number shown next to the DolphinScheduler API server process and press Enter.
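To skip the interactive prompt, arthas-boot.jar also accepts the target PID as an argument. The helper below is a minimal sketch that picks the PID out of `jps -l` output. The function name `find_jvm_pid` is hypothetical, and the main class name `ApiApplicationServer` is an assumption about your DolphinScheduler version, so check it against your own `jps -l` output.

```shell
# Print the PID of the first JVM whose main class matches the given pattern.
# Reads `jps -l`-style lines ("<pid> <main class>") from stdin.
find_jvm_pid() {
  awk -v pat="$1" '$2 ~ pat { print $1; exit }'
}

# Usage (assumed main class name; run as the user that owns the JVM):
#   pid=$(jps -l | find_jvm_pid 'ApiApplicationServer')
#   java -jar /opt/arthas/arthas-boot.jar "$pid"
```

Note that `jps` normally shows only the JVMs visible to the current user, so an empty result may simply mean the command was run under a different account.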

Error Troubleshooting

Error 1

[ERROR] Start arthas failed, exception stack trace: 
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)

Solution:
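The target JVM has not started its attach listener. In ${DOLPHINSCHEDULER_HOME}/api-server/bin, add the following line to jvm_args_env.sh, then restart the API server so the JVM picks up the flag: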

-XX:+StartAttachListener
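After the restart, one way to confirm that the flag reached the JVM is to look at the process command line. This is a rough sketch for Linux only; the function name `has_attach_listener` is hypothetical, and `<pid>` stands for the API server PID.

```shell
# Succeed if the given JVM command line contains -XX:+StartAttachListener.
has_attach_listener() {
  case " $1 " in
    *" -XX:+StartAttachListener "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on Linux (replace <pid> with the API server PID):
#   has_attach_listener "$(tr '\0' ' ' < /proc/<pid>/cmdline)" && echo "attach listener enabled"
```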

Error 2

Picked up JAVA_TOOL_OPTIONS: 
java.io.IOException: well-known file /tmp/.java_pid731688 is not secure: file should be owned by the current user (which is 0) but is owned by 989
        at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:117)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)
[ERROR] Start arthas failed, exception stack trace: 
[ERROR] attach fail, targetPid: 731688

Solution:
The message shows that Arthas was started by uid 0 (root) while the target JVM is owned by uid 989. Start Arthas as the same user that runs DolphinScheduler to avoid this error.
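A quick pre-flight check can catch this mismatch before attaching. The sketch below compares the current uid with the owner of the target process. It assumes Linux (`/proc`) and a `stat` that supports `-c`; the function names `proc_uid` and `can_attach` are hypothetical.

```shell
# Print the numeric uid that owns a process (Linux only: reads /proc).
proc_uid() {
  stat -c '%u' "/proc/$1"
}

# Succeed only if the current user owns the target JVM.
can_attach() {
  [ "$(proc_uid "$1")" = "$(id -u)" ]
}

# Usage (replace <pid> with the API server PID):
#   can_attach <pid> || echo "switch to uid $(proc_uid <pid>) before starting Arthas"
```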

Watch

The watch command observes individual invocations of a method, including its parameters, return value, and thrown exceptions.

watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
[arthas@731688]$ watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 126 ms, listenerId: 2
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:01; [cost=4.918943ms] result=@Result[
...
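For unattended sampling, the same watch can be run from an Arthas batch script. The sketch below writes such a script; the file paths are hypothetical, while `-x` (expansion depth of printed objects), `-n` (number of invocations to capture), and arthas-boot's `-f` (batch file) are standard Arthas options.

```shell
# Write an Arthas batch script that samples UsersController.queryUserList.
# -x 2 expands printed objects two levels deep; -n 5 stops after five
# invocations so the batch run ends on its own.
write_watch_script() {
  cat > "$1" <<'EOF'
watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList '{params, returnObj, throwExp}' -x 2 -n 5
EOF
}

write_watch_script /tmp/watch-users.as
# Then, as the DolphinScheduler user (replace <pid> with the API server PID):
#   java -jar /opt/arthas/arthas-boot.jar -f /tmp/watch-users.as <pid> > /tmp/watch-users.log
```

Limiting the capture with `-n` matters in batch mode, because without it the watch keeps listening until it is interrupted.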

Trace

The trace command prints the call path inside a method, listing each method it calls and the time spent in each call.

[arthas@973263]$ trace org.apache.dolphinscheduler.api.controller.UsersController queryUserList 
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 319 ms, listenerId: 1
`---ts=2024-08-27 10:33:08;thread_name=qtp1836984213-26;id=26;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@439f5b3d
    `---[13.962731ms] org.apache.dolphinscheduler.api.controller.UsersController:queryUserList()
        +---[0.18% 0.025123ms ] org.apache.dolphinscheduler.api.controller.UsersController:checkPageParams() #130
        +---[0.09% 0.012549ms ] org.apache.dolphinscheduler.plugin.task.api.utils.ParameterUtils:handleEscapes() #131
        `---[96.47% 13.469876ms ] org.apache.dolphinscheduler.api.service.UsersService:queryUserList() #132
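When a trace is saved to a file, the slowest child call can be pulled out with a short filter. This is only a sketch that assumes the line layout shown above (`[<percent>% <millis>ms ] Class:method() #line`); the function name `slowest_call` is hypothetical.

```shell
# Print the child call with the largest elapsed time from Arthas trace output.
# Only lines shaped like "+---[0.18% 0.025123ms ] Class:method() #130" are read;
# the root line has no percentage and is skipped.
slowest_call() {
  awk '
    match($0, /\[[0-9.]+% [0-9.]+ms/) {
      # Strip the leading "[" and trailing "ms", leaving "<percent>% <millis>".
      split(substr($0, RSTART + 1, RLENGTH - 3), f, "% ")
      rest = substr($0, RSTART + RLENGTH)
      sub(/^ *\] */, "", rest)
      if (f[2] + 0 > max) { max = f[2] + 0; ms = f[2]; pct = f[1]; name = rest }
    }
    END { if (name != "") printf "%s %sms (%s%%)\n", name, ms, pct }
  '
}

# Usage: slowest_call < trace-output.txt
```

For the trace above, this would point at UsersService:queryUserList(), which accounts for 96.47% of the request time.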

Dump

To generate a heap dump file, use:

[arthas@973263]$ heapdump arthas-output/dump.hprof
Dumping heap to arthas-output/dump.hprof ...
Heap dump file created

Analyze the dump file with a tool such as Eclipse Memory Analyzer (MAT) to diagnose memory leaks.
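Before copying a large dump to an analysis machine, it can help to confirm that the file is a complete-looking HPROF file. The check below is a minimal sketch that relies on HPROF files starting with the text "JAVA PROFILE"; the function name `is_hprof` is hypothetical, and the check does not prove the dump is intact.

```shell
# Succeed if the file starts with the HPROF magic string "JAVA PROFILE".
is_hprof() {
  [ "$(head -c 12 "$1" 2>/dev/null)" = "JAVA PROFILE" ]
}

# Usage (path is relative to wherever the dump was written):
#   is_hprof arthas-output/dump.hprof && echo "looks like a heap dump"
```

If only reachable objects are of interest, Arthas also supports `heapdump --live <file>`, which usually produces a smaller file.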

Viewing JVM Memory Changes

Use memory to inspect JVM memory usage:

[arthas@973263]$ memory 
Memory                                                         used                 total                max                  usage                
heap                                                           485M                 900M                 900M                 53.91%               
ps_eden_space                                                  277M                 327M                 358M                 77.61%               
...
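If the memory table is captured to a file, a small filter can flag the areas that are close to full. The sketch below assumes the plain-text layout shown above, with the usage percentage in the last column; the function name `high_usage` and the threshold are hypothetical choices.

```shell
# Print "<area> <usage>" for every row whose usage exceeds the given percentage.
# Reads the table printed by the Arthas `memory` command from stdin.
high_usage() {
  awk -v limit="$1" '
    $NF ~ /^[0-9.]+%$/ {
      pct = $NF
      sub(/%$/, "", pct)
      if (pct + 0 > limit + 0) print $1, $NF
    }
  '
}

# Usage: high_usage 70 < memory-output.txt
```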

Viewing CPU Usage

Use dashboard to view CPU usage per thread. To dig further, print the stack of a specific thread with thread <id>, or list the N busiest threads with thread -n N.
