SPO600 – Final Project - Step 1

xguhx

Gustavo Tavares

Posted on March 28, 2022


Hello!

We are getting close to the end of the course, and we are finally starting our final project (we will be working in the open!).


Step 1

For the first step of the project, we were asked to choose some packages that would benefit from SVE2 instructions.

The ideal package is one that processes massive amounts of data. That way, the SVE2 vector instructions can work on many data elements at once and have the most room to improve performance.

After some research, I found two candidates that could benefit from SVE2:
GStreamer1 and FFmpeg.


GStreamer1

According to the project's own description:

 GStreamer is a streaming media framework, based on graphs of filters which operate on media data. 

Applications using this library can do anything from real-time sound processing to playing videos, and just about anything else media-related.  

Its plugin-based architecture means that new data types or processing capabilities can be added simply by installing new plugins.

GStreamer1 uses inline assembler code (here, 32-bit ARM NEON) for many functions, as we can see here:

static inline void
inner_product_gint16_full_1_neon (gint16 * o, const gint16 * a,
    const gint16 * b, gint len, const gint16 * icoeff, gint bstride)
{
    uint32_t remainder = len % 16;
    len = len - remainder;

    asm volatile ("      vmov.s32 q0, #0\n"
                  "      cmp %[len], #0\n"
                  "      beq 2f\n"
                  "      vmov.s32 q1, #0\n"
                  "1:"
                  "      vld1.16 {d16, d17, d18, d19}, [%[b]]!\n"
                  "      vld1.16 {d20, d21, d22, d23}, [%[a]]!\n"
                  "      subs %[len], %[len], #16\n"
                  "      vmlal.s16 q0, d16, d20\n"
                  "      vmlal.s16 q1, d17, d21\n"
                  "      vmlal.s16 q0, d18, d22\n"
                  "      vmlal.s16 q1, d19, d23\n"
                  "      bne 1b\n"
                  "      vadd.s32 q0, q0, q1\n"
                  "2:"
                  "      cmp %[remainder], #0\n"
                  "      beq 4f\n"
                  "3:"
                  "      vld1.16 {d16}, [%[b]]!\n"
                  "      vld1.16 {d20}, [%[a]]!\n"
                  "      subs %[remainder], %[remainder], #4\n"
                  "      vmlal.s16 q0, d16, d20\n"
                  "      bgt 3b\n"
                  "4:"
                  "      vadd.s32 d0, d0, d1\n"
                  "      vpadd.s32 d0, d0, d0\n"
                  "      vqrshrn.s32 d0, q0, #15\n"
                  "      vst1.16 d0[0], [%[o]]\n"
                  : [a] "+r" (a), [b] "+r" (b),
                    [len] "+r" (len), [remainder] "+r" (remainder)
                  : [o] "r" (o)
                  : "cc", "q0", "q1",
                    "d16", "d17", "d18", "d19",
                    "d20", "d21", "d22", "d23");
}
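
To make the assembly above easier to follow, here is a plain C sketch of what the routine appears to compute: a dot product of two 16-bit sample arrays, followed by a rounding shift right by 15 bits and saturation to 16 bits (the `vqrshrn.s32 ... #15` step). The function name `inner_product_s16_scalar` is hypothetical and not part of GStreamer. The sketch also makes one simplifying assumption: it accumulates in 64 bits, while the NEON code accumulates in 32-bit lanes and handles the tail four elements at a time, so the two should only agree when the sum fits in 32 bits and `len` is a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar equivalent of the NEON inner product (illustration only). */
int16_t inner_product_s16_scalar(const int16_t *a, const int16_t *b, int len)
{
    assert(len >= 0);

    /* Multiply-accumulate, like the vmlal.s16 instructions do per lane. */
    int64_t acc = 0;
    for (int i = 0; i < len; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];

    /* Rounding shift right by 15: add half of 2^15, then floor-divide by 2^15. */
    acc += 1 << 14;
    int64_t q = acc / 32768;
    if (acc % 32768 < 0)
        q--; /* C division truncates toward zero; adjust to floor */

    /* Saturate to the int16_t range, like the "q" in vqrshrn. */
    if (q > INT16_MAX)
        q = INT16_MAX;
    if (q < INT16_MIN)
        q = INT16_MIN;
    return (int16_t)q;
}
```

A loop written this way is the kind of code a compiler can try to auto-vectorize for SVE2, whereas hand-written inline assembly like the NEON version is left untouched by the optimizer.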



FFmpeg

According to the project's own description:

FFmpeg is a complete and free Internet live audio and video broadcasting solution for Linux/Unix. It also includes a digital  VCR. It can encode in real time in many formats including MPEG1 audio and video, MPEG4, h263, ac3, asf, avi, real, mjpeg, and flash.

This second option has many source files that use NEON, and it is compiled with gcc.

FFmpeg


Planning my approach

Auto-vectorization:
My plan for implementing SVE2 in this project is to use auto-vectorization. This means I will change the Makefile to include options that make the compiler apply the optimizations for me.
Here is an example of a Makefile from gstreamer1:
Makefile

As we can see, they are using -O0, which means no optimization at all, so the compiler will not auto-vectorize anything.

So I will add the options -O3 -march=armv8-a+sve2 and then test whether performance actually improves.
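
As a small illustration of the kind of code these options are aimed at, here is a sketch of a loop that compilers can usually auto-vectorize. It is a hypothetical example (the file name `mix.c` and the function `mix_samples` are my own, not taken from FFmpeg or GStreamer), and the commands in the comment assume GCC 10 or newer on an AArch64 system.

```c
/* mix.c: a simple loop shape that is a good auto-vectorization candidate.
 *
 * Possible build and inspection commands (assumed setup, adjust as needed):
 *   gcc -O3 -march=armv8-a+sve2 -fopt-info-vec -c mix.c
 *   objdump -d mix.o     # look for z registers and instructions such as whilelo
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two buffers of 16-bit samples, clamping each result to the int16_t range.
 * "restrict" promises the compiler that the buffers do not overlap, which
 * removes one common obstacle to vectorization. */
void mix_samples(int16_t *restrict out, const int16_t *restrict a,
                 const int16_t *restrict b, size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX)
            sum = INT16_MAX;
        if (sum < INT16_MIN)
            sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
}
```

With -O0 this loop stays scalar. With -O3 and an SVE2 target, the -fopt-info-vec report should say whether the compiler managed to vectorize it.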


About Makefiles

Makefiles can be complicated. I thought this would be an easy approach at first, but there are many Makefiles in a project and they include and depend on each other.
Take a look at this example from FFmpeg:

MAIN_MAKEFILE=1
include ffbuild/config.mak

vpath %.c    $(SRC_PATH)
vpath %.cpp  $(SRC_PATH)
vpath %.h    $(SRC_PATH)
vpath %.inc  $(SRC_PATH)
vpath %.m    $(SRC_PATH)
vpath %.S    $(SRC_PATH)
vpath %.asm  $(SRC_PATH)
vpath %.rc   $(SRC_PATH)
vpath %.v    $(SRC_PATH)
vpath %.texi $(SRC_PATH)
vpath %.cu   $(SRC_PATH)
vpath %.ptx  $(SRC_PATH)
vpath %.metal $(SRC_PATH)
vpath %/fate_config.sh.template $(SRC_PATH)

TESTTOOLS   = audiogen videogen rotozoom tiny_psnr tiny_ssim base64 audiomatch
HOSTPROGS  := $(TESTTOOLS:%=tests/%) doc/print_options

# $(FFLIBS-yes) needs to be in linking order
FFLIBS-$(CONFIG_AVDEVICE)   += avdevice
FFLIBS-$(CONFIG_AVFILTER)   += avfilter
FFLIBS-$(CONFIG_AVFORMAT)   += avformat
FFLIBS-$(CONFIG_AVCODEC)    += avcodec
FFLIBS-$(CONFIG_POSTPROC)   += postproc
FFLIBS-$(CONFIG_SWRESAMPLE) += swresample
FFLIBS-$(CONFIG_SWSCALE)    += swscale

FFLIBS := avutil

DATA_FILES := $(wildcard $(SRC_PATH)/presets/*.ffpreset) $(SRC_PATH)/doc/ffprobe.xsd

SKIPHEADERS = compat/w32pthreads.h

# first so "all" becomes default target
all: all-yes

include $(SRC_PATH)/tools/Makefile
include $(SRC_PATH)/ffbuild/common.mak

FF_EXTRALIBS := $(FFEXTRALIBS)
FF_DEP_LIBS  := $(DEP_LIBS)
FF_STATIC_DEP_LIBS := $(STATIC_DEP_LIBS)

$(TOOLS): %$(EXESUF): %.o
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(EXTRALIBS-$(*F)) $(EXTRALIBS) $(ELIBS)

target_dec_%_fuzzer$(EXESUF): target_dec_%_fuzzer.o $(FF_DEP_LIBS)
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)

tools/target_bsf_%_fuzzer$(EXESUF): tools/target_bsf_%_fuzzer.o $(FF_DEP_LIBS)
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)

target_dem_%_fuzzer$(EXESUF): target_dem_%_fuzzer.o $(FF_DEP_LIBS)
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)

tools/target_dem_fuzzer$(EXESUF): tools/target_dem_fuzzer.o $(FF_DEP_LIBS)
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)

tools/target_io_dem_fuzzer$(EXESUF): tools/target_io_dem_fuzzer.o $(FF_DEP_LIBS)
        $(LD) $(LDFLAGS) $(LDEXEFLAGS) $(LD_O) $^ $(ELIBS) $(FF_EXTRALIBS) $(LIBFUZZER_PATH)

 
(it keeps going and going)

Looks like some kind of Martian language to me.


Finally

After some research, I decided to go with FFmpeg.
GStreamer1 doesn't use make and Makefiles to compile. It uses meson and ninja, which would make life difficult for me, as I have no experience at all with those tools.

To change the FFmpeg build, I will have to edit the 'config.mak' file inside the ffbuild directory, since the main Makefile includes it and takes its configuration from there.
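
Because the flags pass through several included files, it may be worth checking that they actually reach the compiler. A minimal sketch, assuming the compiler follows the Arm C Language Extensions and defines the `__ARM_FEATURE_SVE2` macro when SVE2 is enabled (the function name `built_with_sve2` is hypothetical):

```c
/* Returns 1 if this file was compiled with SVE2 enabled, 0 otherwise.
 * Relies on the ACLE feature macro __ARM_FEATURE_SVE2, which the compiler
 * is expected to define when -march includes +sve2. */
int built_with_sve2(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return 1;
#else
    return 0;
#endif
}
```

Compiling a small file like this with the same flags as the project, and calling the function from a test program, gives a quick yes/no answer before spending time on benchmarks.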

That's it for now!
Thank you for reading!
