Building sync engines in WordPress

artpi

Artur Piszek

Posted on June 21, 2024

Let’s say you want to synchronize some data in your WordPress site with an external service. How do you do that? For my little “Second Brain” WordPress plugin, I have implemented two sync services so far:

  • Readwise
  • Evernote

Both of them synchronize data with a custom post type called “notes”. Here is what I learned.

WP-Cron

Sync software usually has some kind of background service that looks for changes and syncs the ones it detects. WordPress gives us no operating-system access to start an actual process, but we do have WP-Cron.

WP-Cron is a job scheduling system designed for periodic or scheduled maintenance tasks, such as publishing scheduled posts or checking for plugin updates. It is not a real system cron: scheduled events run when the site receives a request, so an “hourly” job fires on the first request after it is due.

So here is what we are going to do:

  1. We are going to create an hourly cron job for our sync.
  2. Each run will work through a batch: a limited number of elements (I settled on 100 updates).
  3. Whenever a run finishes, it checks whether more updates are waiting to sync. If so, it will:
    • Cancel the next regular run (due in one hour)
    • Schedule itself to run again in 1 minute
    • Make sure no other “regular” hourly sync events remain, so that at any time there is only one sync event waiting

We do all this because:

  • We don’t want any single run to be too compute- or memory-intensive. WordPress runs on a variety of hosts with different limits, and we don’t want our run to be killed.
  • We don’t know how many updates to expect in total. I have 8000+ notes in my Evernote account, so the initial sync with Evernote will take quite some time. That’s why we rerun the job every minute while updates are pending.
  • We don’t want to poll any external service too often, because rate limits are a pain to manage.
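To put the 8000-note example into numbers, here is an illustrative simulation of the chain. BATCH_SIZE and FOLLOW_UP_DELAY mirror the 100-item batches and one-minute follow-ups described above; simulate_sync_chain() is a hypothetical helper, not plugin code. It assumes every run fires exactly on time, which WP-Cron does not guarantee, so treat the result as a lower bound.

```php
<?php
// Illustrative only: 100 items per run, 60 s between chained runs.
const BATCH_SIZE      = 100;
const FOLLOW_UP_DELAY = 60; // seconds

/**
 * Simulate chained runs until the backlog is drained.
 *
 * @param int $pending Number of updates waiting on the external service.
 * @return int[] Start time of every run, in seconds after the first run.
 */
function simulate_sync_chain( int $pending ): array {
    $starts = array();
    $now    = 0;
    do {
        $starts[] = $now;
        $pending -= min( BATCH_SIZE, $pending ); // Process one batch.
        $now     += FOLLOW_UP_DELAY;            // Time of the next chained run, if one is needed.
    } while ( $pending > 0 );
    // Backlog empty: the regular hourly schedule takes over again.
    return $starts;
}

$starts = simulate_sync_chain( 8000 );
printf( "%d runs, last one starts after %d minutes\n", count( $starts ), end( $starts ) / 60 );
```

It prints 80 runs, with the last one starting 79 minutes after the first; on a low-traffic site the real chain will stretch further.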

New cron job

I have a small abstraction layer of modules in my code; essentially, each service has its own module.

class External_Service_Module extends POS_Module {
    public $id = 'external_service';
    public $name = 'External Service';

    public function get_sync_hook_name() {
        return 'pos_sync_' . $this->id;
    }

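    public function register_sync( $interval = 'hourly' ) {
        $hook_name = $this->get_sync_hook_name();
        add_action( $hook_name, array( $this, 'sync' ) );
        // Schedule the recurring event only if nothing (including a pending one-off chunk) is queued for this hook.
        if ( ! wp_next_scheduled( $hook_name ) ) {
            wp_schedule_event( time(), $interval, $hook_name );
        }
    }

    // Default no-op; each service module overrides this with its own sync logic.
    public function sync() {
        $this->log( 'EMPTY SYNC' );
    }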
}
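To see why the wp_next_scheduled() guard in register_sync() is enough, here is a WordPress-free model. Fake_Cron and register_sync_on() are hypothetical stand-ins, not WordPress functions (for example, Fake_Cron returns null where wp_next_scheduled() returns false). The model shows that repeated page loads keep a single hourly event, that a pending one-off chunk event blocks the hourly one, and that the hourly event comes back once the chain ends.

```php
<?php
// Hypothetical in-memory stand-in for the WP-Cron event list. NOT the WordPress API.
final class Fake_Cron {
    /** @var array<int, array{hook: string, timestamp: int, recurrence: ?string}> */
    private array $events = array();

    // Like wp_next_scheduled(): earliest timestamp for the hook, or null.
    public function next_scheduled( string $hook ): ?int {
        $times = array();
        foreach ( $this->events as $event ) {
            if ( $event['hook'] === $hook ) {
                $times[] = $event['timestamp'];
            }
        }
        return $times ? min( $times ) : null;
    }

    // Recurring when $recurrence is set, one-off when it is null.
    public function schedule( string $hook, int $timestamp, ?string $recurrence = null ): void {
        $this->events[] = array(
            'hook'       => $hook,
            'timestamp'  => $timestamp,
            'recurrence' => $recurrence,
        );
    }

    // Like wp_unschedule_hook(): drop every event for the hook.
    public function unschedule_hook( string $hook ): void {
        $this->events = array_values(
            array_filter( $this->events, fn( $event ) => $event['hook'] !== $hook )
        );
    }

    // Simulate due one-off events firing: they leave the list once run.
    public function fire_single_events( int $now ): void {
        $this->events = array_values(
            array_filter(
                $this->events,
                fn( $event ) => null !== $event['recurrence'] || $event['timestamp'] > $now
            )
        );
    }

    public function count_for( string $hook ): int {
        return count( array_filter( $this->events, fn( $event ) => $event['hook'] === $hook ) );
    }
}

// Mirrors the guard in register_sync(): schedule hourly only if nothing is queued.
function register_sync_on( Fake_Cron $cron, string $hook, int $now ): void {
    if ( null === $cron->next_scheduled( $hook ) ) {
        $cron->schedule( $hook, $now, 'hourly' );
    }
}

$cron = new Fake_Cron();
$hook = 'pos_sync_evernote';

// Three page loads: the guard keeps a single hourly event.
foreach ( array( 0, 5, 10 ) as $t ) {
    register_sync_on( $cron, $hook, $t );
}
$after_page_loads = $cron->count_for( $hook ); // 1

// A run found more pages: clear the hook and queue one follow-up.
$cron->unschedule_hook( $hook );
$cron->schedule( $hook, 70 );
register_sync_on( $cron, $hook, 20 ); // The pending one-off event blocks the hourly one.
$during_chain = $cron->count_for( $hook ); // 1

// The follow-up fires and finds nothing more, so it schedules nothing.
$cron->fire_single_events( 70 );
register_sync_on( $cron, $hook, 80 ); // The next page load restores the hourly event.
$after_chain = $cron->next_scheduled( $hook ); // 80
```

Note the strict null check: a falsy check would misread an event scheduled at timestamp 0 as "nothing scheduled".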

So here is a simplified sync method for my Evernote module. You can see the full code here.

class Evernote_Module extends External_Service_Module {
    public $id = 'evernote';
    public $name = 'Evernote';
    public $description = 'Syncs with evernote service';
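    public $advanced_client = null; // Evernote SDK client; authentication setup is omitted in this simplified version.
    public $notes_module = null;

    public function __construct( \POS_Module $notes_module ) {
        $this->notes_module = $notes_module;
        $this->register_sync( 'hourly' );
    }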

    /**
     * Sync with Evernote. This is triggered by the cron job.
     *
     * @see register_sync
     */
    public function sync() {
        $this->log( 'Syncing Evernote triggering ' );
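        $usn = get_option( $this->get_setting_option_name( 'usn' ), 0 );

        // Simplified filter: only request note changes.
        $sync_filter = new \EDAM\NoteStore\SyncChunkFilter();
        $sync_filter->includeNotes = true;

        $sync_chunk = $this->advanced_client->getNoteStore()->getFilteredSyncChunk( $usn, 100, $sync_filter );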
        if ( ! $sync_chunk ) {
            $this->log( 'Evernote: Sync failed', E_USER_WARNING );
            return;
        }
        if ( ! empty( $sync_chunk->chunkHighUSN ) && $sync_chunk->chunkHighUSN > $usn ) {
            // We want to unschedule any regular sync events until we have initial sync complete.
            wp_unschedule_hook( $this->get_sync_hook_name() );
            // We will schedule ONE TIME sync event for the next page.
            update_option( $this->get_setting_option_name( 'usn' ), $sync_chunk->chunkHighUSN );
            wp_schedule_single_event( time() + 60, $this->get_sync_hook_name() );
            $this->log( "Scheduling next page chunk with cursor {$sync_chunk->chunkHighUSN}" );
        } else {
            $this->log( 'Evernote: Full sync completed' );
        }
        // Process the notes in $sync_chunk->notes here (omitted in this simplified version).
    }

}
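The Evernote cursor logic can be checked without the SDK. In this hypothetical model, fake_sync_chunk() plays the role of getFilteredSyncChunk() over a plain list of update sequence numbers, and count_runs() applies the stop rule used above: stop when a chunk has nothing newer than the stored USN. Because the last non-empty chunk still advances the cursor, that rule ends the chain with one extra, empty request. Evernote’s SyncChunk also carries an updateCount field; comparing the cursor against it, as in the second variant, is one way to skip that request.

```php
<?php
// Hypothetical model of USN paging; NOT the Evernote SDK.

/**
 * Return the changes after $after_usn, like one sync chunk.
 *
 * @param int[] $changed_usns USNs of changed items on the "server".
 * @return array{items: int[], chunkHighUSN: ?int, updateCount: int}
 */
function fake_sync_chunk( array $changed_usns, int $after_usn, int $max_entries ): array {
    $items = array_values( array_filter( $changed_usns, fn( $usn ) => $usn > $after_usn ) );
    $items = array_slice( $items, 0, $max_entries );
    return array(
        'items'        => $items,
        'chunkHighUSN' => $items ? max( $items ) : null,
        'updateCount'  => $changed_usns ? max( $changed_usns ) : 0,
    );
}

/**
 * Count cron runs when each run fetches one chunk.
 * $stop_on_update_count = false: stop when the cursor stops moving (as above).
 * $stop_on_update_count = true:  also stop once the cursor reaches updateCount.
 */
function count_runs( array $changed_usns, int $max_entries, bool $stop_on_update_count ): int {
    $usn  = 0;
    $runs = 0;
    while ( true ) {
        $runs++;
        $chunk = fake_sync_chunk( $changed_usns, $usn, $max_entries );
        if ( null === $chunk['chunkHighUSN'] || $chunk['chunkHighUSN'] <= $usn ) {
            return $runs; // Nothing new: full sync completed.
        }
        $usn = $chunk['chunkHighUSN']; // Persist the cursor (update_option() in the plugin).
        if ( $stop_on_update_count && $usn >= $chunk['updateCount'] ) {
            return $runs; // Caught up with the account; no extra empty request.
        }
    }
}

$usns = range( 1, 250 );
echo count_runs( $usns, 100, false ), "\n"; // 4: three chunks plus one empty check
echo count_runs( $usns, 100, true ), "\n";  // 3
```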

I have very similar sync code for Readwise:

<?php

class Readwise extends External_Service_Module {
    public $id = 'readwise';
    public $name = 'Readwise';
    public $description = 'Syncs with readwise service';

    public function __construct( $notes_module ) {
        $this->register_sync( 'hourly' );
    }

    public function sync() {
        $this->log( '[DEBUG] Syncing readwise triggering ' );

        $query_args = array();
        $page_cursor = get_option( $this->get_setting_option_name( 'page_cursor' ), null );
        if ( $page_cursor ) {
            $query_args['pageCursor'] = $page_cursor;
        } else {
            $last_sync = get_option( $this->get_setting_option_name( 'last_sync' ) );
            if ( $last_sync ) {
                $query_args['updatedAfter'] = $last_sync;
            }
        }

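        // Illustrative: read the API token from the module's settings.
        $token   = get_option( $this->get_setting_option_name( 'token' ) );
        $request = wp_remote_get(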
            'https://readwise.io/api/v2/export/?' . http_build_query( $query_args ),
            array(
                'headers' => array(
                    'Authorization' => 'Token ' . $token,
                ),
            )
        );
        if ( is_wp_error( $request ) ) {
            $this->log( '[ERROR] Fetching readwise ' . $request->get_error_message(), E_USER_WARNING );
            return false; // Bail early
        }

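        $body = wp_remote_retrieve_body( $request );
        $data = json_decode( $body );
        if ( ! $data || ! isset( $data->results ) ) {
            $this->log( '[ERROR] Unexpected Readwise response', E_USER_WARNING );
            return false; // Bail early, e.g. on a rate-limit or auth error body.
        }
        $this->log( "[DEBUG] Readwise Syncing {$data->count} highlights" );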

        //phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
        if ( ! empty( $data->nextPageCursor ) ) {
            // We want to unschedule any regular sync events until we have initial sync complete.
            wp_unschedule_hook( $this->get_sync_hook_name() );
            // We will schedule ONE TIME sync event for the next page.
            //phpcs:ignore WordPress.NamingConventions.ValidVariableName.UsedPropertyNotSnakeCase
            update_option( $this->get_setting_option_name( 'page_cursor' ), $data->nextPageCursor );
            wp_schedule_single_event( time() + 60, $this->get_sync_hook_name() );
            $this->log( "Scheduling next page sync with cursor {$data->nextPageCursor}" );
        } else {
            $this->log( '[DEBUG] Full sync completed' );
            update_option( $this->get_setting_option_name( 'last_sync' ), gmdate( 'c' ) );
            delete_option( $this->get_setting_option_name( 'page_cursor' ) );
        }

        foreach ( $data->results as $book ) {
            $this->sync_book( $book );
        }
    }

}

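The Readwise module keeps two pieces of state: page_cursor while a paginated export is in progress, and last_sync once it has finished. This hypothetical helper isolates the query-building step from the sync() method above so you can see which parameter wins in each state; it uses only PHP’s built-in http_build_query().

```php
<?php
// Hypothetical helper mirroring the query logic in Readwise::sync().
// An unfinished pagination takes priority; otherwise only ask for
// items updated since the last completed sync.
function readwise_export_query( ?string $page_cursor, ?string $last_sync ): string {
    $query_args = array();
    if ( $page_cursor ) {
        $query_args['pageCursor'] = $page_cursor;
    } elseif ( $last_sync ) {
        $query_args['updatedAfter'] = $last_sync;
    }
    return http_build_query( $query_args );
}

// First ever sync: no parameters, full export.
echo readwise_export_query( null, null ), "\n";
// Incremental sync; gmdate( 'c' ) produces this format.
echo readwise_export_query( null, '2024-06-21T10:00:00+00:00' ), "\n"; // updatedAfter=2024-06-21T10%3A00%3A00%2B00%3A00
// Mid-pagination: the cursor wins over last_sync.
echo readwise_export_query( 'abc123', '2024-06-21T10:00:00+00:00' ), "\n"; // pageCursor=abc123
```

Because last_sync is written only after the final page, an interrupted chain resumes from its cursor instead of restarting the export.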

Two-way sync

In the case of Evernote, we can have a two-way sync. When you update something in Evernote, the job above trickles the update down to WordPress. When you update something in WordPress, the change should be reflected in Evernote.

The main trap is an update loop: you update something in WordPress, the change is pushed to Evernote, WordPress then picks up that Evernote update and saves the post, which pushes it to Evernote again, and so on.

So here is what we are going to do:

  1. We are going to hook into the save_post_{post_type} action. This means we don’t really need a background sync for this direction. We assume WordPress is online, so we can push updates synchronously as they happen rather than some time later, which hopefully reduces conflicts.
  2. We are going to update the edited post with the content Evernote returns, immediately on save. That way:
    • Even if Evernote changes the data on their end, we know that after the post is saved the data is consistent on both ends
    • We can calculate the bodyHash and other indicators to know if data got out of sync
  3. Since we update the WordPress post again after pushing data to Evernote, we have to remember to unhook from the save_post_{post_type} action first to prevent recursion.
  4. This is not WordPress-specific, but because Evernote notes use a limited markup format (ENML), we have to strip out and convert some data.
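The unhook/rehook trick from step 3 can be modelled outside WordPress. In this sketch, the small Hooks class is a hypothetical stand-in for add_action(), remove_action() and do_action(), and Note_Syncer::on_save() plays the role of sync_note_to_evernote(): it pushes once, then saves the returned content with its own handler detached, so the nested save does not push again.

```php
<?php
// Hypothetical stand-in for WordPress actions; NOT the WordPress API.
final class Hooks {
    private array $callbacks = array();

    public function add( string $hook, callable $cb ): void {
        $this->callbacks[ $hook ][] = $cb;
    }

    public function remove( string $hook, callable $cb ): void {
        $this->callbacks[ $hook ] = array_values(
            array_filter( $this->callbacks[ $hook ] ?? array(), fn( $c ) => $c !== $cb )
        );
    }

    public function run( string $hook, ...$args ): void {
        // foreach works on a copy, so handlers may add/remove themselves safely.
        foreach ( $this->callbacks[ $hook ] ?? array() as $cb ) {
            $cb( ...$args );
        }
    }
}

final class Note_Syncer {
    public int $pushes = 0;
    private Hooks $hooks;

    public function __construct( Hooks $hooks ) {
        $this->hooks = $hooks;
    }

    public function on_save( string $content ): void {
        $this->pushes++;              // Stands in for createNote()/updateNote().
        $returned = trim( $content ); // Pretend the service normalised the content.
        $this->hooks->remove( 'save_note', array( $this, 'on_save' ) );
        try {
            // Stands in for wp_update_post(), which fires the save action again.
            $this->hooks->run( 'save_note', $returned );
        } finally {
            // Re-attach even if the nested save throws.
            $this->hooks->add( 'save_note', array( $this, 'on_save' ) );
        }
    }
}

$hooks  = new Hooks();
$syncer = new Note_Syncer( $hooks );
$hooks->add( 'save_note', array( $syncer, 'on_save' ) );
$hooks->run( 'save_note', ' hello ' ); // One user save.
echo $syncer->pushes, "\n"; // 1
```

Without the remove() call, the nested run() would call on_save() again and recurse until PHP runs out of stack.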

Here is the simplified code. You can read the whole thing here.

/**
* This is hooked into the save_post action of the notes module.
* Every time a post is updated, this will check if it is in the synced notebooks and sync it to Evernote.
* It will then receive the returned content and update the post, so some content may be lost if Evernote cannot represent it.
*
* @param int $post_id
* @param \WP_Post $post
* @param bool $update
*/
public function sync_note_to_evernote( int $post_id, \WP_Post $post, bool $update ) {
    $guid = get_post_meta( $post->ID, 'evernote_guid', true );

    if ( $guid ) {
        $note = $this->advanced_client->getNoteStore()->getNote( $guid, false, false, false, false );
        if ( $note ) {
            $note->title = $post->post_title;
            $note->content = self::html2enml( $post->post_content );
            $result = $this->advanced_client->getNoteStore()->updateNote( $note );
        }
        return;
    }

    // No linked Evernote note yet: create one.
    $note = new \EDAM\Types\Note();
    $note->title = $post->post_title;
    $note->content = self::html2enml( $post->post_content );

    $result = $this->advanced_client->getNoteStore()->createNote( $note );
    if ( $result ) {
        $this->update_note_from_evernote( $result, $post );
    }

}

/**
* This is called when a note is updated from evernote.
* It will update the post with the new data.
* It is triggered by both directions of the sync:
* - When a note is updated in evernote, it will be updated in WordPress
* - When a note is updated in WordPress, it will be updated in evernote and then the return will be passed here.
*
* @param \EDAM\Types\Note $note
* @param \WP_Post $post
* @param bool $sync_resources - If true, it will also upload note resources. We want this in most cases, EXCEPT when we are sending the data from WordPress and know the response will not have new resources for us.
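* @return int The ID of the updated post.
*/
public function update_note_from_evernote( \EDAM\Types\Note $note, \WP_Post $post, $sync_resources = false ): int {
    // Unhook first: updating the post fires save_post_* again, which would push the note straight back to Evernote.
    remove_action( 'save_post_' . $this->notes_module->id, array( $this, 'sync_note_to_evernote' ), 10 );
    // Update the post title, content and meta from $note here (omitted in this simplified version).
    add_action( 'save_post_' . $this->notes_module->id, array( $this, 'sync_note_to_evernote' ), 10, 3 );
    return $post->ID;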
}
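For the bodyHash idea from the list above, one option is to remember a hash of the ENML you last exchanged, for example in post meta, and compare it on the next sync. Evernote notes expose a contentHash field, documented as an MD5 checksum of the note content and delivered as raw bytes. This hypothetical sketch keeps hashes as hex strings, which are safe to store in post meta, and converts the binary digest with bin2hex(). The function names are illustrative, and the ENML strings are shortened (real notes include an XML declaration and DOCTYPE).

```php
<?php
// Hypothetical drift check; function names are illustrative.

// Hex MD5 of the ENML we last pushed or received (what you might keep in post meta).
function enml_hash_hex( string $enml ): string {
    return md5( $enml );
}

// Convert a raw 16-byte MD5 digest (like Evernote's contentHash) to hex.
function remote_hash_hex( string $binary_content_hash ): string {
    return bin2hex( $binary_content_hash );
}

// True when either side changed since the last exchange.
function is_out_of_sync( string $stored_hex, string $local_enml, string $remote_binary_hash ): bool {
    return enml_hash_hex( $local_enml ) !== $stored_hex
        || remote_hash_hex( $remote_binary_hash ) !== $stored_hex;
}

$enml   = '<en-note>Hello</en-note>';
$stored = enml_hash_hex( $enml );
var_dump( is_out_of_sync( $stored, $enml, md5( $enml, true ) ) ); // bool(false)
```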

Gutenberg blocks as sync items

Evernote’s core primitive is a note. But for the Readwise sync, I was torn between two primitives:

  • Individual highlight
  • A book/article/podcast

I decided to store each book/article as a post (of the notes post type), but to keep each highlight as its own block of the Readwise block type. Each block is tracked individually.

Each instance of the block is appended in PHP – I published how to do it in a separate tiny tutorial.

You can see how the block is implemented in this pull request:

https://github.com/artpi/PersonalOS/pull/1
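For the highlight-as-block approach, the key property is that each highlight stays identifiable inside the post content, so a re-sync only appends what is new. This is a hypothetical plain-PHP sketch: the block name pos/readwise, the readwise_id attribute and the inner markup are illustrative, not the plugin’s actual block. Inside WordPress, parse_blocks() and serialize_block() are the safer tools for reading and writing block markup, since they handle attribute escaping for you.

```php
<?php
// Hypothetical block name; not necessarily what the plugin registers.
const HIGHLIGHT_BLOCK = 'pos/readwise';

// Serialise one highlight as a block comment pair carrying its Readwise ID.
function highlight_block_markup( int $id, string $text ): string {
    return sprintf(
        "<!-- wp:%s %s -->\n<p>%s</p>\n<!-- /wp:%s -->",
        HIGHLIGHT_BLOCK,
        json_encode( array( 'readwise_id' => $id ) ),
        htmlspecialchars( $text, ENT_QUOTES ),
        HIGHLIGHT_BLOCK
    );
}

// Collect the IDs of highlight blocks already present in the post content.
function existing_highlight_ids( string $content ): array {
    $pattern = '/<!-- wp:' . preg_quote( HIGHLIGHT_BLOCK, '/' ) . ' \{"readwise_id":(\d+)\} -->/';
    preg_match_all( $pattern, $content, $matches );
    return array_map( 'intval', $matches[1] );
}

// Append only highlights whose ID is not in the post yet.
function append_new_highlights( string $content, array $highlights ): string {
    $known = existing_highlight_ids( $content );
    foreach ( $highlights as $highlight ) {
        if ( in_array( $highlight['id'], $known, true ) ) {
            continue; // Already tracked as its own block.
        }
        $block    = highlight_block_markup( $highlight['id'], $highlight['text'] );
        $content .= ( '' === $content ? '' : "\n\n" ) . $block;
        $known[]  = $highlight['id'];
    }
    return $content;
}

$post_content = append_new_highlights( '', array( array( 'id' => 1, 'text' => 'First' ) ) );
$post_content = append_new_highlights(
    $post_content,
    array( array( 'id' => 1, 'text' => 'First' ), array( 'id' => 2, 'text' => 'Second' ) )
);
echo $post_content, "\n"; // Two blocks: highlight 1 is not duplicated.
```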

Subscribe to my newsletter to get updates on my Personal OS plugin and my other musings.

The post Building sync engines in WordPress appeared first on Artur Piszek.
