URI, URL, URN?? Identifying resources on the web

endorama

Edoardo Tenani

Posted on July 21, 2022


Cover image by Shannon Potter on Unsplash

If you do web development you will, at some point, encounter three particular terms: URI, URL and URN (the last one is less familiar, but you may have encountered ARNs in AWS).
You may also have seen URI and URL used interchangeably, but they are not the same thing, even if they serve very similar purposes: identifying things, and locating things on the Internet.

Let's break down what those acronyms mean:

  • URI stands for Uniform Resource Identifier
  • URL stands for Uniform Resource Locator
  • URN stands for Uniform Resource Name

URLs and URNs are specific classifications of URIs.

URIs can look very different from each other (examples from rfc3986#section-1.1.2):

ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2

Follow me in a deep dive into URIs, URLs and URNs and some good old RFC digging. Put on your safety 🥽, grab your ⛏ and let's go!!

What's a URI

A Uniform Resource Identifier is a generic way to uniquely identify any resource.

The complete definition is in RFC 3986, where you can hunt for all the details.

It takes the form of a string with this syntax:

URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
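RFC 3986 (Appendix B) also gives a regular expression that breaks any URI reference into these pieces. Here is a minimal Python sketch wrapping it; `split_uri` is a hypothetical helper name, and the regex only splits the string, it does not validate it.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every group is optional,
# so it matches any string: it splits, it does not validate.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    m = URI_RE.match(uri)  # never None, see comment above
    return {
        "scheme": m.group(2),
        "authority": m.group(4),  # None when there is no "//"
        "path": m.group(5),       # may be the empty string
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("https://www.example.com/search?q=urn#results"))
# {'scheme': 'https', 'authority': 'www.example.com', 'path': '/search', 'query': 'q=urn', 'fragment': 'results'}
print(split_uri("mailto:John.Doe@example.com"))
```

Note how `mailto:John.Doe@example.com` has no authority at all: everything after the scheme ends up in the path.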

There are 5 components:

  • scheme (required), arbitrary, but there are popular ones like mailto, https, ftp or arn
  • authority (optional), for user information and top level namespace (usually a domain or IP address) with the syntax
    authority = [ userinfo "@" ] host [ ":" port ]

    • userinfo (optional), either a username or a username and password concatenated by : (username:password)
    • host (required), can be a domain or an IPv4/IPv6 address
    • port (optional)
  • path (required but can be empty - you know, parsers 🤷), a hierarchical structure separated by /

  • query (optional), it starts with ? and can contain ? and / (like path)

  • fragment (optional), it starts with # and runs until the end of the URI

This is quite convoluted and the RFC is incredibly detailed. The Wikipedia page for URI helps!
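To see the components (including host and port from the authority) on some of the RFC examples above, Python's standard `urllib.parse.urlsplit` is handy. This is a sketch using Python's parser, which follows the generic syntax closely but is not a full RFC 3986 validator.

```python
from urllib.parse import urlsplit

# Examples taken from RFC 3986, section 1.1.2
for uri in (
    "ldap://[2001:db8::7]/c=GB?objectClass?one",  # IPv6 host, "?" inside the query
    "mailto:John.Doe@example.com",                # no authority, path only
    "telnet://192.0.2.16:80/",                    # IPv4 host with a port
):
    parts = urlsplit(uri)
    # hostname strips the IPv6 brackets; port is an int or None
    print(parts.scheme, parts.hostname, parts.port, parts.path, parts.query, sep=" | ")
```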

It's important to understand the URI as it's the foundation on which URL and URN are based.

What's a URL

The Uniform Resource Locator is a string representation for a resource available via the Internet.

It has its own RFC, RFC 1738, where we find all the familiar names and strings we see as Web developers.

It defines some specific schemes we all know and love:

   ftp                     File Transfer protocol
   http                    Hypertext Transfer Protocol
   gopher                  The Gopher protocol
   mailto                  Electronic mail address
   news                    USENET news
   nntp                    USENET news using NNTP access
   telnet                  Reference to interactive sessions
   wais                    Wide Area Information Servers
   file                    Host-specific file names
   prospero                Prospero Directory Service

(wait, what is wais??? I think I'm too young for that!)

It also defines the common "Internet" scheme syntax for all URL schemes that involve the direct use of an IP-based protocol:

//<user>:<password>@<host>:<port>/<url-path>
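Each field of that common syntax maps to an attribute of Python's `urlsplit` result. A short sketch with a made-up FTP URL (host, credentials and path are hypothetical); note that RFC 3986 later deprecated putting a password in the userinfo part.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL following //<user>:<password>@<host>:<port>/<url-path>
url = "ftp://anonymous:guest@ftp.example.com:2121/pub/readme.txt"
parts = urlsplit(url)

print(parts.username)  # anonymous
print(parts.password)  # guest
print(parts.hostname)  # ftp.example.com
print(parts.port)      # 2121 (parsed as an int)
print(parts.path)      # /pub/readme.txt

# The five split fields can be joined back into the original string
assert urlunsplit(parts) == url
```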

I'm young enough that I basically only used ftp, http and mailto! (And, well... watching Star Wars over telnet 😆) Did you use any of the others? Let me know in the comments, I want to read your story!!

What's a URN

Quite simply, it's a URI with the urn scheme. URNs are location-independent and persistent identifiers.

This means that, within a given namespace, a URN is assigned to one resource only and keeps identifying it forever: it is never reused for something else, even after the resource itself no longer exists.

URNs are defined by RFC 8141.

These two properties, location independence and persistence, make them useful for some very interesting use cases.

Their syntax definition (rfc8141#section-2) is quite a bit more complex; here is a simplified version:

URN = "urn" ":" NID ":" NSS [ "?+" r-component ] [ "?=" q-component ] [ "#" f-component ]

It reads much like a URI with multiple components:

  • urn is the scheme
  • NID (required), the namespace identifier
  • NSS (required), the namespace specific string
  • r-component (optional), parameters to pass to URN resolution services; note that its use is discouraged: "Thus, r-components SHOULD NOT be used for URNs before their semantics have been standardized."
  • q-component (optional), query parameters for the named resource or the service supplying the named resource
  • f-component (optional), a fragment representing the location or region for the named resource, ignored during URN equivalence operations.
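The grammar above can be approximated with a regular expression. Below is a minimal Python sketch, assuming a simplified grammar (it checks the NID shape but not every character allowed in the NSS). The `urn_equivalent` helper follows my reading of the RFC 8141 equivalence rules: the "urn" prefix and the NID are compared case-insensitively, hex digits in %-escapes are case-normalized, the NSS is otherwise compared exactly, and r-, q- and f-components are ignored. `parse_urn` and `urn_equivalent` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Simplified RFC 8141 grammar: NID is 2-32 letters/digits/hyphens,
# not starting or ending with a hyphen; NSS is taken loosely.
URN_RE = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9])"
    r":(?P<nss>[^?#]+)"
    r"(?:\?\+(?P<r>[^#]*?))?"  # r-component, up to a "?=" or "#"
    r"(?:\?=(?P<q>[^#]*))?"    # q-component, up to a "#"
    r"(?:#(?P<f>.*))?$",       # f-component
    re.IGNORECASE,             # makes the "urn" prefix case-insensitive
)


class Urn(NamedTuple):
    nid: str
    nss: str
    r: Optional[str]
    q: Optional[str]
    f: Optional[str]


def parse_urn(s: str) -> Optional[Urn]:
    m = URN_RE.match(s)
    return Urn(*m.group("nid", "nss", "r", "q", "f")) if m else None


def _normalize(u: Urn) -> tuple:
    # Lowercase the NID, uppercase hex digits in %-escapes; r/q/f are dropped
    nss = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), u.nss)
    return (u.nid.lower(), nss)


def urn_equivalent(a: str, b: str) -> bool:
    ua, ub = parse_urn(a), parse_urn(b)
    if ua is None or ub is None:
        return False
    return _normalize(ua) == _normalize(ub)


print(parse_urn("urn:example:a123,z456?+abc?=xyz#789"))
print(urn_equivalent("urn:example:a123,z456", "URN:EXAMPLE:a123,z456#789"))  # True
print(urn_equivalent("urn:example:a123,z456", "urn:example:A123,z456"))      # False
```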

Note that a public registry of URN namespaces exists, maintained by IANA.

Does this mean you need to register a namespace before using it? No if you plan to use it internally, yes if you want it to be internet-global (like xmpp or uuid).
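The registered uuid namespace is easy to try out: Python's standard `uuid` module exposes the URN form of a UUID through the `.urn` attribute, and the `UUID` constructor accepts that form back. A small sketch (the fixed UUID is the example value from RFC 4122):

```python
import uuid

# Example UUID from RFC 4122
u = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(u.urn)  # urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6

# The constructor also accepts the "urn:uuid:" form, so it round-trips
assert uuid.UUID(u.urn) == u

# A fresh random identifier in the uuid URN namespace
print(uuid.uuid4().urn)
```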

So cool, but where are they used?

AWS

If you have experience with Amazon Web Services, you will have encountered ARNs: Amazon Resource Names. By their definition:

Amazon Resource Names (ARNs) uniquely identify AWS resources. We require an ARN when you need to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.

Sound familiar? The format is very URN-like too (there are different formats, check the docs!):

arn:partition:service:region:account-id:resource-type/resource-id

From the look of it, it is not an RFC-compliant URN (to begin with, the scheme is arn rather than urn), but it's extremely similar.
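Because the fields are colon-separated, a general-format ARN can be split much like a URN. A minimal Python sketch based only on the format shown above; `parse_arn` and `Arn` are hypothetical helpers, not part of any AWS SDK, and the account ID and resource names are made up. The split stops after five colons because the resource part may itself contain colons.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str      # empty for global services such as IAM
    account_id: str
    resource: str    # e.g. "user/johndoe" or "function:my-function"


def parse_arn(arn: str) -> Arn:
    # maxsplit=5: anything after the fifth colon belongs to the resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


print(parse_arn("arn:aws:iam::123456789012:user/johndoe"))
print(parse_arn("arn:aws:lambda:us-east-1:123456789012:function:my-function").resource)
```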

GCP

Google Cloud Platform relies on URIs to identify resources on the platform.

Resource names (https://cloud.google.com/apis/design/resource_names) are scheme-less URIs similar to:

logging.googleapis.com/projects/myproject123/locations/global/buckets/my-bucket

logging.googleapis.com is the authority and the path identifies the resource. Because the path is hierarchical, it can represent the GCP resource structure (project -> collection -> resource).
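The hierarchy in the example path alternates collection IDs and resource IDs, so it can be walked in pairs. A Python sketch, assuming (for simplicity) that no resource ID contains a "/"; `parse_resource_name` is a hypothetical helper, not a Google client library function.

```python
def parse_resource_name(name: str) -> tuple[str, list[tuple[str, str]]]:
    # Drop any leading "/" (full resource names are often written as "//service/...")
    service, _, path = name.lstrip("/").partition("/")
    segments = path.split("/")
    if len(segments) % 2 != 0:
        raise ValueError("expected alternating collection/resource ID segments")
    # Pair each collection ID with the resource ID that follows it
    return service, list(zip(segments[0::2], segments[1::2]))


service, pairs = parse_resource_name(
    "logging.googleapis.com/projects/myproject123/locations/global/buckets/my-bucket"
)
print(service)  # logging.googleapis.com
for collection, resource_id in pairs:
    print(f"{collection} -> {resource_id}")
```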

LinkedIn

Another at-scale example is LinkedIn:

URNs are used to represent foreign associations to an entity (persons, organizations, and so on) in an API. A URN is a string-based identifier with the format:
urn:{namespace}:{entityType}:{id}
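That format is simple enough to build and parse with plain string operations. A Python sketch; `EntityUrn`, `parse_entity_urn` and `format_entity_urn` are hypothetical helpers (not LinkedIn's API), and the ID below is made up. Splitting at most three times keeps any colons inside the ID intact.

```python
from typing import NamedTuple


class EntityUrn(NamedTuple):
    namespace: str
    entity_type: str
    id: str


def parse_entity_urn(urn: str) -> EntityUrn:
    # maxsplit=3: everything after the third colon is treated as the id
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"expected urn:{{namespace}}:{{entityType}}:{{id}}, got {urn!r}")
    return EntityUrn(*parts[1:])


def format_entity_urn(namespace: str, entity_type: str, entity_id: str) -> str:
    return f"urn:{namespace}:{entity_type}:{entity_id}"


ref = parse_entity_urn("urn:li:organization:1234")  # made-up id
print(ref.entity_type)  # organization
print(format_entity_urn(*ref))  # urn:li:organization:1234
```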

Express foreign keys

Simple relational database designs generally rely on (auto-incrementing) integers as row IDs. This system is effective and works well in a single-database scenario.

When scaling to multiple databases or distributed applications (e.g. microservices), integers are not enough anymore. Some common problems are:

  • conflicting auto-incrementing numbers: each database increments its own counter, so records created in different places can end up with the same ID, and coordinating a shared counter exposes you to race conditions when creating records
  • too generic: the system (or its operators) cannot tell, just by looking at an ID, what kind of resource that ID refers to. If you think this is not that important, Atlassian recently deleted 883 customer sites because of a similar confusion: a script was given the IDs of entire customer sites instead of the IDs of the apps to remove in the Atlassian backend ecosystem. Those IDs were then used for deletion, so what got deleted was not, as expected, the customer's app instance, but their entire site.
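Both problems can be addressed by typed, URN-style IDs. A minimal Python sketch, assuming a hypothetical `urn:example:{entityType}:{uuid}` format (the `example` namespace is reserved for documentation by RFC 6963); `new_id`, `require_type` and `delete_app` are illustrative names. Random UUIDs can be generated independently by every service without a shared counter, and the embedded type lets a destructive operation refuse the wrong kind of ID.

```python
import uuid

NAMESPACE = "example"  # hypothetical namespace, reserved for examples


def new_id(entity_type: str) -> str:
    # No shared counter: every service can mint IDs on its own,
    # and the entity type travels with the ID
    return f"urn:{NAMESPACE}:{entity_type}:{uuid.uuid4()}"


def require_type(urn: str, expected: str) -> str:
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != NAMESPACE:
        raise ValueError(f"malformed id: {urn!r}")
    if parts[2] != expected:
        raise TypeError(f"expected a {expected!r} id, got a {parts[2]!r} id: {urn!r}")
    return parts[3]  # the bare UUID string


def delete_app(app_urn: str) -> None:
    app_id = require_type(app_urn, "app")  # a site ID is rejected before anything happens
    print(f"deleting app {app_id}")  # placeholder for the real deletion


delete_app(new_id("app"))
try:
    delete_app(new_id("site"))
except TypeError as err:
    print("refused:", err)
```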

Do you have any other examples of URNs being used in systems? I'm curious to know about them so please let me know in the comments!
