IPFS CIDs vs. CWIDs

rleddy

Richard Leddy

Posted on August 3, 2021


So, I just wrote a program, call it more of a prototype, for something like email, but using IPFS. I call it interplanetary-contact for the moment. You can find a copy of it at copious-world. I was hoping for email without a server, but as it stands you still have to run the node.js server included there. The interface is a Svelte application.

I got it up and working, but I still have much more to do with it. One thing I was doing was using the Mutable File System (MFS), which is part of IPFS. I was also letting the server act as an agent for IPFS. Not the worst thing you can do, but then the server is not a true player in P2P. Also, the server is relied upon to make CIDs or fetch them from IPFS.

So, I need to make CIDs in a browser. This led me to the anatomy of a CID. As I dug deeper, I wanted more from a CID, and less code in the browser, too. So, I came up with a CWID. It's really a minor difference, and I will explain it. But I will also fill you in on CIDs, just in case you are not a CID aficionado.

CIDs

A CID is a content identifier. It refers to a unique set of bytes: one byte array per CID. Preceding the actual identifier is a prefix that tells programs (and programmers) the format in which the identifier is presented. So, if the identifier contains a SHA256 hash of the content, the hash may be presented in a base such as hex (base16), base64url, base58btc, or binary, among others. Another hashing algorithm may be used, and the length of the hash is also part of the encoded data. At the very front of the string is a single character indicating the base (the multibase prefix). All the rest of the characters are in that base.
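To make that single leading character concrete, here is a small lookup in Python. It is a sketch, not IPFS code: the table holds only a handful of entries from the multibase prefix table, and the names `MULTIBASE_PREFIXES` and `multibase_name` are made up for illustration.

```python
# Hypothetical lookup: a few entries from the multibase prefix table (not exhaustive).
MULTIBASE_PREFIXES = {
    "f": "base16",        # lowercase hex
    "F": "base16upper",   # uppercase hex
    "b": "base32",        # lowercase, unpadded
    "z": "base58btc",
    "m": "base64",        # unpadded
    "u": "base64url",     # unpadded
    "U": "base64urlpad",  # padded
}


def multibase_name(encoded: str) -> str:
    """Return the base named by the first character of a multibase string."""
    if not encoded or encoded[0] not in MULTIBASE_PREFIXES:
        raise ValueError(f"unknown or missing multibase prefix in {encoded!r}")
    return MULTIBASE_PREFIXES[encoded[0]]


print(multibase_name("uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm"))  # base64url
print(multibase_name("f01551220"))  # base16
```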

So, for a CID, not all formats can be built by simply concatenating the parts of the CID as strings. Hex can be done that way, because each byte becomes exactly two hex characters. But base64 requires a little more finesse, because it encodes every 3 bytes as 4 characters. Before putting the base flag at the front of the string, all the descriptors must be placed into a binary buffer, and the whole buffer gets presented in the base.

So, for a hex string using SHA256, before putting an 'f' in front of the lowercase hex string, the buffer is arranged as follows:

  • version number: 0x01, meaning this is a CIDv1
  • the content codec: 'raw' = 0x55 (the content is plain bytes)
  • the hash algorithm: 'sha2-256' = 0x12
  • length of the hash: 0x20 = 256 bits / 8 = 32 bytes
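The layout above can be sketched in a few lines of Python using only the standard library. This is an illustration, not the IPFS implementation: `hex_cid` and `CIDV1_RAW_SHA256_HEADER` are hypothetical names, and the code assumes each header field fits in one byte. (The fields are unsigned varints, and all four values here are below 0x80, so that assumption holds for this particular kind of CID.)

```python
import hashlib

# Hypothetical constant: CIDv1, raw codec, sha2-256, 32-byte digest.
# Each value is below 0x80, so each fits in a single unsigned-varint byte.
CIDV1_RAW_SHA256_HEADER = bytes([0x01, 0x55, 0x12, 0x20])


def hex_cid(content: bytes) -> str:
    """Build a base16 (multibase 'f') CIDv1 for raw bytes hashed with SHA-256."""
    digest = hashlib.sha256(content).digest()
    buffer = CIDV1_RAW_SHA256_HEADER + digest  # 4 header bytes + 32 hash bytes
    return "f" + buffer.hex()                  # bytes.hex() is lowercase


cid = hex_cid(b"MUCH TO DO ABOUT IDENTITY")
print(cid[:9])   # f01551220
print(len(cid))  # 73, i.e. 1 + 2 * 36
```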

The final string will look something like the following:

f01551220292ac9f017d8049151687b056c294abe...

So, if your system sticks to one kind of CID, you can store f01551220 as a constant and append the hex hash to it. But that only works if you are using hex, or another base in which every byte maps to whole characters.

Once you have some CIDs, you can check them with the IPFS CID Inspector.
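A quick check of why the constant works for hex but not for base64, as a Python sketch with hypothetical names (`HEADER`, `HEX_PREFIX`, `b64url`). Hex maps every byte to exactly two characters, so the parts can be encoded separately and joined. Base64 maps 3 bytes to 4 characters, and the 4-byte header is not a multiple of 3, so the separately encoded pieces never match the encoding of the whole buffer.

```python
import base64
import hashlib

HEADER = bytes([0x01, 0x55, 0x12, 0x20])  # CIDv1, raw, sha2-256, 32 bytes
HEX_PREFIX = "f01551220"                  # the hex constant from the text


def b64url(data: bytes) -> str:
    # Hypothetical helper: unpadded base64url, as multibase 'u' uses.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


digest = hashlib.sha256(b"MUCH TO DO ABOUT IDENTITY").digest()

# Hex: every byte becomes exactly two characters, so the parts concatenate cleanly.
print(HEX_PREFIX + digest.hex() == "f" + (HEADER + digest).hex())  # True

# Base64: encoding the header and the hash separately does not match
# encoding the whole buffer (even the lengths differ: 50 vs. 49 characters).
print("u" + b64url(HEADER) + b64url(digest) == "u" + b64url(HEADER + digest))  # False
```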

My CID problem

So, I was thinking: why use all that IPFS code to make a CID, and what if I want to use base64? I started toying around with CIDs and came to the point where the idea of concatenating strings, with the SHA256 hash clearly exposed, seemed nice. But CIDs are not so amazingly clear with base64.

So, here is a SHA256 hash of "MUCH TO DO ABOUT IDENTITY" in base64url:

KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY=

Here is the IPFS CID for just this text:

uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm

So, where did the SHA hash go?

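Well, it is in there. But the prefix is 4 bytes long, and it ends with the hash length byte (0x20, for 32 bytes) that you might have seen before in the hex prefix. Base64 encodes 3 bytes at a time, so that last prefix byte shares a group with the first bytes of the hash. That shifts how every following character lines up, changing the rest of the string.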
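The hash and the CID above are consistent with each other. As a sketch (Python, with a hypothetical function name), decoding the posted base64url hash, prepending the same four header bytes used in the hex example, and encoding the whole 36-byte buffer as unpadded base64url reproduces the CID exactly:

```python
import base64

HEADER = bytes([0x01, 0x55, 0x12, 0x20])  # CIDv1, raw, sha2-256, 32-byte length


def cid_from_digest_b64(digest_b64: str) -> str:
    """Hypothetical helper: build a multibase 'u' CIDv1 from a base64url SHA-256 digest."""
    digest = base64.urlsafe_b64decode(digest_b64)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    # Encode header and hash as ONE buffer; multibase base64url is unpadded.
    encoded = base64.urlsafe_b64encode(HEADER + digest).decode("ascii")
    return "u" + encoded.rstrip("=")


print(cid_from_digest_b64("KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY="))
# uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm
```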

In fact, the base64 encoding of the prefix alone, which might never change, is uAVUSIA==. You can see the uAVUSI part in the CID. And the size of the hash is not going to change either.
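The point where the two encodings part ways can be counted directly. This Python sketch compares the prefix-only encoding against the CID (minus its leading 'u'). They share five characters, because the 4 header bytes are 32 bits and base64 uses 6 bits per character, leaving 2 bits of the length byte to be mixed with the start of the hash.

```python
import base64

HEADER = bytes([0x01, 0x55, 0x12, 0x20])
CID = "uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm"

prefix_alone = base64.urlsafe_b64encode(HEADER).decode("ascii")
print(prefix_alone)  # AVUSIA==

# Count the leading characters the prefix-only encoding shares with the CID
# (after dropping the multibase 'u').
shared = 0
for a, b in zip(prefix_alone, CID[1:]):
    if a != b:
        break
    shared += 1
print(shared, CID[1:1 + shared])  # 5 AVUSI

# 32 header bits = 5 full 6-bit characters plus 2 leftover bits,
# and those 2 bits share the 6th character with the hash.
print(divmod(len(HEADER) * 8, 6))  # (5, 2)
```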

So, what if I want to get the hash from the CID? There are times one wants to do that. For instance, you might be verifying a hash in a Merkle tree. This means that for every hash, you have to run the CID parser and pull out the bytes.
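Here is roughly what that parsing step looks like for this one kind of CID, as a Python sketch with a hypothetical `digest_from_cid` function. It assumes the fixed 4-byte header from the list earlier in the post; a general CID parser would read each field as an unsigned varint and handle every multibase prefix.

```python
import base64


def digest_from_cid(cid: str) -> bytes:
    """Hypothetical helper: pull the SHA-256 digest out of a base64url CIDv1 (raw codec).

    Assumes one byte per header field; a real parser would decode varints.
    """
    if not cid.startswith("u"):
        raise ValueError("this sketch only handles multibase 'u' (base64url)")
    body = cid[1:]
    raw = base64.urlsafe_b64decode(body + "=" * (-len(body) % 4))  # restore padding
    version, codec, hash_fn, length = raw[:4]
    if (version, codec, hash_fn) != (0x01, 0x55, 0x12):
        raise ValueError("not a CIDv1 raw/sha2-256 identifier")
    digest = raw[4:]
    if len(digest) != length:
        raise ValueError("digest length does not match the header")
    return digest


d = digest_from_cid("uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm")
print(base64.urlsafe_b64encode(d).decode("ascii"))
# KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY=
```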

But you could just make a string that keeps the hash exposed. You want the hash, and how you pack the string changes neither what the hash does as an identifier nor what it contains. Packing it differently only changes how hard the computer has to work to pull the hash out. That's all that will be happening. So, with millions and millions of hashes, the energy bill goes up.

CWID

Does CWID rhyme with quid?

Well, I put a program in this repository: CWID

A CWID is not much different from a CID. I just made it so that the prefix is separate from the hash: each is encoded on its own, and I put in a Sheffer stroke '|' (a vertical bar) to mark the separation.

So, here is my CWID for "MUCH TO DO ABOUT IDENTITY":

uAVUSIA==|KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY=
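As a sketch of the format exactly as it appears in the example above (Python, not the code from the CWID repository; `cwid_from_digest` and `cwid_to_cid` are hypothetical names): the prefix and the hash are each base64url-encoded on their own and joined with '|'. Because the two halves are just the two pieces of the CID's buffer, a CWID can be turned back into a standard CID by decoding both halves and re-encoding them together.

```python
import base64

HEADER = bytes([0x01, 0x55, 0x12, 0x20])  # CIDv1, raw, sha2-256, 32-byte length


def b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def cwid_from_digest(digest: bytes) -> str:
    # Hypothetical helper: prefix and hash encoded separately, so the hash stays readable.
    return "u" + b64(HEADER) + "|" + b64(digest)


def cwid_to_cid(cwid: str) -> str:
    # Hypothetical helper: decode both halves, re-encode as one unpadded 'u' buffer.
    prefix, digest_b64 = cwid[1:].split("|")
    raw = base64.urlsafe_b64decode(prefix) + base64.urlsafe_b64decode(digest_b64)
    return "u" + b64(raw).rstrip("=")


digest = base64.urlsafe_b64decode("KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY=")
cwid = cwid_from_digest(digest)
print(cwid)               # uAVUSIA==|KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY=
print(cwid_to_cid(cwid))  # uAVUSICkqyfAX2ASRUWh7BWwpSr4cFRXmqYHHDdkAsUoXeeBm
```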

Maybe the padding could be taken out, and put back in when checking the CWID. Maybe something other than a Sheffer stroke should be used (some browsers URI-encode it). Or maybe that is not too big an issue.
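The padding idea is easy to try, sketched here in Python with hypothetical helpers `strip_padding` and `restore_padding`. Base64 text is always a multiple of 4 characters long, so stripped '=' signs can be restored from the length alone. The last line uses `urllib.parse.quote` to show the URI-encoding issue: '|' is not one of the characters a URI may carry unencoded, so it comes out as %7C.

```python
import base64
from urllib.parse import quote

CWID = "uAVUSIA==|KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY="


def strip_padding(cwid: str) -> str:
    # Hypothetical helper: drop '=' from both halves.
    return "|".join(part.rstrip("=") for part in cwid.split("|"))


def restore_padding(part: str) -> str:
    # Hypothetical helper: pad base64 text back to a multiple of 4 characters.
    return part + "=" * (-len(part) % 4)


compact = strip_padding(CWID)
print(compact)  # uAVUSIA|KSrJ8BfYBJFRaHsFbClKvhwVFeapgccN2QCxShd54GY
prefix, digest_b64 = compact[1:].split("|")
print(base64.urlsafe_b64decode(restore_padding(prefix)).hex())  # 01551220
print(quote(compact))  # the '|' becomes %7C
```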

So, if you are making lots of CWIDs, just make a constant out of the prefix and keep it waiting in a buffer, ready for the hash to be written after it. That works well in C++, though JavaScript might not gain much from it.

I was wondering if the prefix really has to be in the same base as the hash. I guess that on a really small machine, you might only have room for one encoding function. Would that just be a special prefix?
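In Python the saving is modest, but the idea looks like this: encode the fixed header (plus the separator) once, and then each new CWID only needs one hash and one base64url encoding. This is a sketch assuming the raw/sha2-256 prefix from the example; `CWID_PREFIX` and `cwid` are hypothetical names.

```python
import base64
import hashlib

# Hypothetical constant, computed once: encoded header plus separator.
# It never changes for raw/sha2-256 CWIDs.
CWID_PREFIX = "u" + base64.urlsafe_b64encode(bytes([0x01, 0x55, 0x12, 0x20])).decode("ascii") + "|"


def cwid(content: bytes) -> str:
    """Hypothetical helper: hash the content and append its base64url digest to the prefix."""
    digest = hashlib.sha256(content).digest()
    return CWID_PREFIX + base64.urlsafe_b64encode(digest).decode("ascii")


print(CWID_PREFIX)  # uAVUSIA==|
print(cwid(b"hello"))
```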

Well, now that I have this trick, I can revisit Interplanetary-contact and switch to making CWIDs in the browser. Maybe I will have to rename it to Intergalactic-contact.

You can play with the code.

npm test

This should launch your default browser and start up http-server to serve the page. You may have to install http-server.

npm i http-server@0.8.5

The CWID repository is not an npm package, but you can clone it from GitHub. Please feel free to raise issues.

Also, note that this is very new (I did it today), and there is very little support for the different formats: just base64url and hex.

Following up

Hope that is entertaining for now. For later, I am planning to discuss identity and how to have email without having email. With the contact system, no email addresses are used. Rather, your description of yourself is turned into a content identifier. But, that's all for now...
