Linked Open Code

I've been working off and on over the last six months on a programming language that sits on top of linked open data. Think of it as linked open code.

von Neumann Architecture

Before von Neumann made his observations about code and data, computers typically had some memory dedicated to code, and other memory dedicated to data. The processing unit might have a bus for each, so code and data didn't have to compete for processor attention.

This was great if you were able to dedicate your machine to particular types of problems and knew how much data or code you would typically need.

Von Neumann questioned this assumption. Why should memory treat code and data as different things when they're all just sets of bits?

Today, our machines are built based on the "von Neumann architecture." Code and data share memory and the pipeline to the processor. If we have a program that needs more data, we just need to squeeze down the code. If we need more memory for code, we stream or page the data. We don't need to know when we buy a machine how much code or data we'll be using. We can partition the memory of our machines based on the needs of the program at hand.
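To make the shared-memory idea concrete, here is a minimal sketch of a toy stored-program machine in Python. The opcodes (HALT, LOAD, ADD, STORE) and the memory layout are invented for illustration and don't correspond to any real instruction set. The point is only that instructions and data sit in the same list of numbers.

```python
# Toy stored-program machine. The opcodes and memory layout are invented
# for this sketch; they do not correspond to any real instruction set.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3

def run(memory):
    """Execute instructions stored in the same list that holds the data."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator
    while True:
        op, arg = memory[pc], memory[pc + 1]
        pc += 2
        if op == HALT:
            return memory
        elif op == LOAD:
            acc = memory[arg]
        elif op == ADD:
            acc += memory[arg]
        elif op == STORE:
            memory[arg] = acc
        else:
            raise ValueError(f"cell {pc - 2} holds {op}, which is not an instruction")

memory = [
    LOAD, 8,    # addresses 0-1: acc = memory[8]
    ADD, 9,     # addresses 2-3: acc += memory[9]
    STORE, 10,  # addresses 4-5: memory[10] = acc
    HALT, 0,    # addresses 6-7: stop
    40, 2, 0,   # addresses 8-10: data
]
run(memory)
print(memory[10])  # 42
```

Nothing in the list marks where the program ends and the data begins. A longer program simply leaves fewer cells for data, and the reverse.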

The result is a slightly slower machine with the single data bus acting as a bottleneck between memory and the processor (the "von Neumann bottleneck"), but the tradeoff is worth it because we have more generally useful computers.

Von Neumann allowed us to mass produce computers.

Along the way, we've encountered some problems. If programs aren't careful, data can be interpreted as code, resulting in buffer overflow attacks, viruses, and other problems that wouldn't come up if code weren't in the same memory as data.

Modern CPUs mitigate this by letting blocks of memory be marked as holding only code or only data, though not all operating systems make use of this. This feels like a step back to pre-von Neumann, but remember that the marking can change from moment to moment. We can still move the line between the two, but with some additional security.

Linked von Neumann

What's the state of things on the web or in the world of linked data?

Groups like the Corporation for National Research Initiatives are trying to make distributed programs, but are making the same wrong assumptions that people made before von Neumann: that code and data are fundamentally different.

They aren't. Or at least, they don't have to be.

Let's consider what a computer deals with at a fundamental level:

  • Fetching data from memory (dereferencing a memory address)
  • Sending data to memory
  • Deleting data by overwriting memory
  • Executing instructions previously fetched from memory

Computers don't tend to create or destroy memory.

Let's compare the above list to REST operations:

  • GET resource
  • PUT resource
  • DELETE resource
  • ???

There's no REST equivalent of executing instructions, in part because execution is an interpretive behavior on the part of the processor.

There's no processor equivalent of POSTing (creating) a resource because processors don't create new memory.

So if we add an ability to execute a resource, we have a superset of what a hardware processor can do.

Just as with a processor, the resource to be executed has to make sense as a set of instructions. Executing code like this also opens up new attack vectors.
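As a rough sketch of what "GET, PUT, DELETE, plus execute" might look like, here is an in-memory stand-in for a set of web resources in Python. Everything here is an assumption made for illustration: the ResourceStore class, the example.org URLs, and the tiny "Fold" instruction format are hypothetical and are not the representation used by the language described in this post.

```python
import operator
from functools import reduce

# Hypothetical instruction vocabulary: the only thing this sketch can execute.
OPERATORS = {"add": operator.add, "multiply": operator.mul}

class ResourceStore:
    """In-memory stand-in for resources on the web, keyed by URL."""

    def __init__(self):
        self._resources = {}

    def get(self, url):
        return self._resources[url]

    def put(self, url, representation):
        self._resources[url] = representation

    def delete(self, url):
        del self._resources[url]

    def execute(self, url, argument):
        """The proposed fourth operation: interpret a resource as instructions."""
        resource = self.get(url)
        if not isinstance(resource, dict) or resource.get("type") != "Fold":
            raise ValueError(f"{url} does not make sense as a set of instructions")
        combine = OPERATORS[resource["operator"]]
        return reduce(combine, argument, resource["initial"])

store = ResourceStore()
store.put("http://example.org/fn/sum", {"type": "Fold", "operator": "add", "initial": 0})
store.put("http://example.org/data/note", "just some text")

total = store.execute("http://example.org/fn/sum", range(1, 11))
print(total)  # 55
```

Executing http://example.org/data/note raises ValueError in this sketch, which mirrors a processor refusing bits that aren't valid instructions. Restricting execution to a small, known vocabulary is one way to limit the new attack surface, though it is only one possible design.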

Example Linked Code

What would such an executable resource look like?

Based on some work I've been doing this year, the following could be a reasonable JSON-LD representation of a resource that sums up a list of numbers.

Don't be put off by how much stuff there is. The resource is entirely self-contained. It doesn't reference anything outside of itself. Of course, you'll need the right processor, but that's true of any code.

The source for this is something like the following (shoving a stream through a function):

The only thing missing from the JSON-LD is the assignment of the lambda to the symbol sum, but that's okay. Instead, we can store it on the web somewhere and reference it.

Here's how that works.

In a functional language, the name of a function and the definition of a function are two separate concepts. In fact, functions don't really have names any more than numbers do. Instead, they have places where they reside.

For example, if we say that x = 123, we don't consider x to be the name of the number 123. Instead, we think of x as a bucket into which we place the value 123.

Similarly, the function defined by sum(s) :> ... is really just saying that we want to take the function we create and store it in the bucket labeled sum.

When we want to use the function, we point to the bucket and say, "Use the function in that bucket." So sum(1..10) will apply the function in the sum bucket to the list of numbers 1, 2, 3, ..., 9, 10 and give us 55.
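The bucket idea can be sketched in Python, where functions are ordinary values. The buckets dictionary and the total label below are illustrative assumptions, not part of the language described here.

```python
from functools import reduce

# Buckets hold values. Numbers and functions are stored the same way.
buckets = {}
buckets["x"] = 123
buckets["sum"] = lambda s: reduce(lambda a, b: a + b, s, 0)

# "Use the function in that bucket" on the numbers 1 through 10.
result = buckets["sum"](range(1, 11))
print(result)  # 55

# The function has no name of its own: a second bucket can hold the same value.
buckets["total"] = buckets["sum"]
```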

The magic here is that sum could be a URL pointing to a resource that defines how to sum up a list of numbers. The processor would see that the bucket was online somewhere else rather than in memory, fetch the function definition, and then apply that definition to the list of numbers.

All without you having to know that that's what was going on.
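A minimal sketch of that lookup, in Python, might go as follows. The Processor class, the FAKE_WEB dictionary standing in for HTTP GETs, the example.org URL, and the "Fold" definition format are all assumptions made for illustration. A real processor would fetch over the network and understand a much richer representation.

```python
import operator
from functools import reduce

# Stand-in for the web: URL -> resource. A real processor would issue HTTP GETs.
FAKE_WEB = {
    "http://example.org/fn/sum": {"type": "Fold", "operator": "add", "initial": 0},
}
OPERATORS = {"add": operator.add}

def fetch(url):
    return FAKE_WEB[url]

def compile_definition(definition):
    """Turn a fetched (hypothetical) Fold definition into a callable."""
    combine = OPERATORS[definition["operator"]]
    return lambda items: reduce(combine, items, definition["initial"])

class Processor:
    def __init__(self):
        self.memory = {}  # local buckets: label -> function

    def resolve(self, bucket):
        if bucket in self.memory:                        # bucket is in memory
            return self.memory[bucket]
        if bucket.startswith(("http://", "https://")):   # bucket is elsewhere
            return compile_definition(fetch(bucket))
        raise NameError(bucket)

    def apply(self, bucket, argument):
        return self.resolve(bucket)(argument)

processor = Processor()
processor.memory["double_all"] = lambda items: [2 * n for n in items]

local = processor.apply("double_all", [1, 2, 3])
remote = processor.apply("http://example.org/fn/sum", range(1, 11))
print(local, remote)  # [2, 4, 6] 55
```

The caller uses apply the same way in both cases. Only resolve knows whether the bucket was local or remote.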

In fact, the list of numbers could be encoded as an RDF list elsewhere on the web. The function and the data don't have to reside on the same system as the processor.
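For the data side, here is a small Python sketch of walking an RDF list. RDF represents a list as a chain of cells linked by rdf:first and rdf:rest and ending at rdf:nil. The cell URLs and the nodes dictionary (a stand-in for fetching each cell from the web) are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical list cells keyed by URL. In practice each lookup could be a
# fetch from a different server.
nodes = {
    "http://example.org/numbers#c1": {RDF + "first": 1, RDF + "rest": "http://example.org/numbers#c2"},
    "http://example.org/numbers#c2": {RDF + "first": 2, RDF + "rest": "http://example.org/numbers#c3"},
    "http://example.org/numbers#c3": {RDF + "first": 3, RDF + "rest": RDF + "nil"},
}

def rdf_list_items(head, fetch):
    """Yield each rdf:first value, following rdf:rest until rdf:nil."""
    current = head
    while current != RDF + "nil":
        cell = fetch(current)
        yield cell[RDF + "first"]
        current = cell[RDF + "rest"]

items = list(rdf_list_items("http://example.org/numbers#c1", nodes.__getitem__))
print(items, sum(items))  # [1, 2, 3] 6
```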

The only difference between code and data is what we impose through our interpretation.