Anatomy of a LionWeb Model

This article walks us through a LionWeb model: what it is, what’s inside, how the inside looks, what the contents mean, and why they are that way. It is meant as an introduction to some of the details of LionWeb. Our target audience is developers who want to work with LionWeb. We don’t assume any prior knowledge of LionWeb; some experience with object-oriented languages and general modeling ideas is helpful.

1. Introduction

The fundamental world view in LionWeb is "everything is a model". This means we express everything we talk about in the form of models.

But what is a model? STOP — don’t run away! This question is infamous in the modeling community — but we have to answer it given our world view as stated above. We deliberately don’t use that term in our specifications, as it’s so overloaded and ambiguous. For our purposes, we can always replace "model" with "a bunch of nodes". Then the question transforms to: What is a node?

2. Nodes

A node is the fundamental building block in LionWeb. It consists of a unique identifier (id), a classifier that defines the nature / type / class of the node, the node’s parent node, and lists of properties, containments, references, and annotation instances. We refer to a node’s properties, containments, and references collectively as the features of that node.

Properties are simple values like name: String, age: Integer or alive: Boolean.

Properties examples

Containments are other nodes that this node is composed of; examples would be mainEntrance: Door[1] or corners: Coordinate[0..*].

Containments examples

References are other nodes that this node relates to; examples would be surface: Material[1] or stakeholders: Employee[1..*].

References examples

Annotation instances are other nodes that represent orthogonal concerns of this node; examples include ParsedFromLine or ReviewComment.

Annotations examples

Let’s look into each of these in more detail.

2.1. Node id

The id of a node is a string of arbitrary length, consisting of characters A-Z, a-z, 0-9, dash - and underscore _. It cannot be empty, and it must be unique for all nodes in the model.

The valid character set coincides with the output of base64url encoding (without padding). This comes in very handy if we have to derive the node id from some other source, like a binary hash or a fully qualified name.

We chose this character set because it should be compatible with any run-time environment and avoid encoding issues. We don’t limit the id’s length, so any kind of source can be represented; fully qualified names, for example, tend to get very long.

We assume nodes with equal id strings to be identical.

Node ids should NOT carry any meaning: just because a node id happens to contain the string frontLeftWheel, we should NOT assume that its classifier is Wheel, or that it is contained in frontLeft. Such "descriptive" node ids are quite useful in examples and tests, as they provide more context. However, examples and tests are meant for human consumption; real nodes (in their raw form) are not.
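As a sketch of these rules, the following Python checks whether a string is a valid node id and derives ids from a fully qualified name or from a hash. The helper names (is_valid_node_id, id_from_qualified_name, id_from_hash) are hypothetical and not part of any LionWeb library; stripping the base64url padding character = is our assumption about how to stay within the allowed character set.

```python
import base64
import hashlib
import re

# Non-empty; only A-Z, a-z, 0-9, dash and underscore; no length limit.
_NODE_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_node_id(candidate: str) -> bool:
    """Return True if candidate is a syntactically valid node id."""
    return _NODE_ID.fullmatch(candidate) is not None


def _base64url_without_padding(data: bytes) -> str:
    # base64url uses exactly the node id alphabet, plus '=' padding,
    # which is not allowed in ids, so we strip it.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def id_from_qualified_name(qualified_name: str) -> str:
    """Hypothetical helper: derive an id from a (possibly very long) name."""
    return _base64url_without_padding(qualified_name.encode("utf-8"))


def id_from_hash(content: bytes) -> str:
    """Hypothetical helper: derive an id from a SHA-256 hash of some content."""
    return _base64url_without_padding(hashlib.sha256(content).digest())


print(is_valid_node_id("frontLeftWheel"))  # True
print(is_valid_node_id("front.left"))      # False: '.' is not allowed
print(id_from_qualified_name("com.example.Car"))
```

Even when an id is derived from a name like this, the previous paragraph still applies: consumers should treat the result as an opaque string.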

2.2. Meta-pointer

Before we get to the next part of a node, classifier, we need to discuss meta-pointers. Any time we refer from an instance model (M1) to its metamodel (M2 a.k.a. language), we use meta-pointers. They are a triple of {language key, language version, element key}, for example {language=MyLanguage, version=23-alpha, key=MyConcept}. Both language key and element key use the same character set as node ids, whereas language version can be any non-empty string.

We need this triple, because a language can be uniquely identified by its key and version, and the element key must be unique within its language. With language version we support language evolution, i.e. the fact that real-world languages tend to change over time.

We don’t prescribe anything about language version, as different implementations might use different ways to represent language evolution. We just assume that equal strings refer to the identical version.
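A meta-pointer maps naturally onto an immutable value type. The following Python sketch (the class name MetaPointer and its field names are our own choice, assumed for illustration) validates both keys against the node id character set, treats the version as an opaque non-empty string, and compares pointers by all three parts.

```python
import re
from dataclasses import dataclass

_KEY = re.compile(r"[A-Za-z0-9_-]+")  # same character set as node ids


@dataclass(frozen=True)
class MetaPointer:
    language: str  # language key
    version: str   # language version: opaque, only compared for equality
    key: str       # element key, unique within its language

    def __post_init__(self) -> None:
        for part in (self.language, self.key):
            if not _KEY.fullmatch(part):
                raise ValueError(f"invalid key: {part!r}")
        if not self.version:
            raise ValueError("language version must not be empty")


v23 = MetaPointer(language="MyLanguage", version="23-alpha", key="MyConcept")
v24 = MetaPointer(language="MyLanguage", version="24", key="MyConcept")

print(v23 == MetaPointer("MyLanguage", "23-alpha", "MyConcept"))  # True
print(v23 == v24)  # False: same element key, different language version
```

Because the dataclass is frozen, instances are hashable, so a sketch like this could key a lookup table from meta-pointers to language elements.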

2.3. Node classifier

A node’s classifier is a meta-pointer that defines the node’s nature, or type or class — "what kind of thing is this node?" A classifier of a node is equivalent to the class of a Java, C#, or TypeScript object. The classifier can either be a concrete concept or an annotation. It cannot be an abstract concept or interface, as we cannot instantiate them.

A node must conform to its classifier. This means the node can only have features as described in the node’s classifier. ^[1]
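A minimal sketch of the conformance rule, together with the rule that only concrete concepts and annotations can be instantiated, might look like this. The Classifier and Node classes are hypothetical simplifications: we assume feature_keys already includes all inherited features, and we ignore special circumstances such as migrations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Classifier:
    key: str
    kind: str               # "concept", "annotation", or "interface"
    abstract: bool = False  # only meaningful for concepts
    feature_keys: frozenset = frozenset()  # assumed to include inherited features


@dataclass
class Node:
    id: str
    classifier: Classifier
    features: dict = field(default_factory=dict)  # feature key -> value(s)


def conformance_errors(node: Node) -> list:
    """Return human-readable violations; an empty list means the node conforms."""
    errors = []
    c = node.classifier
    if c.kind == "interface" or (c.kind == "concept" and c.abstract):
        errors.append(f"{node.id}: {c.key} cannot be instantiated")
    for key in node.features:
        if key not in c.feature_keys:
            errors.append(f"{node.id}: {c.key} does not declare feature {key}")
    return errors


person = Classifier("Person", "concept", feature_keys=frozenset({"name", "age", "alive"}))
shape = Classifier("AbstractShape", "concept", abstract=True)

print(conformance_errors(Node("1", person, {"name": "Joe"})))  # []
print(conformance_errors(Node("2", person, {"height": "180"})))
print(conformance_errors(Node("3", shape)))
```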

2.4. Node parent

The parent of node A is the node B that contains A. For example, node 10 of classifier Car might contain node 5 of classifier PetrolEngine, and node 11 of classifier SteeringWheel. Thus, the parent of 5 is 10, and the parent of 11 is also 10.

Parent example

Every node MUST have exactly one parent, except for root nodes, which have none. Consequently, starting from a root node, all nodes below it can be arranged in a tree.

Conversely, a node’s children are the nodes it contains. In the example above, node 10 has children 5 and 11; node 5 does not have any children.

A node’s classifier defines which kind of children a node can have — the node’s containments. In our example, node 5 might be contained in engine, and node 11 in steering — both defined in classifier Car.

Containment example

Additionally, a node can have annotation instances. Let’s add a node 22 of annotation DiagramPosition to our 10:Car, and node 33 of annotation ReviewComment to 11:SteeringWheel. Thus, the parent of 22 is 10, and the parent of 33 is 11. We also call this the annotated node: node 10 has annotation 22, and node 22 has annotated node 10.

Annotation instances example

In summary, a node must always be listed by its parent, either among the parent’s children or among the parent’s annotation instances.

If we delete a parent node, we implicitly delete all its children and annotation instances, their children and annotation instances, and so on. In other words, a child or annotation instance shares the lifetime of its parent.

This also explains why we consider the annotated node the parent of an annotation instance: If the annotated node is gone, the annotation instance makes no sense anymore — they share the annotated node’s lifetime.
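The shared lifetime can be sketched as a traversal over both children and annotation instances. This Python version keeps nodes in a hypothetical dictionary from id to NodeRecord, our own simplification rather than any LionWeb API, and reuses the Car example: 10:Car with children 5 and 11, annotation 22 on node 10, and annotation 33 on node 11.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class NodeRecord:
    id: str
    parent: Optional[str] = None
    children: list = field(default_factory=list)     # ids of contained nodes
    annotations: list = field(default_factory=list)  # ids of annotation instances


def delete_node(nodes: Dict[str, NodeRecord], node_id: str) -> Set[str]:
    """Delete node_id and everything sharing its lifetime; return deleted ids."""
    record = nodes[node_id]
    # Detach from the parent so it no longer lists the deleted node.
    if record.parent is not None:
        parent = nodes[record.parent]
        for ids in (parent.children, parent.annotations):
            if node_id in ids:
                ids.remove(node_id)
    deleted = set()
    stack = [node_id]
    while stack:
        current = nodes.pop(stack.pop())
        deleted.add(current.id)
        # Children and annotation instances both share the lifetime.
        stack.extend(current.children)
        stack.extend(current.annotations)
    return deleted


def car_example() -> Dict[str, NodeRecord]:
    return {
        "10": NodeRecord("10", children=["5", "11"], annotations=["22"]),
        "5": NodeRecord("5", parent="10"),
        "11": NodeRecord("11", parent="10", annotations=["33"]),
        "22": NodeRecord("22", parent="10"),
        "33": NodeRecord("33", parent="11"),
    }


nodes = car_example()
print(sorted(delete_node(nodes, "11")))  # ['11', '33']
print(nodes["10"].children)              # ['5']
```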

Note

Parent vs. Supertype

We might confuse the parent of a node with the generalization of a classifier (a.k.a. supertype, superclass, extended class, implemented interface). They are completely independent of each other.

Parent always refers to a node, i.e. instance / M1 level: node A is the parent of node B.

Generalization always refers to a classifier, i.e. language / M2 level: concept List is the generalization of concept LinkedList.

We might have node 1:List that contains node 2:List: 1 is parent of 2, but both are Lists.
We might have node 3:LinkedList that contains node 4:List: 3 is parent of 4, although 4's classifier is a generalization of 3's classifier.
We might have node 5:List and another, unrelated, node 6:LinkedList. 5 and 6 have no parent-relationship, but 5's classifier is a generalization of 6's classifier.
We might have node 7:List that contains node 8:Person: 7 is parent of 8, but 7's classifier is completely unrelated to 8's classifier.

2.5. Properties

We use properties to store simple values inside a node. A node of classifier Person might have properties name: String, age: Integer and alive: Boolean.

Person properties example

A node’s classifier describes which properties can appear in the node. This description includes the property’s name, key, whether it is optional, and its type. In the low-level serialization format, each property value refers to its description via a meta-pointer.

The property’s name is for humans to understand the property; we use the key to uniquely identify the property technically. With the optional flag we specify whether a node without this property is considered valid or not: In our system, a person without known name might be a problem, but we could still process them without knowing their age.

A property’s type describes what kind of values can appear in that property. All LionWeb data types, and thus all properties, are value types^[2]. This means they don’t have any identity, we just know their value: We can distinguish 10 and 15, but 12 and 12 are the same.

Some modeling frameworks allow properties with multiple values, e.g. luckyNumbers: Integer[0..*] = {23, 42, 4711}. LionWeb does not support that, because the values don’t have identity. If a property value changed from {23, 42} to {23, 42, 42}, we would have no way of telling whether the new 42 had been inserted or appended. This leads to all kinds of problems we’d like to avoid.
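The ambiguity is easy to demonstrate. The hypothetical helper below lists every position at which a single inserted value could have turned the old list into the new one; with identity-free values, {23, 42} to {23, 42, 42} has two equally valid explanations.

```python
def possible_insert_positions(before: list, after: list) -> list:
    """Indices i such that inserting after[i] at i turns `before` into `after`."""
    if len(after) != len(before) + 1:
        return []
    return [i for i in range(len(after)) if after[:i] + after[i + 1:] == before]


print(possible_insert_positions([23, 42], [23, 42, 42]))    # [1, 2]: inserted or appended?
print(possible_insert_positions([23, 42], [23, 42, 4711]))  # [2]: unambiguous
```

LionWeb sidesteps this question entirely by allowing only single-valued properties.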

The following subsections describe the kinds of data types supported by LionWeb.

2.5.1. Primitive types

LionWeb ships with built-in types for String, Integer, and Boolean, as they are needed in almost all systems. String properties can have any string value, including the empty string. Integer properties support positive and negative decimal integers of any length. Boolean properties can only have the values true or false. A language might define its own additional primitive types. In any case, we store all of them as strings in the low-level serialization format (i.e. JSON).
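A sketch of how an implementation might convert the built-in primitive values to and from their string form follows. The function names are hypothetical, and we assume Integer strings consist of an optional leading minus sign followed by ASCII digits; the serialization specification is the authority on the exact formats.

```python
import re

_INTEGER = re.compile(r"-?[0-9]+")  # assumption: optional '-', then ASCII digits


def serialize_primitive(value) -> str:
    """Convert a String, Integer, or Boolean value to its serialized string."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)        # decimal, arbitrary length
    if isinstance(value, str):
        return value             # stored as-is; the empty string is allowed
    raise TypeError(f"unsupported primitive value: {value!r}")


def deserialize_primitive(text: str, type_name: str):
    """Parse a serialized string back into a value of the named built-in type."""
    if type_name == "String":
        return text
    if type_name == "Integer":
        if not _INTEGER.fullmatch(text):
            raise ValueError(f"not an Integer: {text!r}")
        return int(text)  # Python ints have arbitrary precision
    if type_name == "Boolean":
        if text not in ("true", "false"):
            raise ValueError(f"not a Boolean: {text!r}")
        return text == "true"
    raise ValueError(f"unknown primitive type: {type_name}")
```

The explicit regular expression matters because Python’s int() alone would also accept inputs such as " 5" or "1_000".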

2.5.2. Enumerations

A property can also be of an enumeration type. Enumerations have a name, key, and a list of enumeration literals. Again, name is for human understanding, and key for unambiguous technical reference. Each literal also has a name and a key, with the same purpose. A language must explicitly declare an enumeration to use it as the type of a property. The value of an enumeration property must be one of the enumeration’s literals. In the low-level serialization format, we store the value as the enumeration literal’s key.

Enumeration example
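The separation of name and key becomes concrete in code. In this sketch, the TrafficLight enumeration and its literal names and keys are hypothetical; the point is that only the key reaches the serialized form, so in this sketch renaming a literal for human readers would not affect stored values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnumerationLiteral:
    name: str  # for human understanding
    key: str   # for unambiguous technical reference; this is what we store


@dataclass(frozen=True)
class Enumeration:
    name: str
    key: str
    literals: Tuple[EnumerationLiteral, ...]

    def serialize(self, literal: EnumerationLiteral) -> str:
        if literal not in self.literals:
            raise ValueError(f"{literal.name} is not a literal of {self.name}")
        return literal.key

    def deserialize(self, text: str) -> EnumerationLiteral:
        for literal in self.literals:
            if literal.key == text:
                return literal
        raise ValueError(f"{text!r} is not a literal key of {self.name}")


RED = EnumerationLiteral(name="Red", key="tl-red")
GREEN = EnumerationLiteral(name="Green", key="tl-green")
traffic_light = Enumeration(name="TrafficLight", key="tl", literals=(RED, GREEN))

print(traffic_light.serialize(RED))           # tl-red
print(traffic_light.deserialize("tl-green"))  # EnumerationLiteral(name='Green', key='tl-green')
```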

2.5.3. Structured data types

LionWeb supports another kind of simple type: structured data types (added in LionWeb version 2024.1). A structured data type is a collection of fields, each typed by another data type. Both the structured data type and each field have a name and key with the usual meaning.

Structured data type example

We store the value of a structured data type as a string, just like primitive values. The content of that string is a serialized and escaped JSON object. For example, the value of the TextPosition above would be "{\"lin\": \"42\", \"col\": \"-30\"}". Refer to the serialization specification for details.
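Using Python’s standard json module, the following sketch reproduces the example value. We assume, as in that example, that the JSON object maps field keys (here lin and col) to the string form of each field value; the helper names are hypothetical, and whitespace or key order in real serializers may differ.

```python
import json


def serialize_text_position(lin: int, col: int) -> str:
    """Hypothetical: serialize a TextPosition value to its string form."""
    # Each field value is serialized as a string, like any primitive value.
    return json.dumps({"lin": str(lin), "col": str(col)})


def deserialize_text_position(text: str) -> tuple:
    fields = json.loads(text)
    return int(fields["lin"]), int(fields["col"])


value = serialize_text_position(42, -30)
print(value)              # {"lin": "42", "col": "-30"}
# Embedding the value in the surrounding JSON document escapes the quotes:
print(json.dumps(value))  # "{\"lin\": \"42\", \"col\": \"-30\"}"
```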

2.6. Containments

A node A can contain other nodes B and C. Then B has the parent A; A has children B and C. If C contains D, then D has the ancestors C and A; A has the descendants B, C, and D.

Parent, children, ancestors, and descendants example

Every node (except root nodes) must have exactly one parent, thus no node can be contained more than once. Containments establish a tree shape of the model (sometimes called primary containment tree or dominator tree). A node can never contain itself, neither directly nor indirectly.

We use containments if the contained node only makes sense together with its parent: a Coordinate is not very useful if we don’t know which shape it belongs to. We also distinguish between different containments; for a Rectangle, we want to know whether a contained Coordinate is topLeft or lowerRight.

Several containments example

A node’s classifier describes the valid containments. This description includes the containment’s name, key, whether it is optional, whether we allow multiple nodes in the containment, and its type. In the low-level serialization format, each containment refers to its description via a meta-pointer.

Name, key, and optional flag mean the same for containments as for properties. Unlike properties, a containment can be singular or multiple. A singular containment (i.e. multiple = false) allows only one contained node, e.g. a Rectangle can contain only one topLeft coordinate. In contrast, a House can contain multiple Rooms. For multiple containments, we keep track of which child is contained at which position. As every contained node can have only one parent, we cannot have duplicates in multiple containments. Thus, multiple containments behave like an ordered set.

Single (address) and multiple (rooms) containments example
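The containment rules (single parent, no cycles, positions matter, no duplicates) can be enforced when adding a child. The ContainerNode class below is a hypothetical, minimal sketch with a single multiple containment; a real implementation would track one child list per containment. We reject a child that already has a parent; an implementation could instead move it by detaching it from its old parent first.

```python
from typing import Iterator, Optional


class ContainerNode:
    def __init__(self, node_id: str) -> None:
        self.id = node_id
        self.parent: Optional["ContainerNode"] = None
        self.children = []  # ordered; behaves like an ordered set

    def ancestors(self) -> Iterator["ContainerNode"]:
        current = self.parent
        while current is not None:
            yield current
            current = current.parent

    def add_child(self, child: "ContainerNode", index: Optional[int] = None) -> None:
        # A node can never contain itself, directly or indirectly.
        if child is self or any(a is child for a in self.ancestors()):
            raise ValueError(f"{child.id} would contain itself")
        # Every node has at most one parent, so duplicates are impossible.
        if child.parent is not None:
            raise ValueError(f"{child.id} already has parent {child.parent.id}")
        child.parent = self
        if index is None:
            self.children.append(child)
        else:
            self.children.insert(index, child)


house = ContainerNode("house")
kitchen, bath, hall = ContainerNode("kitchen"), ContainerNode("bath"), ContainerNode("hall")
house.add_child(kitchen)
house.add_child(bath)
house.add_child(hall, index=0)
print([room.id for room in house.children])  # ['hall', 'kitchen', 'bath']
```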

A containment’s type refers to a classifier, and describes which nodes can be part of that containment. Only nodes that have a compatible classifier can be part of that containment.

Note

All of interfaces, concepts, and annotations are classifiers. Thus, a containment can specify its type to be IMoveable interface, AbstractShape (abstract) concept, Rectangle (concrete) concept, or Todo annotation.

The latter is an edge case, but has its uses: we might want to annotate other nodes with todos specific to that node, but also have a list of general todos somewhere, unrelated to any specific node. Moreover, we definitely want to allow both interfaces and concepts as containment types; excluding annotations would clutter LionWeb’s meta-metamodel.

At the serialization level, a containment stores the contained node’s id. A containment is invalid if the contained node’s id cannot be resolved.

2.7. References

A node can point to other nodes it has some relation to. For example, a Project might refer to its mainResponsible: Employee, or to all its externalParties: BusinessPartner[0..*]. The node containing the reference is called source, the referred node is called target of the reference. A node can be referenced by none, one, or many other nodes. It can be referenced by itself. A node can reference another node (or itself) more than once. Thus, multiple references behave like a list.

Interstellar projects example

A node’s classifier describes the valid references. This description includes the reference’s name, key, whether it is optional, whether we allow multiple targets in the reference, and its type. In the low-level serialization format, each reference refers to its description via a meta-pointer.

These descriptions mean the same as for containments. For multiple references, we keep track of which target is listed at which position. The same target can appear several times in a multiple reference. This can help, for example, to describe steps: steps: Task[1..*] = {generate, saveAll, compile, saveAll} refers to Task saveAll twice.

LionWeb does not support bidirectional references. They are very hard to maintain, and also hard to process (e.g. when we want to traverse the whole model).

2.7.1. ResolveInfo

For references, we store the target node’s id and, additionally, a resolveInfo. ResolveInfo is a string that somehow describes the target node. A typical candidate would be the target node’s name (if the target node has such a property). To support this typical default behavior, LionWeb has a built-in interface INamed with one property name: String that any classifier might implement. However, it’s up to the application to select and set the appropriate resolveInfo.

This serves two purposes:

In the interstellar project example above, assume some app that displays the list of external parties for the project as ESA: 23, JAXA: 42. Now someone thinks there’s no project left with ESA, and deletes node 87 (the ESA one). If the externalParties reference only knew the target node id, the best possible display would be !unknown target 87!, JAXA: 42 — not very helpful to the user. With resolveInfo, we can display !unknown target ESA!, JAXA: 42. Now the user has a much better chance of tracking down the error.
We need to know the target node id in order to store it in a reference. But in parser-based systems, after parsing we know only some kind of identifier of a reference target. We might want to store this unresolved abstract syntax tree (AST) as a LionWeb model. A later linking step hopefully finds the actual target and fills in the proper target node id.

Target node id, resolveInfo, or both must be set for each reference — otherwise the reference is pointless.
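Here is a sketch of a reference target carrying both pieces of information, and of the display logic from the interstellar example. ReferenceTarget and render_targets are hypothetical names, and the id 88 for JAXA is made up for illustration. The constructor enforces that at least one of target id and resolveInfo is set.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ReferenceTarget:
    target_id: Optional[str] = None     # known once the target is resolved
    resolve_info: Optional[str] = None  # e.g. the target's name

    def __post_init__(self) -> None:
        if self.target_id is None and self.resolve_info is None:
            raise ValueError("set target id, resolveInfo, or both")


def render_targets(targets: Iterable[ReferenceTarget], labels: Dict[str, str]) -> str:
    """Show live labels where the target resolves, otherwise the best hint we have."""
    parts = []
    for t in targets:
        if t.target_id is not None and t.target_id in labels:
            parts.append(labels[t.target_id])
        else:
            hint = t.resolve_info if t.resolve_info is not None else t.target_id
            parts.append(f"!unknown target {hint}!")
    return ", ".join(parts)


external_parties = [ReferenceTarget("87", "ESA"), ReferenceTarget("88", "JAXA")]
labels_after_delete = {"88": "JAXA: 42"}  # node 87 (ESA) has been deleted
print(render_targets(external_parties, labels_after_delete))
# !unknown target ESA!, JAXA: 42

# A parser might produce an unresolved target that only knows the identifier:
parsed = ReferenceTarget(resolve_info="ESA")
```

The parsed target corresponds to the second purpose above: a later linking step would replace it with one that also carries the target node id.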

2.7.2. Unresolved references

Assume a node T that happens to be the target of a reference myRef from node A. When we delete T, the reference stays untouched: A still contains a reference myRef with target node id T (and maybe some resolveInfo), but the target cannot be resolved. LionWeb allows references with unresolvable target. It’s up to the consumer of the model to do something about that target, though.

Unresolved references example

2.8. Differences between containment and reference

A contained node shares the lifetime of its parent — if the parent is deleted, the contained node gets deleted, too. The contained node makes no sense without its parent. Even if we kept the contained node, where would it go? We’d need to invent some special place for it. In contrast, the target of a reference is perfectly valid without the source. They can be deleted independently, and don’t need to be deleted when sources pointing to them are deleted.

Containments establish a tree shape for the model. With references, we regard the model as a graph, as a node can be target of several references.

Node graph example

For containments, the parent node only stores the child node’s id. For references, the source node stores the target node’s id and/or resolveInfo.

LionWeb does not support unresolvable containments, but unresolvable references are allowed.
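To make this asymmetry concrete, here is a sketch of a consistency check over a set of nodes. The dictionary layout (children as a list of ids, references as a list of target entries) is a hypothetical simplification, not the LionWeb serialization format: an unresolvable child id is an error, whereas an unresolvable reference target is only reported for the consumer to deal with.

```python
from typing import Dict, List, Tuple


def check_links(nodes: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Return (errors, unresolved_references) for a set of nodes keyed by id."""
    errors, unresolved = [], []
    for node_id, node in nodes.items():
        for child_id in node.get("children", []):
            if child_id not in nodes:
                errors.append(f"{node_id}: child {child_id} cannot be resolved")
        for target in node.get("references", []):
            target_id = target.get("id")
            if target_id is not None and target_id not in nodes:
                # Allowed; the resolveInfo may help a user track down the cause.
                unresolved.append(f"{node_id}: {target.get('resolveInfo') or target_id}")
    return errors, unresolved


nodes = {
    "A": {"references": [{"id": "T", "resolveInfo": "myTarget"}]},  # T was deleted
    "P": {"children": ["C", "missing"]},
    "C": {},
}
print(check_links(nodes))
# (['P: child missing cannot be resolved'], ['A: myTarget'])
```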

2.9. Annotation instances

Annotation instances are mostly regular nodes, with two exceptions: Their classifier must be an annotation, it cannot be a concept. Also, the annotation instance must be mentioned in its parent’s annotation instance list, it cannot be mentioned in the parent’s containments. Both these criteria are indirect; thus, when only looking at a node (without examining its classifier or parent), we cannot distinguish annotation instances from other nodes. That’s by design: everything is a node, remember?

Annotations examples

An annotation instance can have any kind and amount of properties, containments, references, and other annotation instances. We can use annotation instances as reference targets.

Assuming node X mentions node Y in X's annotation instances, we say X is annotated by Y, and Y has annotated node X. Y's annotated node is always also Y's parent node.

Relations and terms between a node and its annotation instance

Annotation instances share the lifetime of their annotated node; when we delete the annotated node, the annotation instance gets also deleted.

We keep track of the order of annotation instances. As every annotation instance can have only one parent, we cannot have duplicates in annotations. Thus, annotation instances behave like an ordered set.

2.9.1. Applicable classifiers

The classifier of an annotation instance must be an annotation. Annotations are similar to concepts, and both are classifiers. But an annotation specifies which other classifier it annotates. Instances of that annotation must adhere to this constraint.

For example, assume we have classifiers Function and Argument. We also have an annotation SourcePosition that annotates = Function. On M1, we have node 1 of Function and node 3 of Argument. Node 34 is an annotation instance of SourcePosition. Node 1:Function can be annotated by node 34:SourcePosition, because SourcePosition states that it annotates Function. In contrast, node 3:Argument cannot be annotated by node 35:SourcePosition, because Argument is not compatible with Function (the classifier SourcePosition annotates).

Valid (34) and invalid (35) annotation instances example

An annotation that is applicable to any node declares that it annotates Node from LionWeb’s built-in library. Annotations can annotate other annotations; however, this approach introduces complexities and should be used with great caution.
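The applicability check boils down to classifier compatibility. This sketch assumes a hypothetical table that maps each classifier key to the keys of its direct generalizations, treats every classifier as compatible with Node, and reuses the Function / Argument / SourcePosition example; the function names are our own.

```python
from collections import deque
from typing import Dict, List


def is_compatible(a: str, b: str, generalizations: Dict[str, List[str]]) -> bool:
    """True if classifier a is b itself or a direct or indirect specialization of b."""
    if b == "Node":  # every classifier specializes Node
        return True
    seen, queue = {a}, deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return True
        for general in generalizations.get(current, []):
            if general not in seen:
                seen.add(general)
                queue.append(general)
    return False


def can_annotate(annotates: str, annotated_classifier: str,
                 generalizations: Dict[str, List[str]]) -> bool:
    """An annotation instance fits if the annotated node's classifier is compatible."""
    return is_compatible(annotated_classifier, annotates, generalizations)


generalizations = {"Function": [], "Argument": []}
print(can_annotate("Function", "Function", generalizations))  # True: like node 34 on node 1
print(can_annotate("Function", "Argument", generalizations))  # False: like node 35 on node 3
print(can_annotate("Node", "Argument", generalizations))      # True: e.g. ReviewComment
```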

2.10. Differences between containment and annotation instances

Both a contained node C and an annotation instance A share the lifetime of their parent node P. So what’s the difference?

A containment must be described in the parent node’s classifier. If the parent is of classifier House with containments rooms and doors, we (as users of that classifier) cannot add a containment windows to the parent node. The designer of classifier House made the choice that for their domain, a house should only care about rooms and doors, but not windows. The designer might not have anticipated our use case of their domain. But every user of that classifier relies on this structure of a house, so we must not mess with it.

An annotation instance cannot be mentioned in the parent’s node classifier at all. Instead, an annotation instance’s classifier must be an annotation, which mentions the classifier it annotates. Thus, the designer of the annotation decides which targets their annotation is applicable to. So we can come up with an annotation WindowAnnotation that annotates: House, and attach instances to nodes of classifier House. All these nodes still adhere to the intents of House's designer, but also transport the additional information about windows we need for our use case.

Even more importantly, we can put orthogonal concerns into annotation instances. For the domain of our functions with arguments, we don’t want to bother with the details of how a parser constructed that node. But it would be useful to show users where this node originated from in case of errors. That’s not a problem with an annotation SourcePosition. Similarly, we could come up with an annotation ReviewComment that annotates: Node (i.e. any node), to be used during code reviews. The reviewer can attach instances to any node without interfering with the original intent of these nodes.

Review comments demonstrate one drawback of annotations: we’re violating the generally valued idea of separating concerns, as we mix domain concerns and process concerns in the same model. The alternative would be to keep both concerns in separate models, and have the review comments reference the nodes they apply to. Experience shows this approach is very hard to maintain: What happens if we delete a function while a review comment still refers to it? The review model might not be loaded during the deletion, so it now contains a "zombie" review comment. We might need to introduce "garbage collection" strategies to find such zombies and deal with them. For more context-sensitive annotations, moving the annotated node might introduce similar problems: the annotation instance might not make sense in the context of the annotated node’s new position. For example, imagine an annotation Profiling { cpuPercentage: Integer; annotates = Statement }. If the statement were moved somewhere else, this annotation would be not only meaningless, but actually wrong.

Annotations should not be used as a replacement for proper concept design. If our language dealt with the domain of houses, review comments would be completely independent of that domain, should not be part of it, and would be perfect candidates for annotations. However, if our language modeled discussions, then reviews and comments would be an integral part of that domain, and should be represented by proper containments and (abstract) concepts in our language. In this case, using annotations only because they seem conveniently applicable in lots of places is a strong modeling smell.

3. Built-in Library

Every LionWeb implementation must ship with a built-in library. This library contains our predefined primitive types, Node as the generalization of all classifiers, and the interface INamed.

Named things are so common in models that it warrants a special interface. Implementations can offer default behavior for nodes of INamed type, e.g. use the name in tree displays, for issue reporting, or as default for resolveInfo in reference targets.

Built-in library

4. Languages

Programming languages like Java or C# natively know things like classes, records, and enums. We can use them to describe the structures we care about in our program, e.g. class Person. Then, we work with instances of these structures, e.g. new Person("Joe").

LionWeb has the same three conceptual layers: LionWeb natively knows e.g. concepts, data types, and enumerations; we call this "meta-metamodel" or M3. With them, we can describe e.g. concept Person; this would be part of a "Language" or M2. Then, we work with instances, aka nodes, e.g. node<23:Person> { name: String = "Joe" } as part of M1.

This section describes our meta-metamodel. We’ve already explained some aspects in the nodes section; we won’t duplicate that here.

Our meta-metamodel closely resembles other meta-metamodels like EMF Ecore, MPS structure aspect, Java, C#, TypeScript, etc. This is a very deliberate choice: we want to be as compatible as possible. Also, several decades of usage show they just work; we don’t want to reinvent the wheel, or be different just for the sake of it.

LionWeb meta-metamodel. Green boxes relate to classifiers; blue boxes to features, and red boxes to data types.

A language groups related classifiers and data types (summarized as language entities). For example, we might design a language to describe houses, or cars. We’ve already covered a language’s version in the meta-pointer section. It also explains keys, used to uniquely identify everything inside a language.

4.1. Classifiers

Classifiers define the nature / type / class of a node. LionWeb knows three kinds of classifiers: concepts, annotations, and interfaces. Every classifier can define an arbitrary number of features.

All classifiers support inheritance, meaning that a classifier "inherits" all features defined by its direct or indirect generalizations. A generalization (a.k.a. supertype, superclass, extended class, implemented interface) is the opposite of a specialization (a.k.a. subtype, subclass, interface implementation, "is a"). A classifier A is compatible with a classifier B if A is B itself, or if A is a (direct or indirect) specialization of B; looking at the same fact from the other direction, B is a (direct or indirect) generalization of A. Every classifier implicitly or explicitly specializes Node from LionWeb’s built-in library.

Concepts are equivalent to classes in object-orientation. They can extend one other concept, and implement any number of interfaces. An abstract concept can never be instantiated, i.e. there cannot be a node with that classifier. An abstract concept can extend a concrete (i.e. abstract=false) concept, and vice versa. Concepts marked as partition can only appear as root nodes in a model (i.e. they cannot have a parent).

Annotations are similar to concepts: They can extend one other annotation and implement interfaces. They are always concrete, and cannot be used as partition. Additionally, annotations describe which classifiers they are applicable to.

Interfaces can extend any number of other interfaces. They are always abstract, thus can never be instantiated.
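These rules can be checked mechanically over a language’s declarations. The ClassifierDecl shape below is a hypothetical simplification (keys instead of node references, no features), written only to illustrate the constraints stated above: single inheritance for concepts and annotations, multiple inheritance for interfaces, only interfaces as implemented types, and no annotation partitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClassifierDecl:
    key: str
    kind: str  # "concept", "annotation", or "interface"
    extends: List[str] = field(default_factory=list)
    implements: List[str] = field(default_factory=list)
    partition: bool = False


def inheritance_errors(decls: Dict[str, ClassifierDecl]) -> List[str]:
    """Return one message per violated rule; an empty list means all is well."""
    errors = []
    for d in decls.values():
        extended = [decls[k] for k in d.extends]
        implemented = [decls[k] for k in d.implements]
        if d.kind in ("concept", "annotation") and len(extended) > 1:
            errors.append(f"{d.key}: a {d.kind} can extend at most one {d.kind}")
        if any(e.kind != d.kind for e in extended):
            errors.append(f"{d.key}: a {d.kind} can only extend another {d.kind}")
        if d.kind == "interface" and implemented:
            errors.append(f"{d.key}: interfaces extend, they do not implement")
        if any(i.kind != "interface" for i in implemented):
            errors.append(f"{d.key}: only interfaces can be implemented")
        if d.kind == "annotation" and d.partition:
            errors.append(f"{d.key}: an annotation cannot be a partition")
    return errors


decls = {d.key: d for d in [
    ClassifierDecl("INamed", "interface"),
    ClassifierDecl("IMoveable", "interface"),
    ClassifierDecl("IShape", "interface", extends=["INamed", "IMoveable"]),
    ClassifierDecl("Shape", "concept", implements=["IShape"]),
    ClassifierDecl("Rectangle", "concept", extends=["Shape"]),
    ClassifierDecl("Todo", "annotation", implements=["INamed"]),
]}
print(inheritance_errors(decls))  # []
```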

Note

Single vs. multiple inheritance

Concepts and annotations support single inheritance, interfaces multiple inheritance. This is in line with most modern programming languages. However, programming languages do this to avoid the diamond inheritance problem. We don’t have this problem in LionWeb, as we don’t have behavior.

This raises the question: What’s the difference between an abstract concept and an interface? The answer: Not much. So we could just abandon interfaces completely, allow multiple inheritance for concepts, and be done, right? Yes, that would work.

Issue 104 lists all the gory details of our discussion on this topic. The gist: We couldn’t find strong arguments in either direction. It mostly comes down to compatibility with other systems, which in turn boils down to who does the work: if we allowed multiple inheritance, converters to similar systems like EMF or C++ would be simple, but converters to single inheritance systems like MPS, Java or C# would need to spend lots of effort on mapping. The reverse applies when disallowing multiple inheritance: MPS, Java or C# converters are simple, but EMF or C++ converters become trickier. We took the latter direction more or less arbitrarily so we could move on.

4.2. Features

Features are the possible contents of a classifier. LionWeb supports three kinds of features: properties are simple values, containments describe other nodes with shared lifetime, and references point to other nodes. We summarize the latter two as links.

4.3. Data types

Data types describe the possible values of a property. LionWeb supports three kinds of data types: primitive types with no further specified structure, enumerations with a finite, pre-defined, non-extensible list of possible values, and structured data types as combination of other data types.

Appendix A: Notation used in this article and diagrams

We use integer numbers as node ids, and short names for keys of language elements. We label boxes in diagrams that represent M1-level nodes with the node id and its classifier, as in 10:Person. Boxes in diagrams for M2-level language elements are labelled only with the classifier, i.e. Person. Boxes that represent concepts or concept instances use a C icon; boxes for annotations or annotation instances use an @ icon.

Appendix B: Language key

We might ask, "What’s the difference between a language id and its key? Everything is a node, thus a language is a node, so it must have a unique id — that should be enough?!"

A language is indeed a node. However, languages tend to be interchanged between different systems. If we exported the language from system A and imported it into system B, the node id assigned by A might already be taken in B, so we would need to assign a new one. Later, we might transfer instances of that language from A to B. The instances would use the language’s original id in their meta-pointers. So we would have to store mapping tables from the language import, and apply them every time we import an instance: not very appealing.

Another problem arises if we have a language in several versions. Different versions of the same language are represented by different nodes; otherwise they would not be different. Then the same language in different versions would have different ids. Let’s assume the language contains three concepts, and only one of them changed from version 1 to 2. We still must have new (albeit semantically identical) nodes for the other two concepts in version 2, as a concept cannot be part of two versions of a language. Thus, all three concepts of version 2 must have different node ids from the corresponding concepts in version 1. Then, if we migrated an instance of that language from version 1 to 2, we would have to change all meta-pointers; again, not very appealing.

Having a key for both languages and all their elements solves all of these problems: The key stays the same in different systems, and through different versions.

1. At least by default; we might deviate from that constraint under special circumstances, like migrations.

2. https://en.wikipedia.org/w/index.php?title=Value_type