A Categorical Normalization Proof for the Modal Lambda-Calculus

We investigate a simply typed modal $\lambda$-calculus, $\lambda^{\to\square}$, due to Pfenning, Wong and Davies, where we define a well-typed term with respect to a context stack that captures the possible world semantics in a syntactic way. It provides a logical foundation for multi-staged meta-programming. Our main contribution in this paper is a normalization by evaluation (NbE) algorithm for $\lambda^{\to\square}$, which we prove sound and complete. The NbE algorithm is a moderate extension of the standard presheaf model of the simply typed $\lambda$-calculus. However, central to the model construction and the NbE algorithm is the observation of Kripke-style substitutions on context stacks, which brings together two previously separate concepts: structural modal transformations on context stacks and substitutions for individual assumptions. Moreover, Kripke-style substitutions allow us to give a formulation for contextual types, which can represent open code in a meta-programming setting. Our work lays the foundation for extending the logical foundation by Pfenning, Wong, and Davies towards building a practical, dependently typed foundation for meta-programming.


Introduction
The Curry-Howard correspondence fundamentally connects formulas and proofs to types and programs. This view not only provides logical explanations for computational phenomena, but also serves as a guiding principle in designing type theories and programming languages.
Extending the Curry-Howard correspondence to modal logic has been fraught with challenges. Some of the first such calculi for the modal logic S4 were proposed by Bierman and de Paiva [8,7] and subsequently by Pfenning and Davies [32]. A key characteristic of this work is to separate the assumptions that are valid in every world from the assumptions that presently hold in the current world. This leads to a dual-context style formulation that satisfies substitution properties (see for example [15]).
In recent years, modal type systems based on this dual-context style have received renewed attention and provided insights into a wide range of seemingly unconnected areas: from reasoning about universes in homotopy type theory [28,36] to mechanizing meta-theory [34,35], to reasoning about effects [38], and meta-programming [25]. This line of work builds on the dual-context formulation of Pfenning and Davies.

However, due to the permutation conversions, it is also challenging to extend to dependent type theories and to prove normalization directly via logical relations. An alternative to dual-context-style modal calculi is pursued by Clouston, Birkedal and collaborators (see [11,21]). This line of work is inspired by the Fitch-style proof representation given by Borghuis [10]; following Borghuis, they call their representation the Fitch style. Fitch-style systems model Kripke semantics [27] and use locks to manage assumptions in a context. To date, existing formulations of S4 in Fitch style [11,21] mainly consider idempotency, where $\square T$ is isomorphic to $\square\square T$. However, this distinction is important from a computational view. For example, in multi-staged programming (see [32,13]) $\square T$ describes code generated in one stage, while $\square\square T$ denotes code generated in two stages. It is also fruitful to keep the distinction from a theoretical point of view, as it allows for a fine-grained study of different, related modal logics.
In this paper, we take $\lambda^{\to\square}$, an intuitionistic version of modal logic S4, from Pfenning, Wong and Davies [33,13] as a starting point. Historically, $\lambda^{\to\square}$ is also motivated by Kripke semantics [27] (see [12, Section 3] and [13, Section 4]) and is hence referred to as the Kripke style. Unlike Fitch-style systems, where worlds are represented by segments between two adjacent "lock" symbols, in $\lambda^{\to\square}$ each world is represented by a context in a context stack. Nevertheless, the conversion between Kripke and Fitch styles is largely straightforward (footnote 3). Here, we will often use "context" and "world" interchangeably. In $\lambda^{\to\square}$, a term $t$ is typed in a context stack $\vec{\Gamma}$ where, initially, the context stack consists of a single local context which is itself empty (i.e. $\epsilon; \cdot$).

$$\epsilon; \Gamma_1; \ldots; \Gamma_n \vdash t : T \qquad\text{or}\qquad \vec{\Gamma} \vdash t : T$$

The rightmost (or topmost) context represents the current world. In the introduction rule for $\square$, we extend the context stack with a new world (i.e. a new context). In the elimination rule, if $\square T$ is true in a context stack $\vec{\Gamma}$, then $T$ is true in any world $\vec{\Gamma}; \Delta_1; \ldots; \Delta_n$ reachable from $\vec{\Gamma}$. The choice of the level $n$ corresponds to reflexivity and transitivity of the accessibility relation between worlds in the Kripke semantics.

$$\frac{\vec{\Gamma}; \cdot \vdash t : T}{\vec{\Gamma} \vdash \mathsf{box}\ t : \square T} \qquad\qquad \frac{\vec{\Gamma} \vdash t : \square T}{\vec{\Gamma}; \Delta_1; \ldots; \Delta_n \vdash \mathsf{unbox}_n\ t : T}$$

There are two key advantages of this unbox formulation in $\lambda^{\to\square}$. First, it introduces a syntactic convenience: natural numbers describe levels, which allows us to elegantly capture various modal logics differing only in one parameter of the unbox rule. By introducing unbox levels, $\square$ is naturally non-idempotent. Having unbox levels allows us to study the relation of various sublogics of S4 and treat them uniformly and compactly.
Second, compared to the dual-context formulation, it directly corresponds to the computational idioms quote (box) and unquote (unbox) in practice, thereby giving a logical foundation to multi-staged metaprogramming [13]. In particular, allowing $n = 0$ gives us the power not only to generate code, but also to run and evaluate code.
A major stumbling block in reasoning about $\lambda^{\to\square}$ (see also [17]) is the fact that it is not obvious how to define substitution properties for context stacks. This prevents us from formulating an explicit substitution calculus for $\lambda^{\to\square}$ which may serve as an efficient implementation. More importantly, it also seems to be the bottleneck in developing normalization proofs for $\lambda^{\to\square}$ that can be easily adapted to the various subsystems of S4.

Footnote 3: However, we note that $\lambda^{\to\square}$ has never been identified as or called a Fitch-style system.

Hu and Pientka

Fig. 1. Typing judgments and some chosen equivalence judgments. The equivalence judgment states that terms $t$ and $t'$ have type $T$ and are equivalent in context stack $\vec{\Gamma}$.
In this paper, we make the following contributions: (i) We introduce the concept of Kripke-style substitutions on context stacks (Sec. 3), which combines two previously separate concepts: modal transformations on context stacks (such as modal weakening and fusion) and substitution properties for individual assumptions within a given context. (ii) We extend the standard presheaf model [5] for the simply typed $\lambda$-calculus and obtain a normalization by evaluation (NbE) algorithm for $\lambda^{\to\square}$ in Sec. 4. One critical feature of our development is that the algorithm and the proof accommodate all four subsystems of S4 without change. (iii) As opposed to Nanevski et al. [30], we provide a contextual type formulation in $\lambda^{\to\square}$, inspired by our notion of Kripke-style substitutions, in Sec. 5; it can serve as a construct for describing open code in a meta-programming setting.
This work opens the door to a substitution calculus and normalization of a dependently typed modal type theory. There is a partial formalization [22] of this work in Agda [2,31] and an accompanying technical report [23].

Definition of $\lambda^{\to\square}$
In this section, we introduce the simply typed modal $\lambda$-calculus, $\lambda^{\to\square}$, by Pfenning, Wong and Davies [33,13] more formally. We concentrate here on the fragment containing function types $S \to T$, the necessity modality $\square T$, and a base type $B$. Following standard practice, we consider variables, applications, and unbox terms to be neutral. Functions, boxed terms and neutral terms are normal. Note that we allow reductions under binders and inside boxed terms. As a consequence, a function or a boxed term is normal if its body is normal.
We define typing rules and type-directed equivalence between terms in Fig. 1. We only show the rules for β and η equivalence of terms, but the full set of rules can be found in Appendix A. We use Barendregt's abstract naming and α renaming to ensure that variables are unique with respect to context stacks. The variable rule asserts that one can only refer to a variable in the current world (the topmost context). In a typing judgment, we require all context stacks to be non-empty, so the topmost context must exist.
From the point of view of Kripke semantics, the introduction rule for $\square$ says that a term of type $\square T$ is just a term of type $T$ in the next world. The elimination rule brings $\square T$ from some previous world to the current world. This previous world is determined by the level $n$. As mentioned earlier, the choice of $n$ determines which logic the system corresponds to. $|\vec{\Delta}|$ counts the number of contexts in $\vec{\Delta}$. To illustrate, we recap how the axioms in Sec. 1 can be described in $\lambda^{\to\square}$. K is defined by choosing $n = 1$. Axiom T requires that $n = 0$, and Axiom 4 is only possible when unbox levels (ULs) can be $> 1$.
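The typing discipline described above can be illustrated with a small type-inference sketch. It is not the paper's Agda formalization: the tuple encoding of types and terms, the type annotation on λ, the function name `infer`, and the `allowed` parameter (a predicate selecting the permitted unbox levels) are all assumptions made for illustration. A context stack is a non-empty list of contexts whose last element is the current world.

```python
# Types: ("base", name) | ("arrow", S, T) | ("box", T)
# Terms: ("var", x) | ("lam", x, S, t) | ("app", s, t) | ("box", t) | ("unbox", n, t)
# The annotation S on "lam" is an assumption so that types can be inferred.

def infer(stack, term, allowed=lambda n: True):
    """Infer the type of `term` in a non-empty context stack.

    `stack` is a list of dicts (variable -> type); the last dict is the
    current world. `allowed` says which unbox levels the system permits.
    """
    if not stack:
        raise TypeError("context stacks must be non-empty")
    tag = term[0]
    if tag == "var":
        # Variable rule: only the topmost context is visible.
        if term[1] not in stack[-1]:
            raise TypeError(f"variable {term[1]} is not in the current world")
        return stack[-1][term[1]]
    if tag == "lam":
        _, x, dom, body = term
        top = {**stack[-1], x: dom}
        return ("arrow", dom, infer(stack[:-1] + [top], body, allowed))
    if tag == "app":
        fun_ty = infer(stack, term[1], allowed)
        arg_ty = infer(stack, term[2], allowed)
        if fun_ty[0] != "arrow" or fun_ty[1] != arg_ty:
            raise TypeError("ill-typed application")
        return fun_ty[2]
    if tag == "box":
        # Introduction: check the body in a fresh, empty next world.
        return ("box", infer(stack + [{}], term[1], allowed))
    if tag == "unbox":
        _, n, body = term
        # Elimination: drop the n topmost worlds; the rest must stay non-empty.
        if not allowed(n) or n >= len(stack):
            raise TypeError(f"unbox level {n} is not permitted here")
        ty = infer(stack[:len(stack) - n], body, allowed)
        if ty[0] != "box":
            raise TypeError("unbox expects a term of boxed type")
        return ty[1]
    raise ValueError(f"unknown term {term!r}")

A, B = ("base", "A"), ("base", "B")
x, f = ("var", "x"), ("var", "f")
# Axiom T: box A -> A, using level 0.
axiom_T = ("lam", "x", ("box", A), ("unbox", 0, x))
# Axiom 4: box A -> box box A, using level 2.
axiom_4 = ("lam", "x", ("box", A), ("box", ("box", ("unbox", 2, x))))
# Axiom K: box (A -> B) -> box A -> box B, using level 1 only.
axiom_K = ("lam", "f", ("box", ("arrow", A, B)),
           ("lam", "x", ("box", A),
            ("box", ("app", ("unbox", 1, f), ("unbox", 1, x)))))
```

Calling `infer([{}], axiom_K, allowed=lambda n: n == 1)` succeeds, while the same restriction rejects `axiom_T` and `axiom_4`, which matches the reading that the permitted levels decide the logic.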
The term equivalence rules are largely standard. In the η rule for $\square$, we restrict unbox to level 1. In the β rule for $\square$, we rely on the modal transformation operation [13], written as $\{n/0\}$, which allows us to transform the term $t$, which is well-typed in the context stack $\vec{\Gamma}; \cdot$, to the context stack $\vec{\Gamma}; \vec{\Delta}$. We slightly abuse notation and use ; for both extending a context stack with a context and appending two context stacks. We will discuss modal transformations further later in this section.

Term Substitutions
A term substitution simply replaces a variable $x$ with a term $s$ in a term $t$. It pushes the substitution inside the subterms of $t$, avoiding capture by renaming. Below, we simply restate the ordinary term substitution lemma:

Modal Transformations (MoTs)
In addition to the usual structural properties (weakening and contraction) of individual contexts, $\lambda^{\to\square}$ also relies on structural properties of context stacks, e.g. in the β rule for $\square$. In particular, we need to be able to move a term from one context stack to another by a modal transformation $\{n/l\}$ (Lemma 2.2). We call the case where $n = 0$ modal fusion or just fusion. Other cases are modal weakening. These names can be made sense of from the following examples:
• When $n = l = 0$, the lemma states that if $\vec{\Gamma}; \Gamma_0; \Delta_0 \vdash t : T$, then $\vec{\Gamma}; (\Gamma_0, \Delta_0) \vdash t\{0/0\} : T$. Notice that $\Gamma_0$ and $\Delta_0$ in the premise are fused into one in the conclusion, hence "modal fusion".
• When $l > 0$, the leading $l$ contexts are skipped. If $n = 2$, $l = 1$ and
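The effect of a MoT on a term can be sketched as a recursive function on syntax. The clauses below are an assumption reconstructed from the stack shapes in the examples above, not a quotation of the definition in [13]: $l$ counts the contexts that are skipped, box pushes one more context and so increments $l$, and unbox has exactly two cases depending on whether its level stays within the skipped contexts. Terms are encoded as tuples and λ is unannotated here.

```python
# Terms: ("var", x) | ("lam", x, t) | ("app", s, t) | ("box", t) | ("unbox", m, t)

def mot(term, n, l):
    """Apply the modal transformation {n/l} to `term` (illustrative sketch)."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":
        return ("lam", term[1], mot(term[2], n, l))
    if tag == "app":
        return ("app", mot(term[1], n, l), mot(term[2], n, l))
    if tag == "box":
        # The body lives one world further out, so one more context is skipped.
        return ("box", mot(term[1], n, l + 1))
    if tag == "unbox":
        m, body = term[1], term[2]
        if m <= l:
            # The unbox stays within the skipped contexts: keep the level.
            return ("unbox", m, mot(body, n, l - m))
        # The unbox crosses the transformed position: adjust the level.
        return ("unbox", n + m - 1, body)
    raise ValueError(f"unknown term {term!r}")

x = ("var", "x")
weakened = mot(("unbox", 1, x), 2, 0)   # modal weakening {2/0}
fused = mot(("unbox", 1, x), 0, 0)      # modal fusion {0/0}
```

Here `weakened` is `("unbox", 2, x)` and `fused` is `("unbox", 0, x)`: weakening by two worlds raises the level, while fusion merges the two topmost worlds and lowers it.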

Kripke-style Substitutions
Traditionally, we have viewed term substitutions and modal transformations as two separate operations [13]. This makes reasoning about $\lambda^{\to\square}$ complex. For example, a composition of $n$ MoTs leads to up to $2^n$ cases in the unbox case. This quickly becomes unwieldy. How can we avoid such case analyses by unifying MoTs and term substitutions as one operation that transforms context stacks? We will view context stacks as a category and a special unifying group of simultaneous substitutions as morphisms (denoted by $\Rightarrow$). MoTs are then simply a special case of these morphisms. Lemma 2.2 suggests viewing a MoT as a morphism, because $\{n/l\}$ moves $t$ from the codomain context stack to the domain context stack. If this group of substitutions is closed under composition, then a category of context stacks can be organized.

Composing MoTs
If MoTs are just special substitutions, then the composition of substitutions must also compose MoTs.
The following diagram is a composition of multiple MoTs, forming a morphism $\epsilon; \Gamma_0; \Delta_0; \Delta_1; (\Gamma_1, \Gamma_2, \Gamma_3); \Delta_2; \Gamma_4 \Rightarrow \epsilon; \Gamma_0; \Gamma_1; \Gamma_2; \Gamma_3; \Gamma_4$. This composition contains both fusion ($\Gamma_1$, $\Gamma_2$, $\Gamma_3$) and modal weakening ($\Delta_0$, $\Delta_1$, $\Delta_2$). The thin arrows correspond to contexts in both stacks. Some of these arrows are local identities ($\Gamma_0$ and $\Gamma_4$). They happen between contexts that are affected by modal weakenings. The rest are local weakenings ($\Gamma_1$, $\Gamma_2$ and $\Gamma_3$); in this case, they are affected by modal fusions. The first observation is that the size of the gaps between adjacent thin arrows varies, because it is determined by different MoTs. Another observation is that thin arrows do not have to be just local weakenings; if they contain general terms, then we obtain a general simultaneous substitution. Moreover, thanks to the thin arrows, we know exactly which context stack each term should be well-typed in. Combining all this information, we arrive at the definition of Kripke-style substitutions (K-substitutions), where a local substitution $\sigma : \vec{\Gamma} \Rightarrow \Gamma$ is defined as a list of well-typed terms in $\vec{\Gamma}$ for all bindings in $\Gamma$.
Just as context stacks must be non-empty and consist of at least one context, a K-substitution must have a topmost local substitution, written as $\varepsilon; \sigma$ in the base case. It provides a mapping for the context stack $\epsilon; \Gamma$. We extend a K-substitution $\vec{\sigma}$ with $\Uparrow^n \sigma$, where $n$ captures the offset due to a MoT and $\sigma$ is the local substitution. To illustrate, the morphism in the previous diagram can be represented as $\varepsilon; \mathsf{id}; \Uparrow^3 \mathsf{wk}_1; \Uparrow^0 \mathsf{wk}_2; \Uparrow^0 \mathsf{wk}_3; \Uparrow^2 \mathsf{id}$, where $\mathsf{id}$ is the local identity substitution and $\mathsf{wk}_i : \Gamma_0; \Delta_0; \Delta_1; (\Gamma_1, \Gamma_2, \Gamma_3) \Rightarrow \Gamma_i$ are appropriate local weakenings. We break down this representation: we add an offset 3 and a local weakening $\mathsf{wk}_1$, forming $\varepsilon; \mathsf{id}; \Uparrow^3 \mathsf{wk}_1$. Since the offset associated with $\mathsf{wk}_2$ is 0, no context is added to the domain stack. This effectively represents fusion. $\mathsf{wk}_2$ is similar to $\mathsf{wk}_1$. (iv) The rest of the K-substitution proceeds similarly. Subsequently, we may simply write $\vec{\sigma}; \sigma$ instead of $\vec{\sigma}; \Uparrow^1 \sigma$. In particular, we will write $\vec{\sigma}; \mathsf{wk}$ instead of $\vec{\sigma}; \Uparrow^1 \mathsf{wk}$ and $\vec{\sigma}; \mathsf{id}$ instead of $\vec{\sigma}; \Uparrow^1 \mathsf{id}$. We often omit offsets that are 1 for readability.

Representing MoTs
Now we show that MoTs are a special case of K-substitutions. Let $l := |\vec{\Delta}|$. We define modal weakenings as
where the offset $n + 1$ on the right adds $\Gamma$
Fusion is also easily defined:
where the offset 0 associated with $\mathsf{wk}_2$ allows us to fuse $\Gamma_1$ and $\Gamma_2$, and $\mathsf{wk}_i$ :

Operations on K-Substitutions
We now show that K-substitutions are morphisms in a category of context stacks. In order to define composition, we describe two essential operations: 1) truncation ($\vec{\sigma} \mid n$) drops the $n$ topmost substitutions from a K-substitution $\vec{\sigma}$, and 2) truncation offset ($O(\vec{\sigma}, n)$) computes the total number of contexts that need to be dropped from the domain context stack, given that we truncate $\vec{\sigma}$ by $n$. It computes the sum of the $n$ leading offsets. Let $\vec{\sigma} := \vec{\sigma}'; \Uparrow^{m_n} \sigma_n; \ldots$
For the operation to be meaningful, $n$ must be less than $|\vec{\Delta}|$.
Truncation simply drops n local substitutions regardless of the offset that is associated with each local substitution.
Similar to truncation of K-substitutions, we rely on truncation of contexts, written as $\vec{\Gamma} \mid n$, which requires $n < |\vec{\Gamma}|$; otherwise the operation would not be meaningful. We emphasize that no further restrictions are placed on $n$, and hence our definitions apply to any of the combinations of Axioms K, T and 4 described in the introduction.
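Truncation and truncation offset are simple to state on a concrete representation. The sketch below assumes a K-substitution is stored as a pair `(base, exts)`, where `base` is the bottom local substitution and `exts` lists `(offset, local substitution)` pairs with the topmost one last; this layout and the function names are illustrative choices, not the paper's. Local substitutions are kept as opaque labels because neither operation inspects them.

```python
def truncate(ks, n):
    """Drop the n topmost local substitutions, ignoring their offsets."""
    base, exts = ks
    if not 0 <= n <= len(exts):
        raise ValueError("n must be smaller than the number of contexts in the codomain")
    return (base, exts[:len(exts) - n])

def offset(ks, n):
    """Sum the offsets attached to the n topmost local substitutions."""
    _, exts = ks
    if not 0 <= n <= len(exts):
        raise ValueError("n must be smaller than the number of contexts in the codomain")
    return sum(off for off, _ in exts[len(exts) - n:])

# The running example: eps; id; ^3 wk1; ^0 wk2; ^0 wk3; ^2 id
example = ("id", [(3, "wk1"), (0, "wk2"), (0, "wk3"), (2, "id")])

top_one = offset(example, 1)      # only the offset of the topmost id
all_four = offset(example, 4)     # 3 + 0 + 0 + 2
shorter = truncate(example, 3)    # keeps eps; id; ^3 wk1
```

In this example `top_one` is 2 and `all_four` is 5, the number of contexts that sit above the bottom context in the domain stack. On this representation, offsets also add up across successive truncations: `offset(ks, m + n) == offset(ks, m) + offset(truncate(ks, m), n)`.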

K-Substitution Operation
We can now define the K-substitution operation as follows:
In the box case, the recursive call adds to $\vec{\sigma}$ an empty local substitution. Note that the offset must be 1, since the box-introduction rule extends our context stack with a new empty context.
The unbox case for K-substitutions incorporates MoTs. Instead of distinguishing cases based on the unbox level $n$, we use the truncation offset operation to re-compute the UL, and the recursive call $t[\vec{\sigma} \mid n]$ continues with $\vec{\sigma}$ truncated, because $t$ is typed in a shorter stack.
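The box and unbox cases just described can be sketched on tuple-encoded terms. As before, the `(base, exts)` layout, the dictionary encoding of local substitutions and the name `apply_ks` are assumptions for illustration. The λ case relies on the Barendregt convention used in the text: bound names are assumed distinct from the free variables of substituted terms, so the topmost local substitution is simply extended with the bound variable.

```python
# Terms: ("var", x) | ("lam", x, t) | ("app", s, t) | ("box", t) | ("unbox", n, t)
# K-substitution: (base, exts); local substitutions are dicts name -> term.

def truncate(ks, n):
    base, exts = ks
    if not 0 <= n <= len(exts):
        raise ValueError("truncation level out of range")
    return (base, exts[:len(exts) - n])

def offset(ks, n):
    _, exts = ks
    if not 0 <= n <= len(exts):
        raise ValueError("truncation level out of range")
    return sum(off for off, _ in exts[len(exts) - n:])

def apply_ks(term, ks):
    """Apply the K-substitution `ks` to `term` (illustrative sketch)."""
    base, exts = ks
    top = exts[-1][1] if exts else base
    tag = term[0]
    if tag == "var":
        # Variables are looked up in the topmost local substitution.
        return top[term[1]]
    if tag == "lam":
        _, x, body = term
        new_top = {**top, x: ("var", x)}
        new_ks = (base, exts[:-1] + [(exts[-1][0], new_top)]) if exts else (new_top, [])
        return ("lam", x, apply_ks(body, new_ks))
    if tag == "app":
        return ("app", apply_ks(term[1], ks), apply_ks(term[2], ks))
    if tag == "box":
        # Extend with an empty local substitution at offset 1.
        return ("box", apply_ks(term[1], (base, exts + [(1, {})])))
    if tag == "unbox":
        _, n, body = term
        # Recompute the level and continue with the truncated substitution.
        return ("unbox", offset(ks, n), apply_ks(body, truncate(ks, n)))
    raise ValueError(f"unknown term {term!r}")

x, y = ("var", "x"), ("var", "y")
weaken2 = ({"x": y}, [(2, {})])   # substitute y for x and weaken by two worlds
fuse = ({"x": x}, [(0, {})])      # fuse the two topmost worlds
weakened = apply_ks(("unbox", 1, x), weaken2)
fused = apply_ks(("unbox", 1, x), fuse)
```

Here `weakened` is `("unbox", 2, y)` and `fused` is `("unbox", 0, x)`: a single clause for unbox covers both modal weakening and fusion, with no case split on the level.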
Due to the typing invariants, we know that $O(\vec{\sigma}, n)$ is indeed defined for all valid ULs $n$ in all our target systems. This fact can be checked easily. In System K, since $n = 1$ and all MoT offsets in $\vec{\sigma}$ are 1, we have that $O(\vec{\sigma}, n) = 1$. In System T, where $n \in \{0, 1\}$ and $\vec{\sigma}$ only contains MoT offsets 0 and 1, we have
The following lemma shows that K-substitutions are indeed the proper notion we seek:

Categorical Structure
We are now ready to organize K-substitutions into a category. First we define the identity K-substitution:
where the $\mathsf{id}$'s are appropriate local identities. We again omit the offsets when we extend K-substitutions with $\mathsf{id}$, since they are 1. We also omit the subscript $\vec{\Gamma}$ on $\vec{\mathsf{id}}$ for readability. Composition is defined in terms of the K-substitution operation:
where $\sigma[\vec{\delta}]$ iteratively applies $\vec{\delta}$ to all terms in $\sigma$. In the recursive case, we continue with a truncated K-substitution $\vec{\delta} \mid n$ and recompute the offset. Verification of the categorical laws is then routine:
Theorem 3.3 Context stacks and K-substitutions form a category with identities and composition defined as above.
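Identity and composition can be sketched on the same illustrative representation. The recursive clause below follows the prose description (apply $\vec{\delta}$ to every term of the topmost local substitution, continue with $\vec{\delta}$ truncated by the stored offset, and store the recomputed offset); the base clause, the `(base, exts)` layout and all function names are assumptions for illustration, and bound names are again assumed distinct from free ones.

```python
# Terms: ("var", x) | ("lam", x, t) | ("app", s, t) | ("box", t) | ("unbox", n, t)
# K-substitution: (base, exts), exts = [(offset, local substitution), ...], topmost last.

def truncate(ks, n):
    base, exts = ks
    return (base, exts[:len(exts) - n])

def offset(ks, n):
    _, exts = ks
    return sum(off for off, _ in exts[len(exts) - n:])

def apply_ks(term, ks):
    base, exts = ks
    top = exts[-1][1] if exts else base
    tag = term[0]
    if tag == "var":
        return top[term[1]]
    if tag == "lam":
        _, x, body = term
        new_top = {**top, x: ("var", x)}
        new_ks = (base, exts[:-1] + [(exts[-1][0], new_top)]) if exts else (new_top, [])
        return ("lam", x, apply_ks(body, new_ks))
    if tag == "app":
        return ("app", apply_ks(term[1], ks), apply_ks(term[2], ks))
    if tag == "box":
        return ("box", apply_ks(term[1], (base, exts + [(1, {})])))
    _, n, body = term  # remaining case: unbox
    return ("unbox", offset(ks, n), apply_ks(body, truncate(ks, n)))

def apply_local(sigma, ks):
    """Apply `ks` to every term of the local substitution `sigma`."""
    return {name: apply_ks(t, ks) for name, t in sigma.items()}

def compose(sigmas, deltas):
    """Compose so that t[compose(sigmas, deltas)] == (t[sigmas])[deltas]."""
    base, exts = sigmas
    if not exts:
        return (apply_local(base, deltas), [])
    n, sigma = exts[-1]
    rest_base, rest_exts = compose((base, exts[:-1]), truncate(deltas, n))
    return (rest_base, rest_exts + [(offset(deltas, n), apply_local(sigma, deltas))])

def identity(stack):
    """Identity K-substitution for a stack given as lists of variable names."""
    ids = [{name: ("var", name) for name in ctx} for ctx in stack]
    return (ids[0], [(1, local) for local in ids[1:]])

X, Z = ("var", "x"), ("var", "z")
sigma = ({"x": X}, [(2, {"y": ("unbox", 2, X)})])
delta = ({"x": Z}, [(1, {}), (3, {})])
composed = compose(sigma, delta)
```

With these values `composed` is `({"x": Z}, [(4, {"y": ("unbox", 4, Z)})])`: the offsets 1 and 3 of `delta` are summed because `sigma` truncates it by 2. Composing `sigma` with `identity` on either side returns `sigma` unchanged in this example.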

Properties of Truncation and Truncation Offset
Finally, we summarize some critical properties of truncation and truncation offset. Let $\vec{\sigma}$ :
These properties will be used in Sec. 4. Later we will define other instances of truncation and truncation offsets, but all these instances satisfy the properties listed here. Therefore, these properties sufficiently characterize an algebra of truncation and truncation offset.

Normalization: A Presheaf Model
In this section, we present our NbE algorithm based on a presheaf model. Once we determine the base category, the rest of the construction is largely standard, following Altenkirch et al. [5], with minor differences which we will highlight.
To construct the presheaf model, we first determine the base category. Then we interpret types, contexts and context stacks as presheaves and terms as natural transformations. After that, we define two operations, reification and reflection, and use them to define the NbE algorithm. Last, we briefly discuss the completeness and soundness proof. The algorithm is implemented in Agda [22].

Kripke-style Weakenings
In the simply typed λ-calculus (STLC), the base category is the category of weakenings. In $\lambda^{\to\square}$, we must consider the effects of MoTs, and we will use the more general notion of Kripke-style weakenings or K-weakenings, which characterizes how a well-typed term in $\lambda^{\to\square}$ moves from one context stack to another.
The q constructor is the identity extension of the K-weakening $\vec{\gamma}$, while p accommodates weakening of an individual context. These constructors are typical in the category of weakenings [5, Definition 2]. To accommodate MoTs, we add to the category of weakenings the last rule, which transforms a context stack. In the last rule, the offset $n$ is again parametric, subject to the same UL as the syntactic system, and its choice determines which modal logic the system corresponds to. Note that we also write $\vec{\mathsf{id}}$ for the identity K-weakening. Following our truncation and truncation offset operations for K-substitutions in Sec. 3, we can easily define these operations, together with composition, also for K-weakenings. We omit these definitions for brevity and simply note that a truncated K-weakening remains a K-weakening. Now we obtain the base category:

Lemma 4.2 K-weakenings form a category $W$.
Fig. 2. Interpretations of types, contexts and context stacks to presheaves (where the offset $n$ depends on UL).

Presheaves
The NbE proof is built on the presheaf category $\widehat{W}$ over $W$. $\widehat{W}$ has presheaves $W^{op} \Rightarrow \mathsf{Set}$ as objects and natural transformations as morphisms. We know from the Yoneda lemma that two presheaves $F$ and $G$ can form a presheaf exponential $F \mathbin{\hat\to} G$:
As a convention, we use subscripts for both functorial applications and natural transformation components. As in [5], presheaf exponentials model functions. To model $\square$, we define
where $F$ is a presheaf. In Kripke semantics, $\hat\square$ takes $F$ to the next world. Unlike presheaf exponentials, which always exist regardless of the base category, $\hat\square$ requires the base category to have the notion of "the next world". This dependency in turn allows us to embed the Kripke structure of context stacks into the base category, so that our presheaf model can stay a moderate extension of the standard construction [5].
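For orientation, the two constructions described in this paragraph can be sketched as follows. The clause for $\hat\square$ agrees with the later remark that $(\hat\square \llbracket T \rrbracket)_{\vec{\Gamma}}$ is defined as $\llbracket T \rrbracket_{\vec{\Gamma};\cdot}$. The clause for the exponential is the usual Yoneda-style formula, to be read as families of functions that are natural in $\vec{\Delta}$; the notation $\vec{\Delta} \Rightarrow_w \vec{\Gamma}$ for a K-weakening is an assumption made here and may differ from the original presentation.

```latex
\[
(F \mathbin{\hat\to} G)_{\vec{\Gamma}} \;:=\; \forall\, \vec{\Delta} \Rightarrow_w \vec{\Gamma}.\; F_{\vec{\Delta}} \to G_{\vec{\Delta}}
\qquad\qquad
(\hat\square F)_{\vec{\Gamma}} \;:=\; F_{\vec{\Gamma};\,\cdot}
\]
```

The second clause makes the dependency on the base category visible: it only makes sense because every object $\vec{\Gamma}$ of $W$ has a successor $\vec{\Gamma}; \cdot$.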
With this setup, we give the interpretations of types, contexts, and context stacks in Fig. 2. The interpretation of the base type $B$ is the presheaf from context stacks to neutral terms of type $B$. We write $\mathsf{Ne}\ T\ \vec{\Gamma}$ for the set of neutral terms of type $T$ in stack $\vec{\Gamma}$, and $\mathsf{Nf}\ T\ \vec{\Gamma}$ is defined similarly. The case $\llbracket \square T \rrbracket := \hat\square \llbracket T \rrbracket$ states that semantically, a value of $\square T$ is just a value of $T$ in the next world, which implicitly relies on the capability of K-weakenings to express MoTs. $\hat\top$ and $\hat\times$ are a chosen terminal object and products in $\widehat{W}$, and $*$ is the only element in the chosen singleton set.
The interpretation of context stacks is more interesting. In the step case, $\vec{\Gamma}; \Gamma$ is interpreted as a product. To extract both parts of $\vec{\rho} \in \llbracket \vec{\Gamma} \rrbracket_{\vec{\Delta}}$, we write $(\pi, \rho) := \vec{\rho}$ where $(n, \vec{\rho}') := \pi$. The first component, namely $\pi$, again consists of two parts: 1) the level $n$ satisfying $n < |\vec{\Delta}|$, which corresponds to the MoTs that we support (we note that our definitions again apply to any of the combinations of Axioms K, T and 4, depending on the choice of $n$); 2) the recursive interpretation of $\vec{\Gamma}$ in the truncated stack $\vec{\Delta} \mid n$, described by $\vec{\rho}'$. This stack truncation is necessary to interpret unbox. The second component, namely $\rho$, describes the interpretation of the topmost context $\Gamma$. The fact that our interpretation of context stacks stores the level $n$ ultimately justifies the offsets stored in K-substitutions.

where $(\pi, \rho)$
Functoriality means that the interpretations also act on morphisms in $W$. Given $\vec{\gamma}$ :
We intentionally overload the notation for applying K-substitutions to draw a connection. This notation also applies to the morphism actions of $\llbracket \Gamma \rrbracket$ and $\llbracket \vec{\Gamma} \rrbracket$.

Evaluation
The interpretation of well-typed terms to natural transformations, or evaluation (see Fig. 3), relies on truncation and the truncation offset.These operations are defined below and follow the same principles that lie behind the corresponding operations for syntactic K-substitutions.
Truncation Offset $O(\_, \_)$ :
Most cases in evaluation are straightforward. In the box case, the recursion continues with an extended environment and $t$ in the next world. In the unbox case, we first recursively interpret $t$ with a truncated environment, and then the result is K-weakened. This is because from the well-typedness of $t$, we know
To obtain our goal
$\cdot$, which is given by $\vec{\mathsf{id}}; \Uparrow^m$. The cases related to functions are identical to [5]. In the λ case, since we need to return a set function due to presheaf exponentials, we use → to construct this function. We first K-weaken the environment $\vec{\rho}$ and then extend it with the input value $a$. In the application case, since $\llbracket t \rrbracket$ gives us a presheaf exponential, we just need to apply it to $\llbracket s \rrbracket$. We simply supply $\vec{\mathsf{id}}_{\vec{\Delta}}$ for the K-weakening argument because no extra weakening is needed.
The following lemma proves that $\llbracket t \rrbracket$ is a natural transformation in $\vec{\Delta}$:
The lemma states that the result of evaluation in a K-weakened environment is the same as K-weakening the result evaluated in the original environment. In STLC, naturality holds but is not used anywhere in the proof. In $\lambda^{\to\square}$, since K-weakenings encode MoTs, naturality is necessary in the completeness proof.
After evaluation, we obtain a semantic value of the semantic type $\llbracket T \rrbracket$. In the last step, we use a reification function to convert the semantic value back to a normal form. Reification is defined mutually with reflection in Fig. 3. As suggested by their signatures, they are both natural transformations, but our proof does not rely on this fact. Both reification and reflection are type-directed, so after reification we obtain βη normal forms. We reify a semantic value $a$ of box type $\square T$ in a context stack $\vec{\Gamma}$ by recursively extending the context stack to $\vec{\Gamma}; \cdot$. Note that $a$ has the semantic type $(\hat\square \llbracket T \rrbracket)_{\vec{\Gamma}}$, which is defined as $\llbracket T \rrbracket_{\vec{\Gamma}; \cdot}$. In the case of the function type $S \to T$, since $a$ is a presheaf exponential, we supply a K-weakening and a value, the result of which is then recursively reified.
Reflection turns neutral terms into semantic values. We reflect neutral terms of type $\square T$ recursively, incrementally extending the context stack with one context at a time. In the function case, to construct a presheaf exponential, we first take two arguments $\vec{\gamma}$ and $a$. Since $v$ is a neutral term, $v[\vec{\gamma}]$ is also neutral but now well-typed in $\vec{\Delta}$. Both recursive calls to reification and reflection then go down to $\vec{\Delta}$ instead. Normalization by evaluation (NbE) takes a well-typed term $t$ in a context stack $\vec{\Gamma}$ as input, interprets $t$ to its semantic counterpart in the initial environment, and reifies it back. Before defining NbE more formally, we define the identity environment in $\llbracket \vec{\Gamma} \rrbracket_{\vec{\Gamma}}$ that is used as the initial environment:
Finally we define the NbE algorithm:
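Read literally, the description above (evaluate in the identity environment, then reify) suggests a definition of the following shape. The names $\mathsf{nbe}$, $\downarrow$ for reification and $\uparrow^{\vec{\Gamma}}$ for the identity environment are notational assumptions made for this sketch and need not match the original.

```latex
\[
\mathsf{nbe}^{T}_{\vec{\Gamma}}(t) \;:=\; \downarrow^{T}_{\vec{\Gamma}}\bigl(\llbracket t \rrbracket_{\vec{\Gamma}}(\uparrow^{\vec{\Gamma}})\bigr)
\qquad\text{where } \vec{\Gamma} \vdash t : T \text{ and } \uparrow^{\vec{\Gamma}} \in \llbracket \vec{\Gamma} \rrbracket_{\vec{\Gamma}}
\]
```

Under this reading, completeness says that equivalent terms are sent to the same normal form, and soundness says that $t$ is equivalent to $\mathsf{nbe}^{T}_{\vec{\Gamma}}(t)$.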

Completeness and Soundness
The algorithm given above is sound and complete:
Due to space limitations, we are not able to present the whole proof. Fortunately, the proof is very standard [5]. To prove completeness, we simply need to prove that equivalent terms always evaluate to the same natural transformation:
Proof. Induct on $\vec{\Gamma} \vdash t \approx t' : T$ and apply naturality in most cases about $\square$.
The soundness proof is established by a Kripke gluing model. The gluing model $t \sim a \in \llbracket T \rrbracket_{\vec{\Gamma}}$ relates a syntactic term $t$ and a natural transformation $a$, so that after reifying $a$, the resulting normal form is equivalent to $t$:
The gluing model should be monotonic in $\vec{\Gamma}$, hence Kripke. Again, the gluing model is very standard [5]. It is worth mentioning that in the $\square T$ case, the Kripke predicate effectively requires that $t$ and $a$ are related only when the results of any unboxing remain related. We can then move on to prove some properties of the gluing model and define its generalization to substitutions, which eventually allows us to conclude the soundness theorem. Please find more details in our technical report [24].

Adaptiveness
We emphasize that our construction is stable no matter our choice of UL. Hence, our construction applies to all four modal systems, K, T, K4 and S4, that we introduced in Sec. 1 without change. The key insight that allows us to keep our construction and model generic is the fact that K-substitutions, K-weakenings, and $\llbracket \vec{\Gamma} \rrbracket$ are instances of the algebra formed by truncation and truncation offsets and satisfy all the properties, in particular identity and distributivity, listed at the end of Sec. 3. More importantly, all the truncation and truncation offset functions are defined for all choices of UL, thereby accommodating all four modal systems with their varying levels of unboxing.

Contextual Types
In S4 and a meta-programming setting, $\square$ is interpreted as stages, where a term of type $\square T$ is considered as a term of type $T$ but available only in the next stage. However, as pointed out in [13,30], $\square$ only characterizes closed code. Nanevski et al. [30] propose contextual types, which relativize the surrounding context of a term so that representing open code becomes possible. However, this notion of contextual types is in the dual-context style, and how contextual types can be formulated with unbox and context stacks remains open. In this section, we answer this question by utilizing our notion of K-substitutions.

Typing Judgments and Semi-K-substitutions
With contextual types, we augment the syntax as follows:
$\lceil \vec{\Delta} \vdash t \rceil$ is the constructor of a contextual type, where the contexts that it captures are specified. $\lfloor t \rfloor_{\overset{\rightharpoonup}{\sigma}}$ is the eliminator. Instead of an unbox level, we now require a different argument $\overset{\rightharpoonup}{\sigma}$, which is a semi-K-substitution storing unbox offsets and terms. We will discuss it in more detail shortly.

The introduction rule for contextual types is straightforward:
If we let $\vec{\Delta} = \epsilon; \cdot$, then we recover $\square$. If we let $\vec{\Delta} = \epsilon; \Delta$ for some $\Delta$, then we have an open term $t$ which uses only assumptions in the same stage. If $\vec{\Delta}$ has more contexts, then $t$ is an open term which uses assumptions from previous stages. We can also let $\vec{\Delta} = \epsilon$. In this case, $\lceil \epsilon \vdash T \rceil$ is isomorphic to $T$ and is not too meaningful, but allowing it makes our formulation mathematically cleaner.
The elimination rule, on the other hand, becomes significantly more complex:
It is no longer enough to eliminate with just an unbox level, because the eliminator must specify how to replace all variables in $\vec{\Delta}$ and how the contexts in $\vec{\Gamma}$ and $\vec{\Delta}$ relate. This information is collectively stored in a semi-K-substitution $\overset{\rightharpoonup}{\sigma}$ (notice the semi-arrow), which intuitively is not yet a valid K-substitution, but close:
Compared to K-substitutions, semi-K-substitutions differ in the base case, where an empty $\varepsilon$ is permitted, so they are not valid K-substitutions. However, if a semi-K-substitution is prepended by an identity K-substitution, then the result is a valid K-substitution. Also, $O(\overset{\rightharpoonup}{\sigma})$ computes the sum of all offsets in $\overset{\rightharpoonup}{\sigma}$:
We can prove the following lemma:
This lemma is needed to justify the β equivalence rule, which we are about to discuss.

Equivalence of Contextual Types
Having defined the introduction and elimination rules, we are ready to describe how they interact. Note that the congruence rules are standard, so we omit them here and only describe the β and η rules:

In the η rule, $\overset{\rightharpoonup}{\mathsf{id}}$ denotes the identity semi-K-substitution, which is defined as
We omit the subscript whenever possible. Both rules are easily justified. In the β rule, since $t$ is typed in the context stack $\vec{\Gamma} \mid O(\overset{\rightharpoonup}{\sigma}); \vec{\Delta}$, we obtain a term in $\vec{\Gamma}$ by applying $\vec{\mathsf{id}}; \overset{\rightharpoonup}{\sigma}$ due to Lemma 5.2. In the η rule, by definition, we know
In an extensional setting, where the constructor and the eliminator of modalities are congruent as done in this paper, we can show that the contextual type $\lceil \epsilon;$
we view contexts $\Delta_i$ as iterative products. This implies that introducing contextual types does not increase the logical strength of the system, and the system remains normalizing. Nevertheless, the contextual types given here seem to have a natural adaptation to dependent types and set a stepping stone towards representing open code with dependent types and therefore a homogeneous, dependently typed meta-programming system.

Modal Type Theories
There are many early attempts to give a constructive formulation of modal logic, especially the modal logic S4, starting back in the 1990s [8,7,6,3,10,29]. Pfenning and Davies [12,32] give the first formulation of S4 in the dual-context style, where we separate the assumptions that are valid in every world from the assumptions that are true in the current world. This leads to a dual-context style formulation that satisfies substitution properties and has found many applications, from staged computation to homotopy type theory (HoTT). For example, Shulman [36] extends idempotent S4 with dependent types, called spatial type theory, and Licata et al. [28] define crisp type theory, which removes the idempotency from spatial type theory. However, neither paper gives a rigorous justification of its type theory. Most recently, Kavvos [26] investigates modal systems based on this dual-context formulation for Systems K, T, K4 and S4 as well as the Löb induction principle. Kavvos also gives categorical semantics for these systems.
However, it has been difficult to develop direct normalization proofs for these dual-context formulations, since we must handle extensional properties like commuting conversions (cf. [26,16]). Further, our four target systems have very different formulations in the dual-context style, as shown by Kavvos [26]. As a consequence, it is challenging to have one single normalization algorithm for all four of our target systems.
An alternative to the dual-context style is the Fitch-style approach pursued by Clouston, Birkedal and collaborators (see [11,21,9]). At a high level, Fitch-style systems also model the Kripke semantics, but instead of using one context for each world, the Fitch style uses a special symbol (usually a lock) to segment one context into multiple sections, each of them representing one world. Variables to the left of the rightmost lock are not accessible. Our normalization proof and the generalization of $\lambda^{\to\square}$ to contextual types can likely also be adapted to those systems.
Clouston [11] gives Systems K and idempotent S4 in the Fitch style and discusses their categorical semantics. Gratzer et al. [21] describe idempotent S4 with dependent types. Birkedal et al. [9] give K with dependent types and formulate dependent right adjoints, an important categorical concept of modalities. Gratzer et al. [19,20,18] propose MTT, a multimode type theory, which describes interactions between multiple modalities. Though MTT uses 🔒 to segment contexts, we believe that MTT is better understood as a generalization of the dual-context style, as is apparent in the let-based formulation of the box elimination rule. This different treatment of the box elimination also makes it less obvious how to understand $\lambda^{\to\square}$ as a subsystem of MTT.

Currently, existing Fitch-style systems mostly consider idempotent S4, where $\square T$ is isomorphic to $\square\square T$. However, we consider the distinction between the two to be important from a computational view. For example, in multi-staged programming (see [32,13]) $\square T$ and $\square\square T$ describe code generated in one stage and two stages, respectively. Moreover, $\texttt{unbox}_0\ t$ is interpreted as evaluating and running the code generated by $t$. It is nevertheless possible to develop a non-idempotent S4 system using unbox levels $n$ in the Fitch style by defining a function which truncates a context until its $n$-th 🔒. This is however more elegantly handled in $\lambda^{\to\square}$, because worlds are separated syntactically. For this reason, we consider $\lambda^{\to\square}$ a more versatile and more suitable foundation for developing a dependently typed meta-programming system. In particular, our extension to contextual types shows how we can elegantly accommodate reasoning about open code, which is important in practice.
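The truncation function mentioned above can be sketched as follows. This is only a minimal illustration in Python and not the formulation of any of the cited systems: the list representation of a Fitch-style context, the `LOCK` marker, and the name `truncate` are assumptions made for this example, and we read "truncating a context until its n-th lock" as dropping everything from the right up to and including the n-th lock.

```python
# Hypothetical representation: a Fitch-style context is a list whose entries
# are either variable names (strings) or the LOCK marker separating worlds.
LOCK = "<lock>"  # assumed marker; not a valid variable name in this sketch


def truncate(ctx, n):
    """Drop entries from the right, up to and including the n-th lock.

    n = 0 leaves the context unchanged, matching an unbox at level 0,
    which stays in the current world.
    """
    if n < 0:
        raise ValueError("unbox level must be non-negative")
    if n == 0:
        return list(ctx)
    seen = 0
    # Walk from the right end of the context towards the left.
    for i in range(len(ctx) - 1, -1, -1):
        if ctx[i] == LOCK:
            seen += 1
            if seen == n:
                return list(ctx[:i])
    raise ValueError("context has fewer than n locks")
```

For instance, with the context `["x", LOCK, "y", LOCK, "z"]`, truncating at level 1 keeps `["x", LOCK, "y"]` and truncating at level 2 keeps only `["x"]`. A context stack makes the same operation a matter of dropping the topmost n contexts, with no search for markers.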
Though context stacks in $\lambda^{\to\square}$ are taken from Pfenning, Wong and Davies' development [32,13], Borghuis [10] also uses context stacks in his development of modal pure type systems. His elimination rules use explicit weakening and several "transfer" rules, while $\lambda^{\to\square}$ incorporates both using unbox levels, which we consider more convenient and more practical from a programmer's point of view. Martini and Masini [29] also use context stacks. Their system annotates all terms with a level, which we consider too verbose to be practical.

Normalization
For the dual-context style, Nanevski et al. [30] give contextual types and prove normalization by reduction to another logical system with permutation conversions [14]. This means that the proof is indirect and does not directly yield an algorithm for normalizing terms. Kavvos [26] gives a rewriting-based normalization proof for dual-context style systems with Löb induction. Most recently, Gratzer [18] proves normalization for MTT. It is not clear to us whether the techniques in [18] scale to dependently typed Kripke-style systems, as the two kinds of systems treat box elimination differently.
There are two recent papers closely related to our work: Valliappan et al. [37] and Gratzer et al. [21]. Valliappan et al. [37] give different simply typed formulations in the Fitch style for all four subsystems of S4 and, as a result, a different normalization proof must be given for each subsystem individually. Gratzer et al. [21] follow Abel [1] and give an NbE proof for dependently typed idempotent S4. Since the proof in [21] is parameterized by an extra layer of poset to model the Kripke world structure introduced by 🔒, as pointed out in [19], this proof cannot even be easily adapted to dependently typed K (see Birkedal et al. [9]). Compared to these two papers, our model is a moderate extension to the standard presheaf model, requiring no such extra layer and adapting to multiple logics automatically, and we are confident that it will generalize more easily to the dependently typed setting. The ultimate reason why we only need one proof to handle all four subsystems of S4 is that we internalize the Kripke structure of context stacks in the presheaf model. The internalization happens in the base category, where MoTs are encoded as part of K-weakenings. The internalization captures the peculiar behaviours of the different systems and merges the extra Kripke structure from context stacks with the standard model construction, so that the proofs become much simpler and closer to the typical construction.

Conclusion and Future Work
In this paper, we present a normalization-by-evaluation (NbE) algorithm for the simply typed modal $\lambda$-calculus ($\lambda^{\to\square}$) which covers all four subsystems of S4. The key to achieving this result is our notion of K-substitutions, which provides a unifying account of modal transformations and term substitutions and allows us to formulate a substitution calculus for modal logic S4 and its various subsystems. Such a calculus is not only important from a practical point of view, but also plays a central role in our theoretical analysis. Using insights gained from K-substitutions, we organize a presheaf model, from which we extract a normalization algorithm. The algorithm can be implemented in conventional programming languages and directly accounts for the normalization of $\lambda^{\to\square}$. Deriving from K-substitutions, we are also able to give a formulation for contextual types with unbox and context stacks, which had been challenging prior to our observation of K-substitutions and is important for representing open code in a meta-programming setting.

This work serves as a basis for further investigations into coproducts [4] and the categorical structure of context stacks. We also see this work as a step towards a Martin-Löf-style modal type theory in which open code has an internal shallow representation. With a dependently typed extension and contextual types, it would allow us to develop a homogeneous meta-programming system with dependent types, which has been challenging to achieve.

the necessity modality $\square T$, and a base type $B$.
$$\begin{array}{llll} S, T & := & B \mid \square T \mid S \longrightarrow T & \text{Types, Typ} \\ l, m, n & & & \text{unbox levels or offsets, } \mathbb{N} \\ x, y & & & \text{Variables, Var} \end{array}$$

Fig. 3. Evaluation, reification and reflection functions

MoTs) require us to relabel the level $n$ associated with the unbox eliminator. This is accomplished by the operation $t\{n/l\}$. Assume that $t$ is well-typed in a context stack [...]. In the box case, $l$ increases by one, as we extend the context stack by a new world. In the unbox case, we distinguish cases based on the unbox level $m$. If $m \le l$, then we simply rearrange the ULs recursively in $t$. If $m > l$, we only need to adjust the UL and do not recurse on $t$. MoTs satisfy the following lemma:
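The case analysis for $t\{n/l\}$ described in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions rather than the definition itself: the term constructors and the name `mot` are hypothetical, and the concrete arithmetic (recursing with offset $l - m$ when $m \le l$, and relabeling the level to $m + n - 1$ when $m > l$) is our reading of the informal description, which the text here does not spell out.

```python
from dataclasses import dataclass


# Hypothetical term representation for this sketch.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Box:
    body: object


@dataclass(frozen=True)
class Unbox:
    level: int
    body: object


def mot(t, n, l):
    """Sketch of t{n/l}: relabel unbox levels that cross offset l."""
    if isinstance(t, Var):
        return t
    if isinstance(t, Lam):
        return Lam(t.var, mot(t.body, n, l))
    if isinstance(t, App):
        return App(mot(t.fn, n, l), mot(t.arg, n, l))
    if isinstance(t, Box):
        # box extends the context stack by a new world, so l grows by one.
        return Box(mot(t.body, n, l + 1))
    if isinstance(t, Unbox):
        m = t.level
        if m <= l:
            # Assumed: the unbox stays above the offset; keep its level
            # and recurse with the remaining offset l - m.
            return Unbox(m, mot(t.body, n, l - m))
        # Assumed: the unbox crosses the offset; adjust the level only
        # and do not recurse on the body.
        return Unbox(m + n - 1, t.body)
    raise TypeError("unknown term constructor")
```

Under these assumptions, `mot(t, 1, l)` leaves every term unchanged, `mot(Unbox(1, Var("x")), 2, 0)` yields `Unbox(2, Var("x"))`, and `mot(Unbox(1, Var("x")), 0, 0)` yields `Unbox(0, Var("x"))`.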