Sufficient Statistics and Split Idempotents in Discrete Probability Theory

A sufficient statistic is a deterministic function that captures an essential property of a probabilistic function (channel, kernel). Being a sufficient statistic can be expressed nicely in terms of string diagrams, as Tobias Fritz showed recently, in adjoint form. This reformulation highlights the role of split idempotents in the Fisher-Neyman factorisation theorem. Examples of sufficient statistics occur in the literature, but mostly in continuous probability. This paper demonstrates that there are also several fundamental examples of sufficient statistics in discrete probability. They emerge after some combinatorial groundwork that reveals the relevant dagger split idempotents and shows that a sufficient statistic is a deterministic dagger epi.


Introduction
The notion of a sufficient statistic plays an important role in statistics, but a precise definition is hard to find. Informally, it involves a function s : X → Y that expresses a characteristic property about certain parameterised probability distributions p(θ) on X. The statistic provides enough information such that this distribution can be reconstructed from the push-forward of p(θ) along s. The original formulation was introduced and studied in the 1920s by Ronald Fisher, see [10].
A precise formulation has been given recently by Tobias Fritz as part of his efforts to express basic concepts and results from probability theory and statistics categorically, see [11] and also [12,13]. This approach uses Markov categories, which are symmetric monoidal categories, with a final object as tensor unit, and with copiers (studied as 'CD-categories' in [5]). String diagrams provide a powerful and intuitive language for these Markov categories.
Within this setting, the notion of sufficient statistic is formalised as an equation of string diagrams, see [11, Defn. 14.3]. It can be read as a (symmetric) adjoint property, more in the style of adjoint matrices than adjoint functors. One of the highlights in [11, Thm. 14.5] is an abstract formalisation (and proof) of the Fisher-Neyman factorisation theorem (see also e.g. [2, Prop 4.10] or [31, §3.3]). It gives a necessary and sufficient condition for the existence of a sufficient statistic, in terms of a split idempotent. Thus, [11] provides a great clarification of what a sufficient statistic is all about, via abstraction. But it does not really deal with examples of sufficient statistics.
The existing descriptions of sufficient statistics in the literature have several disadvantages.
• The examples are mostly in continuous probability theory. There are hardly any illustrations in discrete probability theory, except the one with sums and Poisson distributions that we reproduce in Section 7. This is a pity, because there are several fundamental discrete instances.
• Typically in these examples, only the sufficiency condition of the Fisher-Neyman factorisation theorem is established - consisting of the disappearance of the parameter after conditioning. These descriptions stop short of giving the full picture, with the relevant split idempotent that does the real work and produces the adjoint situation.
• These examples do not match the abstract description of [11] -understandably so, given that [11] is a very recent publication.
Thus there is room for a fresh look at sufficiency of statistics, given the clear and abstract reformulation provided recently by [11]. That is what the current paper will do. The emphasis is on discrete probability theory, since it already offers ample material. This restriction has the (technical) benefit that we do not have to bother with "almost surely equal", as in continuous probability. Thus, the paper only looks at one particular kind of Markov category, namely the Kleisli category of the discrete distribution monad. This paper elaborates examples, but it does not offer new (category) theory. We like to compare examples of sufficient statistics with examples of adjunctions, like: the forgetful functor from compact Hausdorff spaces to sets has a left adjoint given by ultrafilters. Proving this is quite a bit of work. It offers valuable mathematical insight.
Similarly, we claim that the examples of sufficient statistics that we describe below offer valuable insight into the relevant mathematical structures. As is often the case in discrete probability theory, the nature of these structures is combinatorial. Indeed, this paper develops some new combinatorial results, especially about (multiset) partitions, see e.g. Proposition 2.4 below (extending [20]). The emphasis is on uncovering the relevant split idempotents.
We recall that a split idempotent is a map f : A → A that can be written as f = s • r, where r • s = id. In such a situation, the map r is called a retraction and s a section. It is easy to see that the section s is the equaliser of f, id : A ⇒ A, and that the retraction r is their coequaliser. Such a splitting of f, if it exists, is unique up to isomorphism. Split idempotents are used in the Karoubi envelope, the idempotent-splitting completion of a category. Splittings of dagger (self-adjoint) idempotents form classical objects in a quantum setting [29]. Here we also have daggers, relative to a prior distribution. We show that a sufficient statistic (in discrete probability) is a deterministic dagger epi (see [14]), as retraction part of a split dagger idempotent, see Lemma 2.3.
The paper starts with a relatively long section on background material, covering multisets, discrete probability distributions, channels and their daggers, updating, and partitions. These partitions are special multisets over the natural numbers that have been studied in their own right [1], but also to capture mutations in population biology (see e.g. [8,9,25,26,27,32]); we offer some new results. Section 3 repeats the description of sufficient statistics from [11] and puts it in perspective, in relation to disintegration. The role of split idempotents is emphasised to obtain examples of sufficient statistics. The subsequent four sections 4 - 7 cover particular illustrations of the notion of sufficient statistic. Section 4 looks at accumulation (from sequences to multisets) as sufficient statistic for independent and identically distributed (iid) elements in sequences. This sufficiency situation, in Diagram (16), captures the very basic relations at the heart of discrete probability theory. Section 5 introduces a new sufficiency situation for partitions, building on the multiplicity count function (from [20]). Sections 6 and 7 review two examples from the literature in the current setting, with the adjoint description based on split idempotents.

Background
We briefly review the essence of multisets, distributions and partitions, and also of channels as probabilistic functions.The main goal is to fix notation.

Multisets
A multiset (or bag) is a finite 'subset' in which elements may occur multiple times. There are two equivalent representations for a multiset with elements from a set X: as a formal sum φ = Σ_i n_i|x_i⟩, with multiplicities n_i ∈ N expressing that the element x_i ∈ X occurs n_i many times in φ, or as a function φ : X → N with finite support supp(φ) := {x ∈ X | φ(x) ≠ 0}.
We switch between these representations whenever convenient.
We write M(X) for the set of multisets over X. It is the free commutative monoid on X. This M forms a monad on the category of sets. We only mention how functoriality works: for a function f : X → Y there is a function M(f) : M(X) → M(Y) defined by: M(f)(φ) := Σ_x φ(x)|f(x)⟩. (1) With a multiset φ we associate its size ‖φ‖ := Σ_x φ(x) ∈ N, the total number of elements. We write M[K](X) := {φ ∈ M(X) | ‖φ‖ = K} for the set of multisets of size K ∈ N.
For a finite set A we write |A| ∈ N for its number of elements and Perm(A) for the set of permutations π : A → A. Every sequence (x_1, …, x_K) ∈ X^K can be turned into a multiset via the accumulation function acc : X^K → M[K](X), given by acc(x_1, …, x_K) := 1|x_1⟩ + ··· + 1|x_K⟩, forgetting the order of the elements.
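As a concrete sketch of these representations and of the functorial action M(f), one can model multisets with Python's `Counter`; the function name `M_f` and the example multiset are our own illustration, not from the paper:

```python
from collections import Counter

def M_f(f, phi):
    """Functorial action M(f): push the multiset phi forward along f : X -> Y,
    adding up the multiplicities of elements that f identifies."""
    out = Counter()
    for x, n in phi.items():
        out[f(x)] += n
    return out

phi = Counter({'a': 3, 'b': 1, 'c': 2})   # the multiset 3|a> + 1|b> + 2|c>
size = sum(phi.values())                  # its size ||phi|| = 6
psi = M_f(str.upper, phi)                 # relabelling along an injection
collapsed = M_f(lambda x: 'u', phi)       # identifying all elements
```

Identifying all three elements collapses the multiset to 6|u⟩, illustrating how multiplicities add up under M(f).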

Distributions
A (discrete finite probability) distribution over a set X is a formal sum Σ_i r_i|x_i⟩, where x_i ∈ X and r_i ∈ [0, 1] with Σ_i r_i = 1. Alternatively, it is a function ω : X → [0, 1] with finite support and Σ_x ω(x) = 1. We write D(X) for the set of distributions over X. Such distributions may also be called states, see [16]. This D is a monad too, like M. Functoriality works as in (1). For two distributions ω ∈ D(X) and ρ ∈ D(Y) one can form a product distribution ω ⊗ ρ ∈ D(X × Y) via (ω ⊗ ρ)(x, y) = ω(x) · ρ(y). Equivalently, ω ⊗ ρ = Σ_{x,y} ω(x) · ρ(y)|x, y⟩. For a single distribution ω ∈ D(X) we write the K-fold (tensor) product as: iid[K](ω) := ω ⊗ ··· ⊗ ω = Σ_{x ∈ X^K} ω(x_1) · … · ω(x_K) |x_1, …, x_K⟩. The abbreviation 'iid' stands for 'independent and identically distributed'. Marginalisation of a 'joint' state τ ∈ D(X_1 × ··· × X_n) is the operation of projecting τ to a state in D(X_i), on one of the components, obtained via functoriality, as D(π_i)(τ).
We shall make frequent use of multinomial distributions, which we describe via a function mn[K] : D(X) → D(M[K](X)). We think of X as a set of colours and of ω ∈ D(X) as an urn with coloured balls, where ω(x) ∈ [0, 1] is the probability of drawing a ball of colour x ∈ X. Then, mn[K](ω) ∈ D(M[K](X)) is a distribution on multisets of size K, corresponding to draws of K-many balls from the urn (with replacement). This distribution is given by: mn[K](ω) := Σ_{φ ∈ M[K](X)} (φ) · ∏_x ω(x)^{φ(x)} |φ⟩, where (φ) := K! / ∏_x φ(x)! is the multinomial coefficient of φ. See [18,23] for more details.
In the previous subsection we have seen the accumulation function acc : X^K → M[K](X). In the other direction there is a channel arr : M[K](X) → X^K, called arrangement (see [18]); it is given by the uniform distribution arr(φ) := Σ_{x ∈ X^K, acc(x) = φ} (1/(φ)) |x⟩ over the (φ)-many sequences that accumulate to φ. This uniform distribution is well-defined, by Lemma 2.1 (i). We shall see that accumulation and arrangement form a split idempotent, for a special 'Kleisli' composition, see Lemma 2.2. We introduced distributions ω ∈ D(X) with finite support. In Section 7 we drop this finiteness requirement and shall work with more general discrete probability distributions. We then write D∞(X) for the set of such distributions.
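To make the multinomial channel concrete, here is a small brute-force sketch (our own helper names, enumerating all K-length draws) that computes mn[K](ω) and checks one probability against the formula (φ) · ∏_x ω(x)^{φ(x)}:

```python
from collections import Counter
from itertools import product
from math import factorial, prod

def coefficient(phi):
    """Multinomial coefficient (phi) = K! / prod_x phi(x)!, for K = ||phi||."""
    K = sum(phi.values())
    return factorial(K) // prod(factorial(n) for n in phi.values())

def mn(K, omega):
    """Multinomial channel mn[K]: distribution over multisets of size K,
    obtained by drawing K balls with replacement from the urn omega."""
    dist = Counter()
    for draw in product(omega, repeat=K):
        phi = frozenset(Counter(draw).items())   # hashable multiset representation
        dist[phi] += prod(omega[x] for x in draw)
    return dist

omega = {'a': 0.5, 'b': 0.3, 'c': 0.2}
d = mn(3, omega)
p = d[frozenset(Counter('aab').items())]         # probability of 2|a> + 1|b>
```

Here p equals 3 · 0.5² · 0.3 = 0.225, with coefficient (φ) = 3!/2! = 3, and all multiset probabilities sum to 1.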

Channels
We have mentioned that taking distributions D is a monad. The associated 'Kleisli' maps X → D(Y) are called channels and will be written as X → Y, with a circle on the shaft. These arrows are the morphisms in the (symmetric monoidal) Kleisli category K(D) of the distribution monad D. We recall the basic operations of Kleisli extension =≪ and Kleisli composition • •.
Given a channel c : X → Y and a distribution / state ω ∈ D(X) we write c =≪ ω ∈ D(Y) for the state transformation along c, obtained via Kleisli extension: (c =≪ ω)(y) := Σ_{x ∈ X} ω(x) · c(x)(y). Two channels c : X → Y and d : X → Z can be combined into a tuple channel ⟨c, d⟩ := (c ⊗ d) • • ∆ : X → Y × Z, where ∆ : X → X × X is the copy function ∆(x) = (x, x). We shall see special instances of this tupling of channels when c or d is the identity channel. We shall write such tuples as ⟨c, id⟩ or ⟨id, d⟩. We spell out the associated state transformation: ⟨c, id⟩ =≪ ω = Σ_{x ∈ X} Σ_{y ∈ Y} ω(x) · c(x)(y) |y, x⟩. Each function f : X → Y gives rise to a deterministic channel ‹f› := η ∘ f : X → Y, via the unit η of the monad, so that ‹f›(x) = 1|f(x)⟩. These deterministic channels, of the form ‹f›, can be characterised as those channels that commute with copiers ∆. They satisfy ‹g› • • ‹f› = ‹g ∘ f›. Often we omit the brackets ‹-› when it is implicitly clear that an ordinary function is promoted to a channel. We do so in particular for projection and copy functions π_i and ∆, and also for accumulation in the next result.
For convenience and clarity, we shall use the graphical notation of string diagrams for channels, as probabilistic computations, where information is flowing upwards. A channel is represented as a box, and wires are typed by the input and output sets of these channels. We assume a basic level of familiarity with these string diagrams and refer to [5,11,30] for more information about their use in probability theory. In this paper these string diagrams are interpreted in the Kleisli category K(D), or K(D∞), but not in general Markov categories.
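The Kleisli operations can be sketched directly in code, representing a channel as a function from elements to probability dictionaries; the names `push`, `kcomp`, `det` and the coin example are our own illustration:

```python
def push(c, omega):
    """State transformation along a channel c: push the distribution omega forward."""
    out = {}
    for x, px in omega.items():
        for y, py in c(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

def kcomp(d, c):
    """Kleisli composition: first c, then d."""
    return lambda x: push(d, c(x))

def det(f):
    """Promote an ordinary function f to a deterministic channel."""
    return lambda x: {f(x): 1.0}

coin = lambda _: {'H': 0.5, 'T': 0.5}            # a channel ignoring its input
swap = det(lambda s: 'T' if s == 'H' else 'H')   # deterministic channel from a function
out = push(kcomp(swap, coin), {'*': 1.0})        # fair coin, then deterministic swap
```

Composing the fair coin with the deterministic swap again yields the fair coin, since the swap merely relabels outcomes.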

Predicates, updating and daggers
A predicate on a set X is a function p : X → [0, 1]. We shall write Pred(X) for the set of all such (fuzzy) predicates on X. Updating (or conditioning) of a state/distribution is a basic operation in probabilistic reasoning, where ω ∈ D(X) is updated in the light of evidence given by a predicate p ∈ Pred(X). We first write ω ⊨ p := Σ_x ω(x) · p(x) for the validity (expected value) of p in ω. If this validity is non-zero, we can define the updated distribution ω|_p ∈ D(X) as the normalised product: ω|_p(x) := ω(x) · p(x) / (ω ⊨ p). One can then prove that updating makes the predicate p 'more true', in the sense that ω|_p ⊨ p ≥ ω ⊨ p, see e.g. [17,18] for details. Along a channel c : X → Y one can do backwards predicate transformation: for a predicate q : Y → [0, 1] on Y we can form a predicate c ≫= q on X via: (c ≫= q)(x) := Σ_y c(x)(y) · q(y). There is then a basic equality of validities: (c =≪ ω) ⊨ q = ω ⊨ (c ≫= q). Given a channel c : X → Y and a 'prior' state ω ∈ D(X) we can obtain a reversed channel c†_ω : Y → X, with ω as prior distribution, called the dagger of c, see [7,5,11]. Its formulation is: c†_ω(y) := ω|_{c ≫= 1_y}, that is, c†_ω(y)(x) = ω(x) · c(x)(y) / (c =≪ ω)(y). This definition uses the point predicate 1_y : Y → [0, 1] which is 1 on y ∈ Y and 0 everywhere else. The definition only works when the pushforward distribution c =≪ ω has full support. The dagger is also called Bayesian inversion; it corresponds to turning a conditional probability p(y|x) into p(x|y). Daggers of deterministic channels play a special role; they give rise to dagger idempotents.
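A minimal sketch of validity, updating and the dagger (Bayesian inversion); the weather/sensor example is our own, purely for illustration:

```python
def validity(omega, p):
    """omega |= p : expected value of the predicate p in the distribution omega."""
    return sum(px * p(x) for x, px in omega.items())

def update(omega, p):
    """omega|_p : normalised pointwise product of omega and p."""
    v = validity(omega, p)
    return {x: px * p(x) / v for x, px in omega.items()}

def dagger(c, omega):
    """Bayesian inversion: reverse the channel c along the prior omega,
    by updating omega with the point predicate 1_y pulled back along c."""
    return lambda y: update(omega, lambda x: c(x).get(y, 0.0))

prior = {'rain': 0.2, 'dry': 0.8}
sensor = lambda w: ({'wet': 0.9, 'not_wet': 0.1} if w == 'rain'
                    else {'wet': 0.3, 'not_wet': 0.7})
posterior = dagger(sensor, prior)('wet')     # P(rain | wet) = 0.18 / 0.42
```

The posterior probability of rain given a wet observation is 0.18 / 0.42 ≈ 0.43, the usual Bayes computation.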
Lemma 2.3 Let f : X → Y be a function, considered as a (deterministic) channel, and let ω ∈ D(X) be a distribution such that f =≪ ω = D(f)(ω) has full support. Then: f • • f†_ω = id. This makes f a deterministic dagger epi, with prior ω, and f†_ω • • f : X → X a split dagger idempotent. Proof. We only verify the displayed equation and leave the remaining details to the reader.

Multiset partitions
A multiset partition, or simply a partition, is a multiset over the positive natural numbers N_{>0}. Such a partition σ = Σ_i σ(i)|i⟩ ∈ M(N_{>0}) has a sum defined as: sum(σ) := Σ_i i · σ(i). We write MP(K) for the set of partitions with sum K. The sizes |MP(K)| of the sets of partitions with sum K = 1, 2, 3, … are given by the so-called partition numbers [1]: 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, …, for which no closed-form expression is known. For instance, MP(4) contains the following five multisets: 4|1⟩, 2|1⟩ + 1|2⟩, 1|1⟩ + 1|3⟩, 2|2⟩, 1|4⟩. (7)
These partitions can be seen as the different ways to describe the sum 4 via coins: 4 coins with value 1, 2 coins of 1 and 1 of 2, etc., see [20]. Such partitions are studied for instance in population biology, see e.g. [8,9,25,26,27] and also Section 6. In economics they can be used for (un)fairness [28]: if you have 4 units of 'wealth' that you can distribute over 4 people, you can give each of them one unit (as on the left in (7)) or you can give all units to one individual (as on the right in (7)). The partitions in the middle are intermediate forms of fairness. Notice that the number sum(φ), for φ ∈ M(N_{>0}), is typically different from its size ‖φ‖. We do have ‖φ‖ ≤ sum(φ). For instance, the partitions in (7) all have sum four, but their sizes are, in order, 4, 3, 2, 2, 1.
The next result involves a multiplicity count function mc, introduced in [20] (and used also in [19,24]), that plays a fundamental role in the sequel - as analogue of accumulation acc. It counts the multiplicities in a multiset, as in: mc(3|a⟩ + 1|b⟩ + 3|c⟩ + 2|d⟩) = 1|1⟩ + 1|2⟩ + 2|3⟩. This expresses that in the above (argument) multiset over {a, b, c, d}, 1 element occurs 1 time (namely b), 1 element occurs 2 times (namely d) and 2 elements occur 3 times (namely a and c). Thus, the multiplicity count function abstracts away from the elements in a multiset and only looks at how many elements occur how many times. It is a dagger epi, see Section 5.
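In code, multiplicity count is a one-liner on the `Counter` representation; the following sketch (our own variable names) reproduces the example above:

```python
from collections import Counter

def mc(phi):
    """Multiplicity count: forget the elements of a multiset, record only
    how many elements occur how many times."""
    return Counter(phi.values())

phi = Counter({'a': 3, 'b': 1, 'c': 3, 'd': 2})   # the multiset 3|a> + 1|b> + 3|c> + 2|d>
sigma = mc(phi)                                    # the partition 1|1> + 1|2> + 2|3>
```

Note that sum(σ) = 1·1 + 2·1 + 3·2 = 9 equals the size ‖φ‖, as it should.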
The next result collects some combinatorial results about the relation between multisets and partitions, via this new multiplicity count function mc.The focus is on describing mc as the retraction part of a split idempotent.
This means that the number of multisets φ ∈ M[K](X) with mc(φ) = σ is given by the multinomial coefficient from item (ii); in particular, when K ≤ |X|, there is at least one such multiset for each σ ∈ MP(K). (iii) Thus, we can define a stack channel stk : MP(K) → M[K](X) via uniform distributions as: stk(σ) := Σ_{φ ∈ M[K](X), mc(φ) = σ} 1/|{ψ ∈ M[K](X) | mc(ψ) = σ}| · |φ⟩. Then mc • • stk = id. We write ep := stk • • mc for the resulting split idempotent in K(D), where ep stands for element permutation ("substitution"). It satisfies, for φ ∈ M[K](X): ep(φ) = stk(mc(φ)) = the uniform distribution over all ψ ∈ M[K](X) with mc(ψ) = mc(φ). The first equation holds by definition; the second one is the statement.
Proof. (ii) By assumption, the set X has n elements. The question is: how many ways are there to partition this set X into subsets with σ(1), σ(2), …, σ(K) elements, and finally, with n − ‖σ‖ many remaining elements? As we have seen in Lemma 2.1 (i), this number of partitions is given by the multinomial coefficient: n! / (σ(1)! · σ(2)! · … · σ(K)! · (n − ‖σ‖)!). (iii) For a partition σ ∈ MP(K) with sum K, the uniform distribution stk(σ) is well-defined when K ≤ |X|; this follows from item (ii), which shows that there is at least one multiset φ ∈ M[K](X) with mc(φ) = σ. (iv) For φ ∈ M[K](X) write σ = mc(φ). It suffices to prove: |{π ∈ Perm(X) | M(π)(φ) = φ}| = σ! · (n − ‖σ‖)!, where σ! := ∏_i σ(i)!. (∗) For a permutation π : X → X the multiset M(π)(φ), obtained by permuting the elements occurring in φ according to π, also has multiplicity count σ = mc(φ). Some of these permutations of φ are equal to φ. We have to count their number.
For each i ∈ N with σ(i) > 0 there are σ(i) many elements of X in φ occurring i times. Permuting these σ(i) many elements does not change the multiset φ. This can be done in σ(i)! many ways. Using this argument for each i explains the occurrence of σ! = ∏_i σ(i)! in (∗). There are n − ‖σ‖ = n − |supp(φ)| many elements of X that do not occur in φ. They can be permuted in (n − ‖σ‖)! many ways without changing φ.
This last item (iv) is somewhat mysterious, so we would like to illustrate what is going on.
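As a sanity check of the counting in item (iv), the following brute-force sketch (our own illustration, with X = {a, …, e}) compares the number of permutations of X fixing a multiset φ against the closed formula σ! · (n − ‖σ‖)!:

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod

X = ['a', 'b', 'c', 'd', 'e']             # n = 5 elements
phi = Counter({'a': 2, 'b': 2, 'c': 1})   # sigma = mc(phi) = 1|1> + 2|2>
sigma = Counter(phi.values())

def act(pi, phi):
    """M(pi)(phi): permute the elements of the multiset phi along pi."""
    rename = dict(zip(X, pi))
    out = Counter()
    for x, n in phi.items():
        out[rename[x]] += n
    return out

# brute force: count permutations pi of X with M(pi)(phi) = phi
fixing = sum(1 for pi in permutations(X) if act(pi, phi) == phi)

# the closed formula from item (iv): sigma! * (n - ||sigma||)!
formula = (prod(factorial(a) for a in sigma.values())
           * factorial(len(X) - sum(sigma.values())))
```

Both counts come out as 4: one may swap a and b (equal multiplicity 2) and, independently, permute the two absent elements d and e.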

Sufficient statistic
The diagrammatic formulation of the notion of sufficient statistic has been introduced recently in [11]. This formulation involves a simple equation between two diagrams, see (10) below. Implicitly, it relies on disintegration, which we introduce first (based on [11] and also [5]). It helps to understand sufficiency as a form of updating that trivialises, by making a parameter obsolete. In these diagrams we use a ground symbol for discarding (marginalisation, projecting away) and a black dot for copying. On the right-hand side of the equation, the X-output of the c channel is discarded, but reconstructed via the d-channel.

Jacobs 9-9
We do not worry about uniqueness of disintegration in the current setting. It does exist, under additional requirements, see [11,5] for details.
We use surjective functions s : X ↠ Y to identify elements in X, resulting in abstractions in Y. This Y may give numerical information, e.g. when Y is the set of natural or real numbers. Such a map is also called a statistic. A statistic s is called sufficient for a channel, as statistical model depending on a parameter, when the dependence on the parameter disappears via updating through s. Sufficiency refers to being adequate as a summary of essential aspects of the elements in X. Sufficiency involves the existence of a reversal of the function s : X → Y to a channel d : Y → X. This channel d can be used to reconstruct a distribution on X from its summary given by s. We first give a string diagrammatic description, following [11, Defn. 14.3]. Subsequently we explain the relation to disintegration. The channel in (10) results in a special way from disintegration, namely from disintegration of the channel ⟨id, s⟩ • • c on the left-hand side in (10). If we write D : A × Y → X for its disintegration channel, as in (9), then we should get an equation as given on the left below.
The distinguishing property of a sufficient statistic is that the dashed line is absent. The situation on the left can be expressed by the equation on the right. (ii) The fact that the dashed wire is missing in (11) is often expressed as: the conditional distribution c(a), given s, does not depend on a. This can be made more precise. The box/channel D in (11) can be calculated - in discrete probability - as a distribution D(a, y) ∈ D(X), for a ∈ A and y ∈ Y, via a dagger (see Subsection 2.4). Explicitly, D(a, y) = c(a)|_{s ≫= 1_y}. The absence of the dashed arrow in (11) corresponds to the non-dependence of the latter expression on a. This is the essence of the Fisher-Neyman factorisation theorem, see [11, Thm. 14.5]. (iv) Two distributions ω, ρ ∈ D(X) are equal when ω ⊨ p = ρ ⊨ p for all predicates p on X. This easily follows by just looking at point predicates 1_x, where x ∈ X, for which ω ⊨ 1_x = ω(x). This validity formulation of equality of distributions can also be used for the sufficiency equation (10). It amounts to, for all predicates p ∈ Pred(X), q ∈ Pred(Y), and parameters a ∈ A: c(a) ⊨ p & (s ≫= q) = (s • • c)(a) ⊨ (d ≫= p) & q. (14) This formulation uses conjunction & for pointwise multiplication of predicates: (p & q)(x) = p(x) · q(x). It has a clear adjointness flavour. However, there is no direction involved, or left-right distinction, since & is a commutative operation - in classical probability, not in quantum probability, see e.g. [6,15,33].
All our examples below follow the same pattern to obtain a sufficient statistic.
Lemma 3.4 Consider a channel together with a split idempotent in K(D), consisting of a section followed by a retraction, such that composing the channel with the idempotent yields the channel again. Then the retraction is a sufficient statistic for the channel, with the section as the reconstruction channel d of Definition 3.2.
Proof. This is in essence a special case of the Fisher-Neyman factorisation theorem, as formulated in diagrammatic form in [11, Thm. 14.5]. It uses that K(D) is a positive Markov category, justifying the last equation below, see [11, Rem. 11.23].

Accumulation as sufficient statistic
We can now start harvesting from the previous two sections.
is a sufficient statistic for the independent and identically distributed channel iid[K] : D(X) → X^K, via arrangement. This is expressed by the following equation (16) between channels D(X) → X^K × M[K](X). Proof. Accumulation acc and arrangement arr are the retraction and section part of the split idempotent tp := arr • • acc described in Lemma 2.2. Lemma 3.4 applies since: tp • • iid[K] = iid[K]. The formulation in (16) now follows because, by definition, acc • • iid[K] = mn[K]. We think that Equation (16) expresses a fundamental and elementary relationship between the combinatorial and probabilistic properties of sequences and multisets, in terms of the iid and multinomial distributions. Although these distributions appear frequently in the literature, this (diagrammatic) equation (16) has not been identified before. There is a single source - as far as we are aware - namely [3, §2.2], where it is mentioned that a particular sum - amounting to accumulation - is sufficient as statistic for iid. But the situation is not elaborated further and arrangement does not occur, as in (16).
The two equations mn[K] = acc • • iid[K] and iid[K] = arr • • mn[K] are both familiar. Both can be obtained from the above theorem by discarding one of the outgoing wires in (16). However, Equation (16) expresses more than these two equations obtained by marginalisation, since it involves a joint state formulation. We elaborate in the concrete situation of Theorem 4.1 on two aspects discussed in Remark 3.3. First, the sufficiency of accumulation can also be expressed (and proven directly) via arbitrary predicates p ∈ Pred(X^K) and q ∈ Pred(M[K](X)), in the style of (14), as an adjointness property: iid[K](ω) ⊨ p & (acc ≫= q) = mn[K](ω) ⊨ (arr ≫= p) & q. Next, we illustrate how in this case the dependency on ω disappears in iid if we condition on accumulation (to a fixed multiset φ), as in (12): iid[K](ω)|_{acc ≫= 1_φ} = arr(φ).
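The marginal form iid[K] = arr • • mn[K] of Theorem 4.1 can be checked numerically on a small example; the brute-force sketch below (our own helper names) verifies that pushing the multinomial distribution through the uniform arrangement channel recovers every iid sequence probability:

```python
from collections import Counter
from itertools import product
from math import factorial, prod

def mn(K, omega):
    """Multinomial channel: distribution over multisets of size K."""
    dist = Counter()
    for seq in product(omega, repeat=K):
        dist[frozenset(Counter(seq).items())] += prod(omega[x] for x in seq)
    return dist

def coefficient(phi):
    """Number of sequences accumulating to the multiset phi."""
    return factorial(sum(n for _, n in phi)) // prod(factorial(n) for _, n in phi)

omega, K = {'a': 0.6, 'b': 0.3, 'c': 0.1}, 3
d = mn(K, omega)

checked = 0
for seq in product(omega, repeat=K):
    phi = frozenset(Counter(seq).items())
    lhs = d[phi] / coefficient(phi)          # (arr after mn[K]) at the sequence seq
    rhs = prod(omega[x] for x in seq)        # iid[K](omega) at the sequence seq
    assert abs(lhs - rhs) < 1e-12
    checked += 1
```

All 27 sequences of length 3 over three colours pass the check.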

Multiplicity count as sufficient statistic
Accumulation abstracts away from the order of elements in a sequence, by turning the sequence into a multiset. Multiplicity count mc : M[K](X) → MP(K) is a function that abstracts away from the elements themselves, and only looks at their multiplicities. Recall, mc(φ) = Σ_{x ∈ supp(φ)} 1|φ(x)⟩. The question arises: is multiplicity count also a sufficient statistic, and if so, for which channel? This section provides an answer, based on a split idempotent, as in Lemma 3.4. Indeed, in Proposition 2.4 we have seen that multiplicity count mc is the retraction part of the split idempotent ep of element permutations, given by ep = stk • • mc.
We first introduce a swap version of the multinomial distribution, defined for a distribution ω ∈ D(X) on a finite set X as the channel swmn[K] that assigns to each multiset φ ∈ M[K](X) the average of the multinomial probabilities mn[K](ω)(ψ) over all ψ ∈ M[K](X) with mc(ψ) = mc(φ). For instance, with the same distribution as before, one sees that multisets with the same multiplicity count have the same probability. This happens since the elements in X do not really play a role in this definition. Hence, really, we should not be using distributions, but 'divisions', as element-free distributions, see [20]. This is left for future work.
Theorem 5.1 Let X be a finite set and let K ≤ |X|. Multiplicity count is a sufficient statistic for the swapped multinomial, via the stack channel stk, as expressed by the following sufficiency equation between channels D(X) → M[K](X) × MP(K).
Lemma 3.4 now provides a sufficient statistic situation with channel mc • • swmn[K] : D(X) → MP(K). This composite is what we have called the partition multinomial pamn[K] in (19).
We like to get a better handle on this partition multinomial pamn[K].
Proposition 5.2 Let X be a finite set and let K be a natural number with 1 ≤ K ≤ | X |.
For the second equation fix φ ∈ M[K](X). Then the claimed equality follows from what we have just shown. (ii) By the previous item: pamn[K](ω) = D(mc)(mn[K](ω)). We conclude with an illustration: this channel pamn[K] acts like a (Kingman) paintbox, see [4].
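Using Proposition 5.2 (ii), the partition multinomial can be computed by pushing mn[K](ω) forward along mc; the following brute-force sketch (our own helper names) does exactly this:

```python
from collections import Counter
from itertools import product
from math import prod

def mn(K, omega):
    """Multinomial channel: distribution over (frozenset-represented) multisets."""
    dist = Counter()
    for seq in product(omega, repeat=K):
        dist[frozenset(Counter(seq).items())] += prod(omega[x] for x in seq)
    return dist

def mc(phi):
    """Multiplicity count of a frozenset-represented multiset."""
    return frozenset(Counter(n for _, n in phi).items())

def pamn(K, omega):
    """Partition multinomial: push mn[K](omega) forward along mc."""
    dist = Counter()
    for phi, p in mn(K, omega).items():
        dist[mc(phi)] += p
    return dist

omega = {'a': 0.5, 'b': 0.3, 'c': 0.2}
d = pamn(3, omega)
p_three = d[frozenset({(3, 1)})]             # probability of the partition 1|3>
```

The partition 1|3⟩ (all three drawn balls of one colour) has probability Σ_x ω(x)³ = 0.125 + 0.027 + 0.008 = 0.16, and the three partitions of 3 together carry the full probability mass.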
Size as sufficient statistic

It is known from [9] that the size function is a sufficient statistic, namely for the channel of Ewens distributions, see also [8, §2.2] or [25]. We elaborate this situation in terms of Lemma 2.3 - which leads to split dagger idempotents. Due to space constraints, we only look at the main lines here. The appendix contains more details.
The Ewens distribution, from [9], can be described as a channel ew[K] : R_{>0} → D(MP(K)), given by the Ewens sampling formula: ew[K](t)(σ) := (K! / (t · (t+1) · … · (t+K−1))) · ∏_i (t/i)^{σ(i)} / σ(i)!, for σ ∈ MP(K). We show that by updating with a fixed size n, where 1 ≤ n ≤ K, the dependence on the parameter t disappears, and gives us our dagger size†.
The latter Stirling distribution stir[K] involves the (unsigned) Stirling numbers of the first kind, see [21] for more details. It is sometimes called the Chinese restaurant table distribution.
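Taking the Ewens sampling formula in its standard form (with σ(i) the multiplicity of part size i; the partition generator below is our own sketch), one can check numerically that ew[K](t) is a probability distribution on MP(K):

```python
from math import factorial, prod

def partitions(K, max_part=None):
    """All multiset partitions of K, as dicts {part size: multiplicity}."""
    if max_part is None:
        max_part = K
    if K == 0:
        yield {}
        return
    for j in range(min(K, max_part), 0, -1):
        for rest in partitions(K - j, j):
            p = dict(rest)
            p[j] = p.get(j, 0) + 1
            yield p

def ewens(K, t, sigma):
    """Ewens sampling formula: probability of the partition sigma of K."""
    rising = prod(t + i for i in range(K))   # (t)_K = t(t+1)...(t+K-1)
    return (factorial(K) / rising) * prod((t / i) ** a / factorial(a)
                                          for i, a in sigma.items())

K, t = 5, 1.7
total = sum(ewens(K, t, s) for s in partitions(K))
num = sum(1 for _ in partitions(K))          # the partition number |MP(5)| = 7
```

The seven partitions of 5 indeed receive total probability 1, for any choice of the parameter t > 0.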

Sums of sequences as sufficient statistic
For a fixed number K we consider the addition function sum : N^K → N. It is the retraction part of a split idempotent, with section sum† : N → D(N^K) given by: sum†(n) := Σ_{k ∈ N^K, sum(k) = n} (1/K^n) · (n! / (k_1! · … · k_K!)) |k⟩. This yields a distribution by Lemma 2.1 (ii), since a sequence k ∈ N^K with sum(k) = n can be identified with a multiset in M[n]({1, …, K}). Our final result is a well-known instance of sufficiency, involving the Poisson channel pois : R_{>0} → D∞(N) given by pois[λ] := Σ_n e^{−λ} · (λ^n / n!) |n⟩, with infinite support. The appendix contains a proof. Theorem 7.1 For K ≥ 1, the addition function sum : N^K → N is a sufficient statistic for the K-fold Poisson product pois ⊗ ··· ⊗ pois : R_{>0} → D(N^K), in the following equality of channels R_{>0} → N^K × N.
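The sufficiency of sum can be checked concretely: conditioning the K-fold Poisson product on a fixed value of the sum yields a λ-free distribution of multinomial shape, as the following sketch (our own, with small K and n) verifies:

```python
from itertools import product
from math import exp, factorial, prod

lam, K, n = 2.3, 3, 4

def pois(lam, k):
    """Poisson probability of outcome k, with rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

# condition the K-fold Poisson product on the event sum = n
joint = {ks: prod(pois(lam, k) for k in ks)
         for ks in product(range(n + 1), repeat=K) if sum(ks) == n}
norm = sum(joint.values())
cond = {ks: p / norm for ks, p in joint.items()}

# the conditional law has multinomial probabilities, independent of lam
for ks, p in cond.items():
    expected = factorial(n) / (prod(factorial(k) for k in ks) * K ** n)
    assert abs(p - expected) < 1e-12
```

The parameter λ cancels in the normalisation, which is exactly the disappearance of the parameter after conditioning described above.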

Conclusions
This paper elaborates several examples of a sufficient statistic, in the recently introduced 'adjoint' formulation using string diagrams of [11]. It does so for discrete distributions, since sufficiency is not well-developed there - in contrast to the continuous case. Four examples are described, following the same pattern, by identifying the relevant split idempotents - that arise via the Fisher-Neyman factorisation theorem. The paper's focus is on concrete mathematical (combinatorial) structure, and not so much on theory development. Ewens' distributions capture probabilistic mutations for multiset partitions. In [19] this approach is extended to other data types - including set partitions, also studied in [24] - and leads to many more examples of sufficient statistics. There is room for further investigations of, for instance, the use of 'divisions' from [20] instead of the 'partition' distributions in Section 5.

(iii) By induction on K, using items (i) and (ii).
Proof of Theorem 7.1 For K ≥ 1 we have the two D∞-channels from the theorem. In addition, there is the sum sufficient statistic with its dagger sum†. We choose to prove Theorem 7.1 via the predicate formulation of (14). For predicates p ∈ Pred(N^K) and q ∈ Pred(N), there is an adjoint correspondence:

Lemma 2.2
Accumulation and arrangement (4) form (the retraction and section, respectively, of) a split idempotent in the Kleisli category K(D) of the distribution monad D, since: acc • • arr = id. The resulting split idempotent channel is tp := arr • • acc : X^K → X^K.

Definition 3.1
Let c : A → X × Y be a channel, represented as a box on the left below. A disintegration of c is a channel d : A × Y → X that satisfies the equation on the right.

Definition 3.2
Let c : A → X be a channel, where we think of A as the space of parameters. (i) A statistic for the channel c is a function s : X → Y. (ii) Such a statistic s is sufficient if there is a channel d : Y → X such that Equation (10) holds. Remark 3.3 (i) For the channel d on the right-hand side in (10), see also [2, Prop 4.10] or [31, §3.3]. (iii) In the sequel we are interested in actually demonstrating sufficiency of certain statistics, in discrete probability. Proving Equation (10) amounts to showing an equality of distributions of the following form. For each parameter a ∈ A: ⟨id, s⟩ =≪ c(a) = Σ_{x ∈ X} c(a)(x) |x, s(x)⟩ must equal Σ_{x,z ∈ X} c(a)(x) · d(s(x))(z) |z, s(x)⟩ = ⟨d, id⟩ =≪ D(s)(c(a)).

This result will be exploited in the next four sections.

Theorem 4.1
Fix a set X and a number K ∈ N. The accumulation function acc : X^K → M[K](X) is then the statistic of Theorem 4.1. For the proof of Theorem 5.1 we use Proposition 2.4, with split idempotent ep = stk • • mc. We thus have to prove that ep • • swmn[K] = swmn[K]. This follows from the fact that swmn[K] = ep • • mn[K]. The proof uses that the multinomial channel mn[K] forms a natural transformation D ⇒ DM[K].