Reduced-Order Models for Coupled Dynamical Systems: Data-driven Methods and the Koopman Operator

Providing efficient and accurate parametrizations for model reduction is a key goal in many areas of science and technology. Here we present a strong link between data-driven and theoretical approaches to achieving this goal. Formal perturbation expansions of the Koopman operator allow us to derive general stochastic parametrizations of weakly coupled dynamical systems. Such parametrizations yield a set of stochastic integro-differential equations with explicit noise and memory kernel formulas to describe the effects of unresolved variables. We show that the perturbation expansions involved need not be truncated when the coupling is additive. The unwieldy integro-differential equations can be recast as a simpler multilevel Markovian model, and we establish an intuitive connection with a generalized Langevin equation. This connection helps setting up a parallelism between the top-down, equations-based methodology herein and the well-established empirical model reduction (EMR) methodology that has been shown to provide efficient dynamical closures to partially observed systems. Hence, our findings support, on the one hand, the physical basis and robustness of the EMR methodology and, on the other hand, illustrate the practical relevance of the perturbative expansion used for deriving the parametrizations.


Introduction and Motivation
Multiscale systems are typically characterized by the presence of significant variability over a large range of spatial and temporal scales. This multiscale character is due to a combination of the following factors: the nature of the external forcings, the inhomogeneity of the properties of the system's various components, the complexity of the coupling mechanisms between the components, and the variety of instabilities, dissipative processes, and feedbacks acting at different scales. In many cases, both the theoretical understanding of such systems and the formulation of numerical models for simulating their properties are based on focusing upon a reduced range of large spatial and long temporal scales of interest, and upon devising an efficient way to capture effectively the impact of the faster dynamical processes acting predominantly in the neglected smaller spatial scales [24,64].
Given a high-dimensional dynamical system, we are thus interested in reformulating it in such a way that only the variables of interest are resolved. A first-guess would be to ignore altogether the unresolved variables and consider uniquely the filtered evolution laws valid for the targeted range of scales. However, this is well-known to be inadequate, because nonlinearity guarantees that the unresolved variables have an impact on the resolved ones, both in terms of the detailed dynamics and of its statistical properties. Therefore, the problem of constructing accurate and efficient reduced-order models -or, equivalently, of defining the coarse-grained dynamics -is an essential and fundamental aspect of studying multiscale systems, both theoretically and through numerical simulations.
For the sake of concreteness, let us consider, the case of climate science. Is It is well-nigh impossible, therefore, given our current scientific knowledge and our available or even foreseeable technological capabilities, to create a numerical model able to directly simulate the climate system in all details for all the relevant time scales, which span a range of over 15 orders of magnitude [30,32,66]. Hence, one has to focus on a specific range of scales through suitably developed, approximate evolution equations that provide the basis for the numerical modeling. Such equations are derived from the fundamental laws of climate dynamics through systematic asymptotic expansions that are based on imposing an approximate balance between the forces acting on geophysical flows. These balance relations lead to removing small-scale, fast processes that are assumed to play a minor role at the scales of interest by filtering out the corresponding waves [38,42,80].
In climate science, parametrization schemes have been traditionally formulated in such a way that one expresses the net impact on the scales of interest of processes occurring within the unresolved scales via deterministic functions of the resolved variables, as in the pioneering work on the parametrization of convective processes by Arakawa and Schubert [2]. More recently, it has been recognized, mostly on empirical grounds, that parametrizations should involve stochastic and non-Markovian components [4,28,62]. Machine learning methods have been proposed as the next frontier of data-driven parametrizations [29,61,68,84], able to deliver a new generation of Earth system models [73]; see, though, the caveats discussed by [39,26].

The Projection Operator Formalism of Mori and Zwanzig
Let us reformulate the problem of constructing parameterizations as a projection of the dynamics onto the set of resolved variables. By working at the level of observables, Mori [60] and Zwanzig [90] showed that the evolution laws for the projected dynamics incorporate a deterministic term that would be obtained by neglecting altogether the impact of the unresolved variables, to which a stochastic and a non-Markovian correction had to be added. A. J. Chorin and coauthors [20,21] played an important role in developing further these ideas and applying them to several important problems. We briefly recapitulate below the Mori-Zwanzig projection operator approach.
Formally, let Φ denote a generic observable defined on a state space viewed as the product of two finitedimensional spaces X × Y, with variables x ∈ X and y ∈ Y being the resolved and unresolved variables, respectively. Next, let us define P to be a projector onto functions depending only on the target variables in X , with the complementary projector on the unresolved variables being defined by Q = Id − P.
Given a smooth flow ψ t on X × Y, we consider its action on smooth observables Φ = Φ(x, y) defined by U t Φ(x, y) = Φ(ψ t (x, y)), (x, y) ∈ X × Y, t ≥ 0. (1.1) The operator U t is the Koopman operator and the family {U t } t≥0 of Koopman operators indexed by time forms a semigroup; see Sec. 2 below for further details. The Koopman operator describes how functions on the phase space change under the action of the flow; its time evolution obeys the following equation: (1. 2) The linear operator L gives the instantaneous rate of change of Φ under the action of the flow ψ t . Representing U t formally as the exponential of L, U t = e tL , is both useful and fully justified by operator semigroup theory [65]. With this representation, L satisfies the following identities where we have employed the Dyson identity [23,25] to obtain Eq. (1.3b), as will be discussed in Sect. 2.1 The first term in Eq. (1.3b) is the contribution of the resolved variables x alone to the instantaneous rate of change of Φ. The second term models the fluctuating effects of the unresolved y-variable by itself, while the third and last term represents, via an integral, the time-delayed influence upon x of its interactions with y. This formal calculation suggests that any closed model for the x-variables should incorporate a fluctuating term to account for the y contributions and a memory or integral term for the y-x interactions. Unfortunately, the Mori-Zwanzig equation (1.3b)-also known as the Generalized Langevin Equation (GLE) [43,63] does not provide explicit analytic formulas to determine each of the three summands in the right-hand side (RHS) of Eq. (1.3b). Hence, we need efficient ways to approximate such an equation.
In the limit of perfect time scale separation between the xand y-variables, the non-Markovian term drops out and the fluctuating term can be represented as a -possibly multiplicative -white-noise term, thus recovering the basic results obtained via homogenization theory [34,58,64]. When no such separation exists, however, one has to resort to finding an integral kernel beyond the abstract formulation of Eq. (1.3b); see, for instance, the theoretical ansatz based on perturbation expansion presented in refs. [88,89] and discussed later in the paper, and refs. [43,83] for concrete applications.
In parallel with the theoretical approaches to approximate the Mori-Zwanzig equation (1.3b), data-driven methods have been proposed to model fluctuations and memory effects arising as a result of projecting a large state space onto (much) smaller subspaces. To this end, the work in [43] provides a rigorous connection between the Mori-Zwanzig equation (1.3b) and multilevel regression models that were initially introduced for climatological purposes in [48]. More recently, ref. [51] proposed Nonlinear Auto-Regressive Moving Average with eXogenous input (NARMAX) models as a data-driven methodology that is comparable with the Mori-Zwanzig formalism and applied such models to the deterministic Kuramoto-Sivashinsky equation and to a stochastic Burgers equation. The complementarity of theory-based and data-driven model reduction methods in the absence of time scale separation is very well documented in [51] as well. A highly complementary recent contribution aimed at finding a common thread between data-driven methods and the Mori-Zwanzig theoretical framework is [52].
Efforts at using ingeniously selected basis functions as a stepping stone in data-driven methods for model reduction, effective simulation with partial data, and even prediction are multiple and flourishing. Thus, the eigenvalues and eigenfunctions of the Koopman and transfer operators have been used to capture the modes of variability of the underlying flow, regardless of the latter being deterministic or stochastic; see, respectively, [3] or [19]. Dynamic mode decomposition [72,DMD] allows one to reconstruct from observations the eigenvectors and eigenvalues of the Koopman operator for observables of interest even in high-dimensional dynamical systems [59,69]. The latter approach is complementary to the one presented herein because we shall use the eigenvectors of the Koopman operator to build the projected dynamics for the observables of interest, which can then be re-written as a multilevel Markovian stochastic model.
Examples of other types of selection of dynamically interesting and effective bases are multichannel singular-spectrum analysis analysis [1,5,31, MSSA] and data-adaptive harmonic decomposition [10,44,. Both methodologies have been used extensively and successfully in the simulation, as well as the prediction, of complex phenomena [31,41,44].
Our main theoretical result, Theorem 2.1 below, establishes how such spectral elements determine in turn the constitutive elements of data-driven non-Markovian closure of partially observed complex systems, when rewritten as a multilevel Markovian stochastic model. This theorem highlights, in particular, new bridges with Koopman modes and the DMD decomposition [59,69,72], as well with other kinds of projections onto spectral bases [10,31,37,47,67,78]. A more thorough discussion of the complementary approaches involved is beyond the scope of this paper.

Goals of This Paper
Many approaches to constructing theoretically rigorous parametrizations have been devised. These can be broadly divided into top-down and data-driven approaches: Top-down approaches aim at deriving the parametrizations by applying suitable approximations to the equations describing the dynamics of the whole systems [35, 57, 83, 86, 88, 89, for instance], while data-driven parametrizations are built by constructing a statistical-dynamical model of the impact of unresolved scales on the scales of interest. In fact, partial observations of a time-evolving system can be used to deduce the fluctuating and delayed effects of the unobserved processes, as shown in [43,45,48,85].
In this paper, we will discuss and compare the properties of the Wouters-Lucarini (WL) top-down parametrization [83,88,89] and of the Empirical Model Reduction (EMR) data-driven parametrization [43,46,45,48,49]. We will also see when and how the integro-differential equation occurring in the WL parametrization can be recast into a set of Markovian stochastic differential equations (SDEs). In other words, we investigate the quasi-Markovianity of the latter parametrization [63].
The two aforementioned methodologies are conceptually and practically distinct, even though the ultimate goal is to provide a computationally practical approximation for the Mori-Zwanzig or GLE integro-differential equation. In other words, both approaches -top-down and data-driven -provide fluctuations in the form of stochastic noise and memory effects determined by an integral kernel. On the one hand, the WL approach assumes prior knowledge about the decoupled hidden dynamics but no information about the statistical properties of the coupled system. The empirical approach, on the other hand, samples the observed variables evolving according to the latter. The structure of the multilevel stochastic models (MSMs) that generalize EMRs [43] allows one, moreover, to derive explicit formulas for the fluctuating and memory correction terms that parametrize the influence of hidden processes.
The overall goal of this paper is to provide a conceptual and analytical link between these two approaches, aiming, on the one hand, to buttress the practical relevance of the WL perturbative approach and, on the other, to provide further insight into the well-documented robustness of the EMR method. Moreover, we will clarify how multilevel systems arise from both the top-down (WL) and the bottom-up (EMR) approaches. The paper explores the complete set of boxes and explains all the arrows in Figure 1. The diagram in the figure shows that, starting from the top box, one can arrive at a memory equation, like (1.3b), via top-down or bottom-up methods, as indicated by the left and right sequence of arrows, respectively.
The paper is structured as follows. Section 2 revisits the derivation of the WL parametrization method by applying the Dyson expansion to the Koopman operator associated with two weakly coupled dynamical systems. We show that such an expansion need not be truncated for additively coupled models and consider more general coupling laws than those in [89].
Furthermore, we study the problem of finding Markovian representations of the memory equation in the WL approach based on the spectral decomposition of the Koopman semigroup. Specifically, Theorem 2.1 shows, in the case of a scalar observable, how to recast the stochastic integro-differential equation arising in the WL parametrization as a multilevel Markovian stochastic system involving explicitly the spectral elements of the (uncoupled) Koopman operator, and we point out in Remark 2.5 how such a Markovianization extends to the multidimensional setting. Section 3 provides new insights into the Markovian representation adopted in the MSM framework; these insights help one to determine, in particular, the number of levels required for EMR to converge. Section 4 presents a comparison of the data-driven and top-down parametrization approaches using a simple conceptual stochastic climate model. Finally, we discuss the conclusions obtained from this investigation in Section 5.
Appendices are included in order to avoid diverting the attention of the reader from the main message of the paper. Appendix A provides the proof of the key Theorem 2.1. Appendix B briefly revisits the spectral decomposition of correlation functions and provides a criterion to quantify the loss of the semigroup property that is useful in identifying non-Markovian effects. Appendix C discusses the stochastic Itô integration of the elementary form of an MSM. Appendix D shows how EMR approaches can capture the dynamical properties of partially observed systems; in it, we consider a simple climate model, obtained by coupling the Lorenz atmospheric model [54, (L84)] and the Lorenz convection model [53, (L63)].

Revisiting the Weak-Coupling-Limit Parametrization
To study dynamical systems in which one can separate the variables into two groups, with weak coupling between the two, one often resorts to so-called parametrizations of the effects of one group on the other. In the weak-coupling limit, the coupling itself can be treated as a perturbation of the main dynamics [56,88,89]. Granted such an assumption and some degree of structural stability of the system, one can apply response theory to derive explicit stochastic and memory terms to describe the impact of the variables we want to neglect on the variables of interest, in the Mori-Zwanzig spirit. Note that, to do so, no assumption on timescale separation between the two groups of variables is necessary. This point is particularly relevant in fields  like the climate sciences, where no clear time-scale separation is observed, so that asymptotic expansions of the kind used in homogenization theory are of limited utility.

Deriving the WL Approximation for the Mori-Zwanzig Formalism
Here, using a perturbative approach, we review the derivation of the parametrization presented in refs. [56,88,89]. Formally, we want to couple two dynamical systems generated independently by two vector fields F : We study a broad class of systems of the form: The operation indicated by the colon x : y denotes the Hadamard product that multiplies vectors or matrices component-wise. Here, four new vector fields have been introduced to model the coupling law, namely C x x : X −→ X , C y x : X −→ Y, C x y : Y −→ X and C y y : Y −→ Y. The real parameter controls the strength of the coupling between the two groups of variables, x(t) and y(t), so that the xand y-variables are uncoupled for = 0. We assume that the vector fields F and G, as well as the coupling laws in Eqs. (2.1a,2.1b) are such that system (2.1) possesses a global attractor. Furthermore, we assume throughout this article that this global attractor supports an physical invariant probability measure µ that describes the distribution of trajectories onto the global attractor.
The WL parametrization views the coupling as an -perturbation of the otherwise independent xand yprocesses, with x the observed and y the hidden variables. One next assumes that the impacts of perturbations applied to these processes can be addressed using response theory [70,71], so that response formulas can be used to derive an effective equation for the x-variables.
Taking the Mori-Zwanzig [60,90] point of view, we wish to calculate the evolution of observables that depend on the observed variables x alone, Φ = Φ(x(t)). The idea, following [89], is to perform a perturbative expansion of the differential operator L governing the evolution of Φ(x(t)) under the action of the flow associated with Eq. (2.1). Denoting by u(x, y, t) the time evolution of a smooth observable Φ in C ∞ (X × Y), the first step of this Dyson-like operator expansion reads as follows: Here L 0 and L 1 account for the advective effects of the uncoupled and coupling terms, respectively, that compose the RHS of Eq. (2.1), namely in Eqs. (2.3), ∇ x and ∇ y denote the vector differential operators with respect to the variables x and y.
Recalling Eq. (1.2), the solution operator of Eq. (2.2) is the Koopman operator. Formally, its dual acts on densities and it is the so-called transfer operator [3,50]. Equation (2.2) is thus a transport equation, where the physical quantity or observable is advected by the vector field on the RHS of Eq. (2.1).
Note that the operator formalism presented here in the deterministic dynamical systems setting -and the associated semigroup theory -extends to Markov diffusion processes driven by a stochastic forcing [19]. In the latter case, the transport equation (2.2) becomes the so-called backward Kolmogorov equation that describes the evolution of the expected value of observables. Loosely speaking, the corresponding extension amounts to adding a Laplacian-like operator to the advection operator L [19,63]. See [36] for the appropriate context in testing the applicability of response theory to the independent x and y processes, when = 0.
More precisely, one associates to the solution u(x, y, t) of Eq. (2.2) unfolding from an observable Φ = Φ(x, y) at time t = 0, a family of linear Koopman operators indexed by time {U t } t≥0 such that u(x, y, t) = U t Φ(x, y), for any t ≥ 0 and (x, y) in X × Y. These operators are defined -as mentioned already in connection with introducing the GLE (1.3b) -as exponentials of the operator L, i.e., U t = e tL . This notation is formal as the operator L is unbounded; it is, however, useable as {U t } t≥0 satisfies the semigroup property, i.e. U t+s = U t U s , t, s ≥ 0, as for a standard exponential. Over the appropriate function space * of observables Φ, this family actually forms a strongly continuous contracting semigroup [25].
The action of the flow on an observable Φ becomes thus more transparent thanks to the operator U t , according to the equation where (x(t; x 0 ), y(t; y 0 )) denotes the system's solution at time t emanating from the initial state (x 0 , y 0 ) at time t = 0. In what follows, we omit the subscript 0 in (x 0 , y 0 ) but still take it as an initial state. The semigroup {U t } t≥0 is known as the Koopman semigroup and for each t, U t is the Koopman operator mentioned above; see also [43,Sec. 4]. When the coupling parameter in system (2.1) is small, one can use formal perturbation expansions of the Koopman semigroup to better isolate and assess the coupling effects at the level of observables. To do so, we follow here the perturbation expansion first introduced by Freeman * Such a space can be chosen, for instance, as Dp = {Φ ∈ L p µ (X × Y) | AΦ = limt→0 t −1 (UtΦ − Φ) exists} for some p ∈ [1, ∞], with µ denoting a relevant invariant measure of the system (2.1) supported by the global attractor, while the limit is taken in the sense of strong convergence [25]. J. Dyson in the context of quantum electrodynamics [23] and later formulated rigorously in mathematical terms in [33]. Formally, this expansion reads as follows: and it yields the following expansion of the Koopman operator in : This identity shows that the evolution of a generic observable can be described as an -perturbation of its decoupled evolution according to L 0 . We note that these expansions are purely formal and in particular it is not clear in which sense this expansion might converge. For a bounded perturbation operator L 1 , it would be straightforward to prove boundedness of the resulting perturbed semigroup. However, L 1 here is a differential linear operator, for which direct estimates are more laborious. Leaving alone the functional analysis framework that would make such an expansion rigorously convergent, we shall use nevertheless the expansion (2.6) throughout this article.
The objective now is, using this operator expansion, to derive an effective reduced-order model for the evolution of the x-variable without having to resolve the y-process. We start observing the system at t = 0, but assume that it has already attained a steady state. Since we are only concerned with observables depending solely on the x-variables, we formulate now an evolution equation for such observables. To do so, we consider first the Liouville equation for a generic y-independent observable Φ, this is, Φ(x, y) = Φ(x), for every x and y.
For such an observable, at the time we start observing the coupled system, Eq. (2.2) reduces to where · denotes an inner product in X . Equation (2.7) illustrates the trivial fact that the time evolution in Eq. (2.1) of an x-dependent physical quantity is also affected by the y-variables. Following [88,89], the decoupled equations are assumed to have been evolving for some time prior to the coupling. Hence, we have to formally parametrize the evolution of the C x y (y)-contribution to the vector field which is, ultimately, a vector-valued observable.
We do so by introducing an extended version of the Koopman operators that act on vectors componentwise, rather than just on real-valued observables. Consider v : X × Y −→ R d , for some positive integer d, and define the action of the Koopman operator e tL on v as: for every i = 1, . . . , d. The definition (2.8) will allow us to use the semigroup notation for observables of possibly different dimensions, all of which take their inputs in the phase space X × Y . Ultimately, this is a component-wise evaluation of our extended Koopman operator family, and its generator can be obtained analogously. As mentioned above, we have to model the effects of the coupling vector field C x y (y), whose state at time t = 0 is the product of the evolution from time −t to 0. We then have, with the dynamics starting at time −t and initial state (x 0 , y 0 ), C x y (y) = e tL C x y (y 0 , −t) = e tL C x y (y 0 ) . (2.9) Now, by using the perturbative expansions in Eqs. (2.5b)-(2.6), we obtain: Plugging the indentity in Eq. (2.10c) into Eq. (2.7, we find the following expression: This equation is an exact reformulation of the problem induced by Eq. (2.7). This reformulation demonstrates that memory effects enter at second order in powers of the coupling parameter. Notice, though, that even if Eq. (2.11) reduces the dimensionality of the problem from d 1 +d 2 to d 1 , it does not constitute an approximation for the evolution of Φ as an observable of x alone, since it depends on the evolution of the y-variables in the coupled regime by means of the action of e sL onto L 1 . Therefore, we need to perform a further approximation by considering Eq. (2.10d) instead, which leads to: where the terms of order 3 have been dropped. Equation (2.12) is our equivalent of the Dyson approximation in quantum electrodynamics; it approximates the evolution of the x-variables with no need for the evolution of the y-variables in the coupled regime. This result amounts to saying that -by observing only the statistical properties of the decoupled dynamics of the y-process, obtained with = 0 -one can construct a Markovian contribution and a non-Markovian contribution to the dynamics of the x-variables that is able to describe the impact the the coupling. Expanding the kernelK of the memory contribution, we get: Note that the leading-order Koopman operator e sL 0 models the evolution of the observables in the uncoupled regime. Since there is no prior knowledge on initializing the coupled system at time −t, the initial state y 0 in the hidden variables should be drawn from an ensemble, according to a probability density function. At this stage, there is freedom in the choice of such a prior. However, since we are assuming that the coupled system was initialized at time −t, it is natural to draw y 0 according to the invariant measure ν associated with the dynamical system generated by the vector field G from Eq. (2.1).
We wish to sample initial conditions from the coupled steady state, but do not assume any prior knowledge of the coupled statistics. As discussed in [88,89], we can take advantage of response theory to address this situation. Indeed, for any sufficiently smooth observable Ψ, we have: where Ψ is the expectation value of Ψ in the coupled system (2.1), Ψ =0 is the expectation value of Ψ according to the statistics generated by the uncoupled y process obtained by setting = 0 in Eq. (2.1b), and k δ k [Ψ] is the k th -order response. In what follows, we remove the subscripts for the averages when = 0. Therefore, we have that the expected value of the coupling function reads as: (2.14) Likewise, we can calculate the average of such function at time t: Now, by lettingη(t, y 0 ) = e tL 0 C x y (y 0 ), we find that in order for the approximate statistics to agree up to second order in with the exact one, we only need to impose the following conditions upon the first two moments of the parametrized noisy fluctuations (see also [88]): here (·) stands for the transpose of a vector or a matrix. It follows that any stochastic noise η(t) that satisfies the two conditions above will be suitable for parametrizing the fluctuations in the y-dynamics tied to the lack of knowledge in the initial state. Each of the entries in the correlation matrix given by Eq. (2.16b) is the correlation function between the components of the vector field C x y and these will become explicit provided a suitable spectral decomposition is at hand. Such a decomposition will be provided later in Sect. 2.3, although the reader is referred at this point to Appedix B for clarity.
In the memory term, though, we neglect -corrections to its statistics since memory effects are of order 2 already. Thus, we have, by averaging the kernelK(t, s, x 0 , y 0 ) in Eq. (2.13b) with respect to the y-variables, This way, the memory kernel only depends on the x-variables. Hence, we find a self-consistent evolution of the x-variables, subject to the influence of unobserved variables y, in the form of a stochastic integro-differential equation (SIDE) resembling the GLE (1.3b): where η(t) is a stochastic forcing that agrees with the mean and correlation properties stated in Eq. (2.16). We emphasize that the solution x(t) of the original system of ordinary differential equations (2.1) does not satisfy Eq. (2.18): it is just the proposed reduced-order model for the x-variables. The closure provided by expressing the corrections in the second and third term on the RHS of Eq. (2.18) as functions of x alone is typically called a parametrization of the effect of the unobserved y-variables in the climate sciences [32].
Note that there is considerable freedom in the choosing the noise, since we only require agreement up to the second moment. However, a direct consequence of this weak-coupling parametrization is that realizations of the noise can be produced by directly integrating the decoupled hidden variables, or by representing it using simple autoregressive models [82]. We are assuming here that the uncoupled dynamics leads to a noisy signal; this can be achieved either by the presence of stochastic forcing in the hidden variables [82,87] or by their uncoupled dynamics being chaotic [83].
To summarize, the weak-coupling limit allows one to develop a parametrization of the hidden variables for a system of coupled equations where no separation of time scales is assumed. Moreover, this approach provides explicit approximate expressions for the deterministic, stochastic, and non-Markovian terms in the Mori-Zwanzig formalism of Eq. (1.3b).
There are two sources of error in the parametrization proposed in Eq. (2.18). First, the truncation performed in the Dyson expansion neglects higher-order effects, which are weighted by the third power of the small coupling parameter. Secondly, averaging over the statistics of the uncoupled dynamics can also introduce errors. Furthermore, the nature of the stochastic correction is not fully determined except for its lagged correlation.
The perturbation operator approach taken here is analogous to that of [89], who only considered the independent or additive-coupling cases; the latter is expanded upon in Sect. 2.2. Here, though, we generalize further the parametrization formulas that can be obtained via perturbative expansion of linear operators. In fact, the present approach can also be extended to weakly coupled systems of the form: where C y encodes interactions that need not be separable between the xand y-variables in the hidden layer of the model. Note that the full parametrization of arbitrary couplings was discussed by the two authors of [89] in previous work [88], in which they used a response-theoretic approach.

The Additive-Coupling Case
The approximate Dyson expansion given in Eq. (2.12) is exact in the case of additive coupling. Such systems take the form:ẋ Indeed, letting C y (x, y) = C y (x) in Eq. (2.19b) and using Eq. (2.10b) allows us to avoid the truncation of the Dyson expansion and yields the following expression for the memory term: which is exact. Next, taking averages with respect to ν, we obtain: Hence, the parametrization in this additive-coupling case is exact, as no terms proportional to k , k ≥ 3 are present. The only assumption made is that the statistics in the y variables have reached a steady state according to the unperturbed system. Finally, the full SIDE in this case takes the form: where the stochastic process η has the mean and correlation properties given by Eqs. (2.16). This equation is, thus, exactly the one obtained in [89]. Memory effects represented by integral terms seem unavoidable unless the memory kernels vanish quickly with respect to time. Infinite time scale separation between the two sets of variables leads, though, to the vanishing of the associated integral expressions [64]. Here, we are not assuming no such property in the coupled dynamical system under study; see Eqs. (2.1) and (2.20). On the other hand, reduced phase spaces can help explain the statistics of the dynamical system without resorting to delayed effects that entail the integrals in Eqs. (2.18) and (2.23). Following [19], we briefly review in Appendix B a criterion based on Koopman operators -and, more generally, Markov operators -that enables one to decide whether memory effects can help explain the dynamics and statistics in reduced phase spaces.

Markovian Representation through Leading Koopman Eigenfunctions
In the context of Langevin dynamics, there are known conditions on memory kernels that allow one to recast certain stochastic integro-differential equations into a Markovian SDE by means of extended variables; see [63,Sec. 8.2]. The stochastic processes that allow such a procedure are called quasi-Markovian [63]. † This Markovianization theory can be formulated in the setting of near-equilibrium statistical mechanics, where one uses fluctuation-dissipation-like relations that link the decay properties of the memory kernel and the decorrelation rates of the fluctuations. Here, we follow the approach in [63] but without making any assumptions on the Hamiltonian behavior of the y-variables. We need, though, to make assumptions on the spectral properties of the generator of the y-dynamics, as explained below.
We define the generator L y 0 of the Koopman semigroup associated with the y-dynamics by: for every real-valued observable Φ ∈ C ∞ R d 2 and we denote the associated Koopman operator at time t by U t = e L y 0 t ; the subscript y has been dropped from the ∇ operator herewith, for notational clarity. Recall that the the spectrum of such operators provides useful insights into the statistical properties of the system; this topic is beyond the scope of the present paper but it is treated in detail in [19].
It suffices to show below that the spectrum of U t allows one to characterize the constitutive ingredients of the WL parameterization (2.18) and (2.23), subject to natural assumptions. Even though we have clarified in Eq. (2.8) how the Koopman operator acts on vector-valued observables of any dimension, we restrict now its action for simplicity to scalar real-valued observables, as in Eq. (2.24). In this case, along the lines of the methodology of dynamic mode decomposition [59,69,72], we can (formally) decompose the operator as: where {λ j } N j=1 are the eigenvalues that form the point spectrum of L y 0 and Π j is the spectral projector onto the eigenspace spanned by the eigenfunction ψ j . Here, R(t) is the residual operator associated with the essential spectrum of L y 0 and its norm is controlled by a decaying exponential. We assume furthermore that the spectrum of L y 0 lies in the complex left half-plane, and that in particular Reλ j ≤ 0 for any j. † Such a Markovianization procedure is actually not limited to stochastic processes and it relies on the same type of ideas in other contexts; see refs. [7,9,22]  Such a spectral decomposition and its properties can be rigorously justified for a broad class of differential equations perturbed by small noise disturbances; see [19,Theorem 1 and Appendix A.5]. Based on these rigorous results, we assume, roughly speaking, that these properties survive in a certain small-noise limit, and concentrate here on vector fields y given by G in (2.24) for which a decomposition such as (2.25) holds and a spectral gap does exist in the appropriate functional space.
In the following lines, we examine the expression of the memory kernel K appearing in Eq. (2.23) using the eigendecomposition proposed in Eq. (2.25). In particular, we study such an integral kernel K component-wise: and we have neglected the contribution coming from the essential spectrum. The (·) * superscript is used to denote the dual eigenfunction. This decomposition highlights the fact that the leading eigenvalues of the operator governing the evolution of observables in the uncoupled y-dynamics set the time scale for the memory kernel. Furthermore, this spectral approximation implies that the correlation functions of the noise have the same decay properties, as will become apparent later in the proof of Theorem 2.1. It follows that the correspondence between the noise and integral time scales allows us to recast the SIDE in the WL equation (2.23) into a fully Markovian version with linearly driven hidden variables that are forced by the observed variables, through a functional dependence that can be nonlinear. More exactly, we have the following theorem. The point spectrum of L y 0 is assumed to be constituted of N simple eigenvalues, whose corresponding eigenpairs {(λ j , ψ j ), j = 1, . . . , N } are ordered as follows: 0 ≥ Reλ j ≥ Reλ j+1 and λ j = λ j+1 when We assume that C x in (2.20) lies in the span {ψ j , j = 1, . . . , N } and has ν-mean zero. Then, the WL equation (2.23) associated with system (2.20) admits a Markovianization of the form:

30b)
where Λ and Z(t) lie in C N for every t, while the inner product Λ · Z(t) is real. Here, R is mapping R into C N , W t is a (real-valued) N -dimensional Wiener process with covariance matrix Σ, and D is an N × N matrix with complex entries, as specified below.
More precisely, The C N -valued mapping R is defined as where C y is defined in (2.20b), and · denotes averaging with respect to the invariant measure ν. Finally, D = diag (λ 1 , . . . , λ N ) and the covariance matrix Σ is given by where H is an N × N matrix whose entries are defined as follows: If λ j is real, then H j,j = 1, and if while all other entries are zero.
The full proof appears in Appendix A.
Remark 2.1. Note that the Koopman operator of interest here is the one associated with the y-subspace Y ⊆ R d 2 and not with the entire (x, y)-space X × Y ⊆ R d 1 +d 2 . Other techniques, like the DMD mentioned in Sect. 1.1, aim at extracting the modes of variability of the full system by means of studying the Koopman operator in the entire phase space through suitably defined observables. To this end, the latter methods employ projections of observables onto the eigenfunctions of the Koopman operator to obtain the so-called Koopman modes, which are susceptible of capturing the underlying dynamics. Notice that in Theorem 2.1, instead, we are using the Koopman eigenfunctions to identify the closure model, while projections only come into play in the definition of the coefficients α j and β j ; see Eqs. (2.32a) and (2.32b), respectively.

Remark 2.2.
(i) Assumptions on F, R and Λ that ensure that (2.30) possesses a global random attractor -and thus a stable asymptotic behavior in the pullback sense -appear in [43, Theorem 3.1 and Corollary 3.2].
(ii) Note that Theorem 2.1 can be viewed as a generalization of other Markovianization results for GLEs that appeared in the literature; see [63]. For instance, the scalar GLE in R reduces tȯ where λ is in R n , M is a positive definite n × n matrix and K(t − s) = e M (t−s) λ · λ determines the autocorrelation of the process η(t). In this setting, Eq. (2.36) is equivalent to the following SDE: with ΣΣ * = M + M * . Theorem 2.1 allows for nonlinear dependence on x in the z-equation, and thus for memory kernels that are more complicated than in (2.36). Such a generalization is of practical importance since the process z can then have a more complex correlation dependence on the observed variable x than the one afforded by linear memory terms. Remark 2.3. When L y 0 is self-adjoint -in a suitable Hilbert space, as outlined in Appendix B, i.e., when L y 0 = L y * 0 -the eigenvalues are real and the eigenvectors are mutually orthogonal. Self-adjointness thus implies that there are no oscillations in the correlation functions of the noise or, equivalently, peaks in their power spectrum. With respect to Theorem 2.1, the matrix H in this case would be the identity, since the eigenvalues and eigenfunctions are real and the Itô solutions of 2.30b are, hence, real as well.
Remark 2.4. The resulting system given by Eq. (2.30) is now fully Markovian and the only sources of error with respect to the original SIDE (2.18) lie (i) in the effects of the essential spectrum, which are neglected herein; and (ii) the assumptions about the coupling terms. Neglecting the essential spectrum is only valid for Koopman operators with a point spectrum capable of capturing the correlations in the decoupled y-system; the latter might only hold in the case of Markovian diffusion processes and not for deterministic ones. Also, the assumption that the coupling functions project solely on the point spectrum might not hold in general.
From a practical perspective, though, a suitable choice of dominant eigendirections can reduce the number of extra dimensions needed to integrate the system. Such a suitable choice boils down to neglecting particular eigendirections and this can be done according to two handy criteria: (i) The weight determined by the α j and β j coefficients defined in Eqs. (2.32a, 2.32b) is small; and (ii) The eigenvalues λ j of L y 0 satisfy Reλ j 0 λ † , for some j 0 ∈ {1, . . . , N }, in which case e λ j 0 t decays rapidly as t grows; here λ † < 0 and |λ † | is some characteristic inverse time for the deterministic system F. In addition, if α j 0 > α j for j = 1, . . . , N and j = j 0 , both the memory and the noise correlations die out fast. Hence, one can neglect the integral terms and perform a fully Markovian parametrization, which is possible in the presence of white noise. Remark 2.5. Theorem 2.1 is stated for x scalar for the sake of simplicity, but this result extends to the d 1 -dimensional Eq. (2.20) for the (observed) variables x. In this Remark, we sketch the main elements that permit such a generalization.
Aside from the obvious generalization of the assumptions in Theorem 2.1 to a multidimensional setting, the main hypothesis consists of assuming that the now vector-valued coupling function C x in Eq. (2.20) has components {[C x ] i : i = 1, . . . , d 1 } that project onto the N simple eigenspaces of the decoupled Koopman operator introduced in Eq. (2.24). In this case, the construction of a multilevel Markovianization like Eq. (2.30) can be done in the following fashion: where D j and Σ j are d 1 ×d 1 diagonal matrices with every (non-zero) element being equal to λ j or −2Reλ j , respectively. More importantly, the vectors Z(t) and R(x) are split into N column vectors z j (t) and r j (x) of length d 1 with j in {1, . . . , N }. This way, Z(t) = z 1 (t), . . . , z N (t) and R(x) = r 1 (t), . . . , r N (t) .
Therefore Eq. (2.38) can be written as: is a d 1 -dimensional Wiener process. The vectors r j (x) are given by: where γ i,j are defined in terms of the parameters (α j , β j ) introduced in Eqs. (2.32a, 2.32b), respectively.
Here we don't give the explicit expression of Λ, but its role is to provide suitable weights, in the spirit of Eq. Second, memory equations contain nonlocal terms that are cumbersome and computationally expensive to integrate, as well as requiring much larger storage for the full history of the system's variables. The efficient Markovianization of evolution equations with memory terms is an active field of research in diverse areas of mathematics and the applied sciences; these areas include the study of bifurcations of delay differential equations [8,11], the reduction of stochastic partial differential equations to stochastic invariant manifolds [14,15], and material sciences [22], among many others.

Preliminary Example
As seen earlier in Theorem 2.1, if the coupling function is resonant with the Koopman operator associated with the y-dynamics, one can identify the dominant exponential rates of decay of the memory term and the characteristic decorrelation time of the noise. As a consequence, one can Markovianize the parametrization and greatly facilitate the numerical integrations involved.
To illustrate the above statement, we revisit here the preliminary application of the WL parametrization in the context of multiscale triads [87]. In that work, the authors implemented the parametrization for a collection of three-dimensional models that do exhibit time scale separation and compare here the corresponding outputs to those obtained via homogenization. The results are encouraging, since the parametrizations in [87] were obtained only from the decoupled hidden dynamics, in the lines of the present paper as well; see derivation of Eq. (2.18).
One of the first multiscale triads studied in [87] is the following: Here we require that j B (j) = 0, dW (1) t and dW (2) t are scalar Brownian increments and the parameter indicates both the time scale separation and the coupling strength. Notice that when the system is decoupled, i.e. when = 0, the fast dynamics evolve according to an Ornstein-Uhlenbeck (OU) process whose steadystate statistics are governed by Gaussian distributions with explicit mean and variance (see, e.g. [63]). Hence, by virtue of the previous formulas or by following [87], the WL parametrization yields the following scalar SIDE:ẋ here, where the angular brackets refer to the ensemble averages according to the already mentioned Gaussian distributions arising from the decoupled model. Expanding these averages Eq. (2.44c) leads to: The time scales are indicated by the exponents in the formulas above and they are the same for the noise and the memory kernel. This equality suggests the possibility of Markovianizing the memory equation into the following two-dimensional system: Clearly, performing a numerical integration of this system is easier than for a memory equation like Eq. (2.43).
The results of Sec. 2.3 allow us to carry out the dimension reduction of the multiscale triad by analyzing the spectral properties of the Koopman operator associated with the decoupled y-dynamics. Since the yvariables evolve stochastically, the Koopman operator becomes the backward-Kolmogorov equation, which governs the evolution of the expectation values of the observables. Thus, for a generic observable Ψ in the y phase space, the evolution of its expectation value is given by: ∂ t Ψ(y 1 , y 2 ) = L y 0 Ψ(y 1 , y 2 ) = −γ 1 y 1 −γ 2 y 2 · ∇Ψ(y 1 , y 2 ) + σ 2 1 ∂ 2 y 1 Ψ(y 1 , y 2 ) + σ 2 2 ∂ 2 y 2 Ψ(y 1 , y 2 ). (2.47) Now, let Ψ(y 1 , y 2 ) = y 1 y 2 be the coupling function of the triad system (2.42), for which we find that: The above equation is an eigenvalue problem, showing that this particular Ψ is an eigenfunction of the Koopman operator associated with the eigenvalue e −(γ 1 +γ 2 ) . This is no surprise, since y 1 and y 2 are respectively the Hermite polynomial eigenfunctions of the backward-Kolmogorov equation of the scalar OU process [79]. Hence, the product y 1 y 2 is also an eigenfunction of the same equation for the joint process. Therefore, we can immediately re-Markovianize the parametrization according to Eqs. (2.30), where D = γ 1 + γ 2 and  [43], MSMs arise in a variety of data-driven protocols for model reduction that typically use successive regressions from partial observations; see Sec. 3.2 below. The general form of an MSM is given by [43, Eq. (MSM)]; we only use herein its most basic version, which has the following structure: Here the observed vector variable x(t) lies in R d 1 and, for = 0, the hidden variables r(t) ∈ R d 2 evolve in time independently. Otherwise, the dynamics of the x-variables is linearly coupled to that of the r-variables, which act upon (3.1a) as a stochastic forcing, via the canonical projection Π : The matrix C in R d 2 ×d 1 models the feedback of the x-process onto the r-variables. In the case of C ≡ 0, r whould evolve according to an OU process with drift matrix D and covariance matrix ΣΣ * . For the sake of simplicity, we restrict ourselves to the case d 1 = d 2 so that the projection Π reduces to the identity.
The more general MSM with nonlinear coupling considered in [43] was shown to be equivalent to a SIDE with explicit expressions for the memory kernels and stochastic forcing being obtained; see [43,Proposition 3.3]. The noise term there results from successive convolutions of the homogeneous solutions of the lower levels of the system with an OU process. In particular, using the Itô stochastic calculus, one readily obtains a SIDE that is equivalent to an MSM; see [43,Sec. 3.2] and Appendix C below.
We show next that the same SIDE can actually be obtained by using the operator formalism presented in Sec. 2 above. One might object that an MSM is a stochastic system, due to the presence of white noise in the hidden layer, whereas the theory presented above applies to deterministic dynamics. However, as clarified below, the operator formalism applies equally well to the MSM case.
In fact, given a smooth, C ∞ observable its expected value along a stochastic trajectory X t = (x(t), r(t)) solving Eq. (3.1), namely E(Φ(X t )), defines a Markov semigroup P t by which solves the backward Kolmogorov equation [19] associated with Eq. (3.1): the only difference with respect to the transport equation (2.2) lies in the presence of a second-order differential operator induced by the white noise. We introduce the operators: (3.5a) which play a role that is analogous to their deterministic relatives in Eq. (2.3) of the previous section. Again, the operator L 1 is viewed as a perturbation to the operator L 0 due to the coupling. If one considers observables Φ = Φ(x), Eq. (3.4) becomes at time t = 0: and we apply now, as in Sec. 2, the Dyson perturbative expansion. By virtue of the formula (2.18), the parametrization leads to a reduced equation of the form: where the hidden variables in the decoupled regime are governed by an OU process with invariant measure µ r . The properties of the stochastic noise η(t) are given by: η(t)η (0) = dµ r (r 0 )e tL 0 r 0 r 0 (3.8a) = dµ r (r 0 )E (r(t)|r 0 ) (E (r(0)|r 0 )) (3.8b) = dµ r (r 0 )e −tD r 0 r 0 (3.8c) = dµ r (r 0 )e −tD r 0 r 0 = e −tD ΣΣ * , where r is a function analogous to the coupling function C x y in the previous section and the initial condition r 0 is assumed to be normally distributed with zero mean and variance ΣΣ * . The memory kernel is given by:

9c)
= e −sD Cx(t − s). (3.9d) Using the intermediate steps above, the explicit parametrization becomes, finally: x(t) = F(x(t)) + η(t) + 2 t 0 e −sD Cx(t − s)ds. (3.10) The integro-differential equation above is the same as Eq. (C.2) one obtains using the Itô integration described in Appendix C. This similarity of results occurs because we are considering the case of additive coupling, and the Dyson expansion can be truncated after the memory term proportional to 2 , cf. Sect. 2.2.

Empirical Model Reduction (EMR)
As discussed in Sects. 1 and 2, and illustrated in Fig. 1, the evolution of the resolved variables is forced by fluctuating terms and the effects of the previous state of the system. It is desirable, therefore, to construct a full model of the system even when only capable to partially observe it. The EMR methodology [43,45,48,49] aims at achieving this goal; we discuss it below in the broader context of MSMs. Note that EMR provides a solution for the dynamical closure of partially observed systems and thus it differs from the methodology recently proposed in ref. [6], which requires one to fully observe the system for the data-driven discovery of its underlying equations to work.
Having a set of reduced d 1 -dimensional observations {x i : i = 1, . . . , n} every dt time units, one seeks to regress the tendencies {dx i : i = 1, . . . , n} of the data onto a quadratic function of the form: where b in R d 1 ×d 1 describes dissipative processes and Q is a quadratic form describing self-interaction between the x variables. The ith component of the quadratic form is given by: where A i in R d 1 ×d 1 . The function F is expected to approximate the vector field driving the dynamics in the absence of hidden external influences. Of course, performing regressions yields an error called residual {r i : i = 1, . . . , n}. Hence, the evolution of the x-variables satisfies the equation: At this point, one can study the properties of the residual time series {r i } n i=1 and construct a model able to reproduce its main statistical features. However, we know that if it is possible to sample all the variables of the dynamical system of interest, one expects that the residuals are explained by the errors committed exclusively in the regression algorithm. If some sort of subsampling is done, whether spatial or temporal, the residuals are due also to the delayed influence of unresolved processes that are involved in the coupling.
Allowing for the main level variables x to be linearly coupled with the residual r, we are creating a model that is able to incorporate memory effects as well. Hence, for each component i of x we have: where we have introduced new matrices b (j) ∈ R d 1 ×(j+2)d 1 that model the linear coupling. The residual at the last level [r (l+1) ] i is assumed to obey a Wiener process for which the correlation matrix is obtained from the last residual time series. The choice of stochastic process in the last step can only be done if the decorrelation of [r (l+1) ] i is sufficiently fast according to the time scale set by dt. This motivates the problem of choosing the optimal number l of levels. Several criteria have been established to determine the optimal number of levels l. The basic idea is that the resulting (l + 1)-residual in Eq. (3.14e) should be well approximated by Gaussian white noise [43,49]. One has, therefore, to test whether the residual variables decorrelate at lag dt and whether the lag-0 covariance matrix is invariant in the last levels.
Therefore, regression on the tendency of the optimal level r (l) should yield: where γ (l) is the residual of the previous regression and is approximately equal to r (l+1) . Hence, γ (l) would become a lagged version of r (l+1) . Subject to this assumption, it is possible to estimate the optimal value of the coefficient of determination R 2 : (3. 16) This means that, when the amount of unexplained variance of the last regression is 50%, one has reached the optimal number of levels. It is worth stressing that the empirical model (3.14) has the structure (3.1) of an MSM, as discussed in [43]. It can, therewith, be integrated to transform it into an integro-differential equation with explicit formulas for the fluctuating noise and memory kernel, cf. [43, Proposition 3.3]; see also Sec. 3.1 for such a transformation from another perspective. Finally, note that the aforementioned stopping criterion for EMR -namely R 2 0.5, see [43, Appendix A]) -is based on decorrelation times and it is also present in the multilevel WL equation (2.40). We noted, in fact, in Remark 2.5 of Sect. 2.3 that the last level modeled by Eq. (2.40c) decorrelates the fastest with respect to the rest. Ultimately, making this points amounts to saying that a low number of levels is expected to arise in the EMR method, provided most of the eigenvalues λ j in Theorem 2.1 are located far away from the imaginary axis, except for a very few of them. Conversely, if the Koopman eigenvalues cluster near the imaginary axis or do not exhibit a spectral gap located at a, suitably defined, small negative real part, many levels are expected to be needed to capture the hidden dynamics; see again Remark 2.5.

Numerical Experiments
In Sects. 2 and 3, we have shown that both the WL top-down approach and the EMR data-driven method yield a set of multilevel equations for the variables of interest in a multi-scale system. In particular, both approaches give explicit formulas for the fluctuation term and memory kernel in the GLE (1.3b) of the Mori-Zwanzig formalism. Furthermore, their Markovian representation share that the hidden layers are linearly driven with a white noise background (see Eq. 2.30 and Eq. (3.1)). We now compare the two approaches to model reduction in a simple, conceptual stochastic climate model.
Since the modeling of geophysical flows is the primary motivation for this research, we consider a set of SDEs proposed in [27, among others] as a physically consistent climate "toy" model. In such a model, the main x-variables are slow and weakly coupled to the fast y-variables. The latter correspond to weather fluctuations and carry, in fact, most of the system's variance. The model's governing equations are: where W

WL Approximation
Notice that the hidden variables evolve according to a decoupled OU process. Taking advantage of this fact, we calculate the weak-coupling-limit parametrization of the model, according to the formulas presented in Sect. 2 for the separable coupling functions given by: The coupling function C x in the slow equation (4.2a) is independent of the x-variables, indicating that the noise correction can be additively incorporated and implemented by examining the decoupled hidden process. Note that the functional form of C y implies that the WL parametrization cannot be exact in , as noted in Eqs. (2.11) and (2.12). This indicates that the WL reduced model will not only introduce an error in averaging over the decoupled steady state, but also that the Dyson expansion Eq. (2.5) has to be truncated at 3 , rather than merely at 2 where no memory effects would be included. According to the WL parametrization discussed in Sect. 2, the fluctuation terms correspond to the decoupled evolution of the coupling function C x y , concretely as in Eq. (2.16). This allows to directly compute the correlation function: The memory kernel K, which is a vector of two components (κ 1 , κ 2 ) , is given by: here the brackets · indicate the averages for the uncoupled equilibrium in the y-variables, which happen to be a set of independent OU processes. Explicitly, The reduced-order model obtained herewith does give explicit formulas for the evaluation of the stochastic noise and the memory kernel, independently of the time scale separation h, although these formulas are rather complicated. Still, the scheme remains the same when changing parameter values, so it is flexible in studying different scenarios.

EMR Model and Results
Basic EMR algorithm implementation. Regarding the data-driven EMR protocol, we integrated the full model with a time step of d t = 10 −3 time units for a duration of T = 10 4 time units in order to learn the model parameters. Then, a separate run was performed in order to examine the ability of the inferred model to reproduce the general statistical features. This time, the EMR system was integrated together with the full model using a time step of d τ t = 10 −2 time units for a total of T τ = 10 5 time units. The equations were solved using a fourth-order Runge-Kutta and a Euler-Maruyama method for the deterministic and stochastic components, respectively. By sampling every time step, we learned an EMR model whose coefficients were explicitly found. The convergence criterion R 2 0.5 was attained by adding two extra levels., for a total of three. The convergence was not affected by changes in the time scale separation parameter -namely, h = 0.1 and h = 1 in the case at hand. Probably the value of h was not that important here because of the low dimensionality and stochastic nature of the hidden process. However, convergence is likely to be altered in more complicated models, as illustrated in Appendix D.
The climatologies of the slow x-variables are obtained using data from the full model, the EMR model and the WL parametrization. The two-dimensional probability density functions (PDFs) of the stochastic model (4.1) in the (x 1 , x 2 )-plane are shown in Fig. 2. These PDFs were calculated by employing the Matlab R2019a kernel smoothing function ksdensity. Their respective marginals are shown in Fig. 3. The agreement between the two methodologies when approximating the clearly non-Gaussian density arising from the full model is clearly excellent.
The time scale separation between the x-variables and the y-variables is clearly depicted in the left panel of Fig. 4, where the fast variables decorrelate almost instantly compared to the slow ones. The approximation of these autocorrelation functions is also obtained using the EMR and WL methods.  In general, cf. [48], the regressions performed in the main level (3.14a) of the EMR model allow one to effectively reconstruct the coefficients of a weakly coupled model; see Appendix D. The EMR methodology, though, only allows for linear coupling between the slow x's and the fast y's. The nonlinear coupling between the slow and fast variables in system (4.1) compromises the estimation of the main model parameters in Eq. (3.14a), so that we cannot expect to recover the original, full model's behavior given by (4.1). The EMR model coefficients at the first and second levels are as shown on tables 1 and 2, respectively.    (1) , the next two columns indicate the linear coupling to the main level (i.e. the first two columns of b (1) ) and the last two columns determine the linear drift for the second level (i.e. the last two columns of b (1) ).
f (1) x 1 x 2 r As discussed in Sect. 3.2, the EMR has the structure of an MSM and it can be recast into an integrodifferential equation. If one only considers the first added level, the EMR can be readily integrated giving the following equation for the evolution of the slow variables x = (x 1 , x 2 ): First thing to note is that the matrix C has a small norm and, by virtue of Eq. (4.7), it means that memory effects are going to be very small. On the other hand, the eigenvalues of the matrix D are λ 1 −20, λ 2 −10, which are approximatelly the drift coefficients of the uncoupled OU process driving the y-variables. This indicates that the exponential kernel is damping the effects of the x-variables in past times rather quickly. Moreover, the covariance matrix Σ corresponds to that obtained by integrating the y-dynamics independently, according to Eq. (4.4).
Regarding the WL approximation, we stress that the y-variables are no longer present, after taking the averages in its construction. The memory kernel K in this case differs from the matrix D above, although its dominant terms correspond to its eigenvalues. Note that, if L 13 = 0, the coupling function C x y would project entirely onto the eigenfunction of the OU process associated with the eigenvalue −(γ 1 + γ 2 ). The same statement would hold for c 134 = 0, where in this case C x y projects onto the eigenfunctions associated with the eigenvalue −γ 1 .
Reduced Time Scale Separation. The parameter h controls the time scale separation in the evolution of the xand y-variables. Here we set h = 1 so that this separation is reduced by an order of magnitude as illustrated by the autocorrelation functions in Fig. 7. The question to be addressed in this subsection is the effect of such a reduction in the WL and EMR parametrizations and their respective performance.
In the WL parametrization, there is no need to sample the dynamics in order to construct it, since the formulas of Sect. 2 are explicit and do not depend on h. In the case of model (4.1), the covariance matrix and time correlations of the WL noise correction are thus given by Eq. (4.3) with no reference to h. The memory term, though, is expected to change as the kernel K will decay more slowly by a factor of 10. Therefore, memory effects are more important, as expected.
The EMR approach, on the other hand, requires a new learning phase for this value of h = 1. We used the same numerical integration parameters d t = 10 −3 and T = 10 4 time units as for the previous case. In the first level regression, one observes that the coefficient values listed in Table 3 are essentially the same from those estimated in the previous case, for h = 0.1, and listed in Table 1, as well as being even more distant from the original ones in the table's first row. The second level coefficients, for h = 1, are shown on Table 4.
The covariance matrix Σ of the noise correction is indicated in Eq. (4.9a) Eq. (4.9b) and it agrees fairly well with the previous values, for h = 0.1, as given in Eq. (4.8b). The matrix C that indicates the strength of the memory effects also has a magnitude that is of the same order as that in the previous case of h = 0.1, which is rather suprising, given the factor of 10 in time scale separation h; compare Eqs. (4.8a) and (4.9a). This observation tells us that the loss of Markovianity might be intrinsic to the nature of the coupling rather than being due to the time scale separation, even though, in the limit case of infinite scale separation, memory effects will dissapear entirely. The memory kernel, as determined by D, scales almost exactly with the time scale separation and it is expected to change depending on how the coupling functions project onto the eigenspaces of the underlying Orstein-Uhlenbeck process, as discussed more generally earlier in Theorem 2.1.
The performance of both parametrization techniques is summarized in Figs

Memory Effects
We would like to end this results section by analyzing the role of the memory effects when performing a reduction of the highly idealized model given by Eqs. The difference between the two τ -values is due to the fact that we expect the coarse-grained phase space to be sensitive to the system's variability. Hence, if the time scale separation is large, a shorter transition time is requiered in order to capture the influence of the hidden processes. In fact, a range of transition times was tested in the case of h = 1 to find that the optimal value was τ = 1. It follows that the methodology is robust in showing the effects of memory in the projected phase space.
We clearly observe in Figure 8 that the correlation functions can be accurately reconstructed in the case of large time scale separation h = 0.1 (figure (a)) but not so for h = 1 (figure (b)). This indicates, naturally, that memory effects are negligible in the first case and relevant in the second.

Conclusions
To formulate accurate and efficient parametrizations for multiscale processes is a crucial challenge in many areas of science and technology for one of two reasons: either the numerical simulation of all scales active in a given system is computationally unfeasible; or there is a mismatch between model resolution and the granularity and homogeneity of the observations, as in the case in geophysical flows and in the climate system. Moreover, the construction of parametrizations is instrumental to help understand the nature of nonlinear fluxes across scales and the physical processes responsible for cascades, instabilities, and feedbacks. There are two main approaches for constructing parametrizations: top-down, by deriving the parametrizations directly from the evolution equations governing the system through the use of suitable approximations; and data-driven, in which the parametrizations are constructed through suitable optimization procedures, which are first tuned in a training phase and then actually used in the prediction phase. Both approaches aim to derive the effective dynamics for the variables of interest: formally, this is achieved by applying the Mori-Zwanzig projection operator [60,90] to the full dynamics. The result of doing so is to describe the impact of the hidden variables by formulating a generalized Langevin equation (GLE) (1.3b) for the variables of interest that includes a deterministic, a stochastic, and a non-Markovian component.
Top-down and data-driven approaches are conceptually complemetary and have different practical advantages and disadvantages. In this paper, we have shown the fundamental equivalence between a top-down and a data-driven approach that have been formulated and applied in the recent literature. This equivalence was illustrated schematically in Fig. 1.
We first revisited in Sect. 2 the WL parametrization of [88,89], which relies on an assumption of weak coupling between the hidden and observed variables, and have extended the previous results by considering more general coupling classes. We have also shown that the perturbative expansion that yields the WL parametrization is exact when the coupling between the hidden and resolved variables is additive.
The Dyson formalism (2.12) appears to be essential for computing the effects of the hidden processes on the dynamics of the observed variables, when working at the level of the system's observables. This methodology is explicit, in the sense that no information about the actual coupled process is needed, because the formal computations are performed by considering the limit in which no coupling is present. Other advantages of this approach are that it can be implemented without the need for any hypothesis on the time scale separation between the hidden variables and the observed ones, and that it is also scale adaptive [83].
We addressed systematically the problem of re-Markovianizing the WL memory equation, which was first pointed out in [87] and discussed further in Sect. 2.4 here. In this example, a system (2.42) with one observed and two hidden variables that yielded a scalar WL parametrization was re-Markovianized to a Markovian system with just two scalar differential equations. Throughout Sect. 2, we provided a broader framework for re-Markovianization. This framework was presented for a scalar equation in Theorem 2.1 and described for higher dimensional systems in Remark 2.5. The required assumptions for this treatment boil down to certain spectral properties of the Koopman operator for the hidden variables.
The multilevel structure of the re-Markovianization obtained in Sect. 2 motivated the comparison with multilayer stochastic models (MSMs) in Sect. 3. Such MSMs arise naturally in data-driven reduction methods and they had been shown in [43] to approximate the GLE predicted by Mori [60] and Zwanzig [91].
We showed in Sect. 3.1 that a seemless application of the WL parametrization solves the MSM of Eq. (3.1) and coincides with its Itô integration, cf. [43]. Note that an MSM can be obtained from partial observations of the coupled system, which amounts to the special case of the data-driven empirical model reduction (EMR) methodology [43,45,46,48,49].
The EMR methodology was revisited here in Sect. 3.2 and it is, in principle, dual to the WL parametrization, in the sense that only partial observations of the coupled system are required, without the need for knowing the actual equations of motion. Comparing the multilevel structure of Eq. (2.40) with that of Eq. (3.14) suggests that the Koopman eigenvalues λ j highlighted in Theorem 2.1 may help provide insights into the number of levels needed for EMR to converge. This practical role of the λ j 's deserves, therewith, a more careful examination in further work.
Additionally, we considered in Sect. 4 a conceptual climate model to which we applied both of the methodologies revisited herein. Since both the MSM and the WL parametrization yield a memory equation that involves integrals and stochastic noise, we were able to compare their structure, as well as their statistical outputs. We found that both methodologies produced equivalent numerical results and that the memory kernel and noise predicted in the WL parametrization agreed with what was found using the data-driven EMR approach.
Concluding, our view point is complementary to the dynamic mode decomposition [59,69,72] as it uses the basis of eigenvectors of the Koopman operator to construct the projected -in the sense of Mori-Zwanzig -dynamics of the observables of interest, which is then recast in Markovian form using the multilevel Markovian model framework, where the number of levels corresponds to the number of eigenvectors of the Koopman operators one considers in the reconstruction of the dynamics. In a nutshell, our findings support, on the one hand, the physical basis and robustness of the EMR methodology and, on the other hand, illustrate the practical relevance of the WL perturbative expansion used for deriving the parametrizations.

A Proof of Theorem 2.1
Proof. The aim is to show that under the assumptions of this Theorem -which require that the coupling function C x : R −→ R projects entirely onto span {ψ j , j = 1, . . . , N } -the memory and noise terms of the WL equation (2.23) are obtained from the term Λ · Z(t) in Eq. (2.30a), after integration of Eq. (2.30b).
Step 1. In this step, we expand the memory term and the lag-correlations of the noise in the WL equation (2.23) in terms of the leading eigenelements of the uncoupled Koopman operator L y 0 . These expansions will serve us in Step 2 below to compare the noise and memory terms of the WL-equation with those produced after integration of Eq. (2.30b).
Let the λ j be sorted as in the hypotheses of the theorem. Hence, to distinguish real and complex conjugate eigenvalues and for notational convenience, we introduce the set of indices I r and I + defined as: An immediate consequence that is used several times throughout the proof is that the sum of the eigenvalues is real, and may be split as follows: As previously stated, we expand the mean and correlation functions of the scalar noise term η in the WLequation (2.23) in terms of the eigenpairs. The mean is zero by assumption, but the autocorrelation function can be expanded as follows, based on Eq. (B.2), Regarding the complex scalars α j and β j defined (2.32a) and (2.32b), it follows that for each j such that λ j = λ j+1 , we get α j = α j+1 and β j = β j+1 . Indeed, in which we have exploited the fact that ψ j = ψ j+1 when λ j is complex and j in {1, . . . , N }. The same proof can be repeated for β j . Such a conjugacy relation also holds for the gradients of the eigenfunctions ∇ψ j , for those j in {1 . . . , N } such that λ j = λ j+1 . This is observed by the following equality: since ∇ is a differential operator that only involves here differentiation with respect to a real variable.
Exploiting (A.5), the memory kernel K then expands as (recalling that R(t) = 0): which leads to Re e λ j (t−s) α j ∇ψ j (y) . Step2. The second step consists of analyzing the noise and memory terms produced by integration of Eq. (2.30b) and to compare these terms with those of the WL-equation.
Performing an Itô integration of Eq. (2.30b) leads to: Let Λ be defined as in (2.31) and let us calculate the mean and lag-correlations of the one-dimensional stochastic process Λ · q, aimed at approximating η in the WL equation. First, note that Λ · q is a zero-mean Gaussian process, with lag correlations given by: Now, expanding e Dt Λ · Λ in (A.10a) shows that we recover the right-hand side (RHS) of (A.3). However, the noise term η in the WL-equation is a real-valued stochastic process, and we are dealing with complex scalars, so therefore we still have to show that Λ · q(t) is real for every t in R. To do so, let us denote by w = [w 1 , . . . , w N ] any arbitrary row vector in R N . In other words, w is an arbitrary column vector with real entries.
Consider the following inner product: By construction of the matrix H in Eq. (2.34)-(2.35), the product Hw is given component-wise, for j = 2, . . . , N , as: while [Hw] 1 = w 1 . This implies in particular that [Hw] j = [Hw] j+1 whenever λ j = λ j+1 . As a consequence, we get which shows that Λ · e Dt Σ is a real-valued quantity, and thus for any realization of the N -dimensional Wiener process W t , the product Λ · e D(t−s) ΣdW s is real and hence, Λ · q(t) is also real for every t. Finally, we are left with showing that the memory kernel K of the WL-equation coincides with that of the memory term in Eq. (A.8), when multiplied by the vector Λ. To do so, we exploit the expansion (A.7) of K for this comparison, namely using the expression of R in (2.33), we observe that Λ · t 0 e D(t−s) R(x(s))ds = C y (x(s)) · j∈Ir e λ j (t−s) α j ∇ψ j (y) Re e λ j (t−s) α j ∇ψ j (y) , (A.14) which indeed coincides with the expression of K, as desired. The proof is complete.

B Semigroup Property of the Projected Koopman Operator Family
It was shown in [16, Theorem A] that projection onto a reduced state space is closely related with a coarse graining of the (full) probability transitions on the original system's attractor, while [19,Theorem 2] dealt recently with the impact of such a projection in terms of reduction of the Koopman semigroup. In [19], the authors proposed a criterion based on the spectral theory of Markov semigroups to ascertain whether the reduced state space associated with a given projection can fully explain the statistics of the desired variables. This approach provides potential insights into the need for modeling non-Markovian effects by inspecting the loss of the semigroup property, as explained below.
Moreover, it follows from [19] that the analysis of correlation functions is not only of physical interest but also of methodological utility. Correlation functions can be defined by means of the Koopman operator or, dually, by means of the transfer operator's providing the solution of the Liouville equation.
Let µ denote an ergodic invariant measure of the system and take two observables Φ 1 , Φ 2 in the space L 2 µ of zero-mean functions that are square-integrable with respect to µ. Assume furthermore that the spectrum of the operator L in L 2 µ is a pure point spectrum, given by the eigenvalues {λ j } ∞ j=1 and their associated eigenfunctions {ψ j } ∞ j=1 , where the eigenvalues are ordered by their decreasing real parts. Then, the correlation function C Φ 1 ,Φ 2 (t) between the functions Φ 1 and Φ 2 is given by: and it can be expanded, formally, as The dual operators in (B.1) and the adjoint eigenvectors in (B.2) are indicated by the superscript (·) * , while ·, · µ denotes the inner product. We refer to [19, Corollary 1] for a proof of (B.2) in the context of Markov semigroups. The proof actually applies to the case of the Koopman semigroups considered here as long as the Koopman semigroup U t defined by (2.4) is a strongly continuous semigroup in L 2 µ . The RHS of Eq. (B.2) consists of a linear combination of exponential terms whose coefficients are calculated by projecting Φ 1 and Φ 2 onto the corresponding eigenspaces. These coefficients weight each exponential function and they can become exceedingly large if the Koopman operator deviates very much from normality [77]. Note also that the set of eigenvalues λ j 's play a key role in defining the response of the system to perturbations [55,75].
The interactions between the resolved and hidden variables that are modeled by the Dyson expansion of the Koopman operator in Sec. 1.1 may introduce memory effects into the closed, reduced model for the x-variables, as given by Eqs. (2.18)-(2.22). In certain situations, such memory effects can be neglected, even in the absence of exact slaving relationships between the resolved and hidden variables [13]. But the loss of slaving relationships requires, in general, an explicit representation of memory effects [12] to achieve an efficient model reduction.
Furthermore it was shown in [16,19] that the reduction of the Koopman semigroup to observables that act only on the reduced state space leads, in most circumstances, to a family of operators that, while Markovian, no longer satisfy the semigroup property. One might then ask to which extent this loss of the semigroup property arising from the reduction, and the related emergence of memory effects, are crucial for providing a faithful reduced model of the observed variables.
When considering reduced state spaces obtained by projection, along with observables Φ 1 and Φ 2 defined on them, Theorem 2 in [19] shows the existence of a family of Markov operators {T t } t≥0 that satisfies: for every t ≥ 0, where π x is the canonical projection onto the reduced subspace and µ x is the disintegrated or sample measure associated with π x ; see [19,Remark 3]. However, due to the projection, the semigroup property is lost, namely, T s T t = T t+s for some t, s. Following the reasoning above, one can establish a criterion for the need to model a memory contribution when performing the model reduction. Formally, if there exist τ > 0 and T ∈ N such that, for every t ∈ {kτ ∈ R : 0 ≤ k ≤ T }, we have and one can say that the semigroup is preserved, to some extent, depending on how large T can be in Eq. (B.4) above. Other such criteria are available in the context of mutually dual Koopman and transfer operators. Thus, A. Tantet and coauthors [76] had already considered empirical ways of quantifying the loss of the semigroup property in reduced dimensions. The interpretation of τ comes from the practical implementation of the methodology and it is usually referred to as the transition time. Indeed, numerically, the approximation of such Markov operators is done by Ulam's method, by projecting them onto a finite basis, typically using the characteristic functions of certain domains of phase space. Then, the transitions between domains -after an adequate transition time τ -are counted to obtain matrix estimates of the operator T τ acting on the reduced phase space. Hence, one seeks a suitably small, or large [75], transition time to obtain the best candidate for applying Eq. (B.4), see also [19,Sec. 3.3]. Thus, Eq. (B.4) can be implemented in practice this way in order to (potentially) reconstruct correlation functions on the whole phase space. A very simple illustration of such a transfer operator calculation is given in [81].

C Itô Integration of the MSM
In the main text, we proposed a solution of the MSM given by Eqs. (3.1) using the Dyson expansion for the linear operators involved in the backward Kolmogorov equation. advection acting on functions. Therefore, we substituted nonlinear ordinary differential equations for a partial differential equation, for the sake of having linear operators in hand. The same solution can be attained by direct integration of the MSM in the form (3.1). We convolute, in the Itô sense, Eq. (3.1b) to find an explicit solution for r(t) when d 1 = d 2 : in which the memory effects in the fourth term are of second order in . Note that the -order terms arise from a noise realization in the decoupled regime, whereas the memory term is exclusively due to the coupling of the main variables with the hidden ones. Hence, the only degree of freedom left is the distribution of the initial states r(0).

D The coupled L84-L63 System
The EMR methodology's ability to capture the statistics of low-dimensional dynamical systems was illustrated in [48], where the authors considered the L63 system [53] as a test case in which the phase space can be fully sampled. Moreover, provided that the integration time step is short enough, the parameters of the underlying model can be fully captured with a high degree of confidence.
Here, we repeat the analysis of [48] to illustrate the effectiveness of EMR in capturing statistical and dynamical properties in an extended system. The model we consider is the result of coupling the X = (X, Y, Z) variables of the L84 system [54] with the x = (x, y, z) variables of the L63 system [53], namely: the parameter values are: a = 0.25, b = 4, F 0 = 8, G = 1, and s = 10, ρ = 28, β = 8/3, respectively. The parameter h measures the strength of the coupling, while τ scales the rate of change in the L63 system and, therewith, the time scale ratio between the two subsystems. This system is a skew-product, in the sense of [74], since the coupling is one-way only, with the L63 system driving the L84 dynamics. Hence, one has -as noted in [82] -a fully Markovian parametrization of the L63 variables. Furthermore, the correlation function that defines the stochastic noise η(t) can be further expanded and simplified,with respect to Eq. (2.16). One can, in fact, write explicitly: where the angular brackets · indicate averages with respect to the physical measure associated with the L63 system. Since L84 does not feed back into L63, the evolution of x(t) only obeys the dynamics of L63, and thus the decorrelation of the noise scales with τ . In most of the numerical experiments, the time scale separation between the two systems is τ = 5. The relevance of this time scale parameter was investigated in previous work [82]. Here, we focus on the effects of the coupling strength h, and we shall study the cases of h = 0.25 and 0.025. Partial observations only will be used in these experiments, by sampling the three-dimensional outputs of the L84 system. Then, the observed tendencies are regressed and sequentially layered following the EMR approach, as explained in Sect. 3.

D.1 EMR Outputs
The L84-L63 model is integrated for 730 time units that correspond in L84 to 10 natural years, with a time step of 5 · 10 −3 time units; two separate runs are made for the coupling strengths h = 0.25 and h = 0.025. These two full-model runs are used to train the corresponding EMR model versions, both of which use only the slow X-variables and eliminate the fast x-variables. Then, two separate full-model simulations are run, for testing purposes, over 7 300 time units, and the EMR model's output is compared with it, for the two parameter values. Below we show the main statistical outputs of the EMR methodology compared to the two reference integrations of the full model. The results for the two separate h-values are shown in Figs. 9-11 and Figs. 12-14, respectively.
The region of phase space explored by the EMR model clearly coincides with the one visited by the full model, as seen in Figs. 9 and 12, and the relative occupancies within this region-as indicated by the smoothed PDFs shown in Figs. 10 and 13, respectively-agree very well. The time scales are also well captured, as indicated by the good approximation of the autocorrelation functions, cf. Figs. 11 and 14. Notice that, while the original L84-L63 system is purely deterministic, the EMR model includes white noise acting on the hidden layers of the learned model. This fact could suggest that a smoothing of the invariant measure is inevitable and that the EMR methodology may not be able to capture fractal geometries in phase space, since the EMR model does not satisfy the Hörmander's hypoellipticity condition [18,40]. The numerical evidence in Figs. 9 and 12, however, illustrates a strikingly good approximation of the full model's attractor, including its very fine, and presumably fractal, structure.
Actually, [43, Theorem 3.1 and Corollary 3.2] provided sufficient conditions for the existence of a random attractor for a broad class of MSMs that are not subject to a non-degeneracy condition of Hörmander type.
In other words, one can have an MSM that possesses a random attractor and is thus dynamically quite stable, while exhibiting in a forward sense an invariant measure of the associated Fokker-Planck equation that is singular with respect to the Lebesgue measure. This mathematically rigorous result helps explain what is observed numerically not only in the present paper for the EMR of the L84-L63 model, but also in the case of the EMR model of the Lotka-Volterra example in [43, Fig. 7].
Ulam's method was used on the projection of the full (X, x) phase space onto the X subspace to approximate the spectrum of the Koopman operator, since it can provide further information on the characteristics of the time series, beyond PDFs and correlation functions. The observed spectra (red × symbols) using a coarse partition of phase space into 512 nonintersecting boxes showed good agreement with the spectra based on the full model (blue open circles); see Fig. 15. This agreement confirms further that, at this level of coarse graining, the EMR model captures well the the characteristics of the full model's solutions.

D.2 Convergence
Convergence in the EMR approach is determined by the "whiteness" of the last-level residual, as explained in Sect. 3.2; see Eq. (3.16) and discussion thereof. In Fig. 16, we plotted the mean of the determination coefficients R 2 for the three X-variables and we show that its convergence in the EMR approach depends only mildly on the coupling parameter h. Indeed, for h = 0.25 we observe in panel (a) that around 18 levels are necessary before achieving the optimal level, whereas for weaker coupling with h = 0.025 convergence is attained in panel (b) already with 15 levels, as one might expect. Furthermore, as already pointed out in [43, Sec. 7] on a different example, the results in Fig. 16(c) illustrate that a smaller time scale separation τ can require a higher number of levels for EMR to attain convergence: in the case at hand, around 25 levels are needed. For completeness, Fig. 16(d) shows that including additive white noise in the L63 system can, in fact, accelerate the convergence of the method, with convergence achieved at = 7.

D.3 Model Coefficients
We show here that the EMR model coefficients can be efficiently approximated when phase space subsampling is carried out. Here, regressions are performed over 50 short time series of 10 time units each, with a time step of 5 · 10 −3 , as in Sect. D.1. The reason for taking this sample length here is that 10 time units is visually enough for the slow variable X to go through a cycle, as illustrated in Fig. 17, for both h = 0.25 and 0.025. The estimated coefficients and their standard deviations using the EMR regressions are listed below in Tables 5 and 6 for h = 0.25 and in Tables 7 and 8 for h = 0.025. The tables show the coefficients of the linear and quadratic forms at first level in the EMR regressions: see Eq. (3.14a).
As expected, a stronger coupling of h = 0.25 leads to greater uncertainty in the estimation, as indicated by the corresponding standard error. For the fairly complex and chaotic system at hand, we note that no memory effects are artificially introduced in the regressions at the second level. Indeed, we found that the coupling of the main level with the subsequent ones was 0 to the fourth decimal place.