## Abstract

Graph spectral analysis can yield meaningful embeddings of graphs by providing insight into distributed features not directly accessible in nodal domain. Recent efforts in graph signal processing have proposed new decompositions—for example, based on wavelets and Slepians—that can be applied to filter signals defined on the graph. In this work, we take inspiration from these constructions to define a new guided spectral embedding that combines maximizing energy concentration with minimizing modified embedded distance for a given importance weighting of the nodes. We show that these optimization goals are intrinsically opposite, leading to a well-defined and stable spectral decomposition. The importance weighting allows us to put the focus on particular nodes and tune the trade-off between global and local effects. Following the derivation of our new optimization criterion, we exemplify the methodology on the *C. elegans* structural connectome. The results of our analyses confirm known observations on the nematode’s neural network in terms of functionality and importance of cells. Compared with Laplacian embedding, the guided approach, focused on a certain class of cells (sensory neurons, interneurons, or motoneurons), provides more biological insights, such as the distinction between somatic positions of cells, and their involvement in low- or high-order processing functions.

## INTRODUCTION

Many aspects of network science relate to graph partitioning—the grouping of nodes in subgraphs—and graph embedding—their representation in a low-dimensional space that accounts for graph topology (Von Luxburg, 2007). Spectral graph theory motivates analytical methods based on the eigenvectors of fundamental graph operators, such as the adjacency and the Laplacian operators (Chung, 1997). For instance, the well-known graph cut problem can be convexly relaxed and solved by thresholding of the Laplacian eigenvector with the smallest nonzero eigenvalue, known as the Fiedler vector (Fiedler, 1989). More recently, new approaches in graph signal processing have taken advantage of the Laplacian eigenvectors to define the graph Fourier transform, which can then be used to process (i.e., filter) graph signals in the spectral domain (Ortega, Frossard, Kovačević, Moura, & Vandergheynst, 2018; Shuman, Narang, Frossard, Ortega, & Vandergheynst, 2013); the spectral graph wavelet transform by Hammond, Vandergheynst, & Gribonval (2011) is one such example.

The Laplacian eigenvectors also provide a meaningful embedding by mapping nodes onto a line, or higher dimensional representation, that minimizes distances between connected nodes (Belkin & Niyogi, 2003). Other well-known embedding techniques use different metrics for distance in order to assess local graph properties, ranging from simple Euclidean distance in locally linear embedding (Roweis, 2000), to shortest path in Isomap (Tenenbaum, 2000), transition probability (Shen & Meyer, 2008), or conditional probability of an edge in *t*-distributed stochastic neighbor embedding (van der Maaten & Hinton, 2008). A time-dependent dynamical similarity measure has also been introduced (Schaub, Delvenne, Lambiotte, & Barahona, 2018). In addition, efforts have been made to employ global properties of the graph, such as in Sammon mapping (Sammon, 1969), where a cost function including all pairwise distances is optimized. In this manner, embedding is performed while taking in consideration both local (neighborhood) and global (distant nodes) properties of the graph. However, these techniques are not aware of the network at the mesoscale: One cannot guide the embedding by giving a certain subgraph more importance while still preserving local features and global topology characteristics.

In essence, the most powerful feature of graph spectral embedding is to effectively summarize local structure across the graph into low-dimensional global patterns. This is achieved, for instance, with the recently introduced concept of graph Slepians; that is, graph signals that are bandlimited and take into account a subset of selected nodes. Specifically, two types of Slepian designs that respectively optimize for energy concentration and modified embedded distance have been introduced (Van De Ville, 2016; Van De Ville, Demesmaeker, & Preti, 2017b).

In this work, we further build on this framework by providing a simple way to guide analyses with additional flexibility. *Guidance* includes the selection of a given subgraph or group of nodes to study, and the ability to specify the intensity of the focus set on these selected nodes. With respect to graph Slepians, we hereby provide several extensions. First, we allow the selection process to be weighted, so that the importance of a node can be gradually changed. Second, we propose a new criterion that meaningfully combines the two existing ones; that is, we want to maximize energy concentration and minimize modified embedded distance at the same time. Third, as we detail below, these two criteria are counteracting, and hence, we obtain stable solutions even at full bandwidth, where the original Slepian designs degenerate numerically. Fourth, we show how this criterion can be rewritten as an eigenvalue problem of an easy modification of the adjacency matrix, which can be interpreted as reweighting paths in the graph, and thus significantly simplifies the whole Slepian concept. The solution of the eigendecomposition then defines the guided spectral domain, spanned by its eigenvectors. We illustrate the proposed approach with a proof-of-concept on the *Caenorhabditis elegans* (*C. elegans*) connectome. Through spectral embedding-based visualization, we observe the effects of focusing on a specific cellular population made of sensory neurons, interneurons, or motoneurons, and we reveal trajectories of these neurons as a function of focus strength.

## METHODS

### Essential Graph Concepts

We consider an undirected graph with *N* nodes, labeled 1, 2, …, *N*. The edge weights are contained in the symmetric weighted adjacency matrix $A~$ with nonnegative real-valued elements *ã*_{i,j}, *i*, *j* = 1, …, *N*. We also assume that the graph contains no self-loops; that is, all diagonal elements *ã*_{i,i} are zero. The degree matrix **D** is the diagonal matrix with elements *d*_{i,i} = $\u2211j=1N$*ã*_{i,j}. The graph Laplacian is defined as $L~$ = **D** − $A~$ and can be interpreted as a second-order derivative operator on the graph. Here, we use the symmetrically normalized variants of the adjacency $A~$ and graph Laplacian $L~$ defined as **A** = **D**^{−1/2}$A~$**D**^{−1/2} and **L** = **I** − **A**. This normalization is often used in applications to emphasize the changes in topology and not in nodal degree (De Lange, De Reus, & Van den Heuvel, 2014).

*N*that associates a value with each node (Shuman et al., 2013). One way to recognize the importance of the Laplacian and its eigendecomposition is to consider the smoothness of a graph signal

**x**as

*a*

_{i,j}. The eigenvectors of

**L**minimize this distance that is reflected by the eigenvalues, sorted by convention increasingly as

*λ*

_{1}= 0 ≤

*λ*

_{2}≤ … ≤

*λ*

_{N}. Therefore, considering the eigenvectors associated with the smallest nonzero eigenvalues provides the Laplacian embedding of the nodes that minimizes distance in a lower dimensional space (Belkin & Niyogi, 2003). The eigenvector with the smallest nonzero eigenvalue is also known as the Fiedler vector (Fiedler, 1989), which relates to the solution of the convex relaxation of the graph cut problem (Von Luxburg, 2007).

**L**=

**UΛU**

^{⊤}of the graph Laplacian is the cornerstone of spectral methods for graphs, as the eigenvectors {

**u**

_{k}},

*k*= 1, …,

*N*(columns of

**U**) play the role of graph Fourier components, and the associated eigenvalues {

*λ*

_{k}},

*k*= 1, …,

*N*, of frequencies (Chung, 1997). The graph Fourier transform (GFT) then provides the link between a graph signal

**x**and its spectral coefficients given by vector $x^$:

### Graph Slepians

**S**; that is, we have

*S*

_{i,i}=

*δ*

_{i∈𝒮},

*i*= 1, …,

*N*, where

*δ*is the Kronecker delta. The Slepian design then boils down to finding the linear combination of Laplacian eigenvectors, encoded by spectral coefficients $g^$, within the bandlimit

*W*with maximal energy in 𝒮, reverting to the Rayleigh quotient

**W**is a spectral selection matrix that has

*W*ones on its diagonal followed by

*N*−

*W*zeros. This problem can be solved by the eigendecomposition of the concentration matrix

**C**=

**W**

^{⊤}

**U**

^{⊤}

**SUW**as

**C**

**ĝ**

_{k}=

*μ*

_{k}

**ĝ**

_{k},

*k*= 1, …,

*W*. The graph Slepians

**g**

_{k}=

**U**

**ĝ**

_{k},

*k*= 1, …,

*W*, are orthonormal over the entire graph as well as orthogonal over the subset 𝒮; that is, we have $gk\u22a4$

**g**

_{l}=

*δ*

_{k−l}as well as $gk\u22a4$

**Sg**

_{l}=

*μ*

_{k}

*δ*

_{k−l}.

The Laplacian embedded distance **x**^{⊤}**Lx** is a measure of smoothness of the vector **x** over the graph, which is why eigenvectors of **L** with increasing eigenvalues are ordered according to smoothness. Imposing the modification with the selection matrix **S** focuses the smoothness on a certain subgraph, notwithstanding how the signal behaves outside it. Equation (4) can also be seen as a generalization of Laplacian embedding, since **L**^{1/2}**SL**^{1/2} reverts to **L** for the special case of **S** = **I**.

It is important to realize that the eigenvalues {*μ*_{k}} of the original design reflect the energy concentration in the subset 𝒮, while the eigenvalues {*ξ*_{k}} of the alternative design correspond to a modified embedded distance that can be interpreted as a “frequency value” localized in 𝒮, in analogy to the global GFT case. Consequently, “interesting” eigenvectors correspond to those with high *μ*_{k}, concentrated in the subset 𝒮, or low *ξ*_{k}, showing the main localized low-frequency trends, respectively. However, the eigendecompositions, taken individually, do not necessarily lead to eigenvectors that combine both virtues.

### Guiding Spectral Embedding Using a New Criterion

**S**to a cooperation matrix

**M**with diagonal elements that can take any nonnegative real values

*m*

_{l}≥ 0,

*l*= 1, …,

*N*. This allows to gradually change the impact of a node on the analysis, between an enhanced (

*m*

_{l}> 1), an unmodified (

*m*

_{l}= 1), and a reduced (

*m*

_{l}< 1) importance with respect to the selection matrix case. Second, we combine the criteria of both already existing Slepian designs by subtracting the modified embedded distance from the energy concentration:

**g**to be any graph signal, which is an operational choice because of the joint optimization of both criteria, as will be illustrated and discussed later.

**L**

^{1/2}in terms of the adjacency matrix

**A**:

*c*

_{k}= $(2k)!22k(k!)2(2k\u22121)$. Details on the series expansion are discussed in the Taylor Series section. We can then further rewrite the internal part of the Criterion 5 as

By convention, the associated eigenvalues are sorted in decreasing order. Based on the fact that eigenvalues of the symmetric normalized Laplacian are greater or equal to 0 and lower or equal to 2, one can derive *m*_{max} ≥ *ζ*_{1} ≥ *ζ*_{2} ≥ … ≥ −2*m*_{max}, where *m*_{max} is the highest cooperation value appearing in **M**, using bounds from Corollary 2.4 in Lu and Pearce (2000).

Interestingly, the combination of both existing Slepian criteria leads to the emergence of the adjacency matrix **A** as the key player in our new formalism. In fact, when the cooperation matrix is the identity matrix, the criterion reverts to the eigendecomposition of **A** itself.

Let us now interpret the impact of the cooperation weights: Obviously, an element *a*_{i,j} of the adjacency matrix contains the weight of a direct path from *i* to *j*. The linear approximation *ζ*_{lin} reweights such a direct path with the average (*m*_{i} + *m*_{j})/2 of the cooperation weights that are attributed to nodes *i* and *j*, as illustrated in Figure 1A (left half). Notice that paths where only one node has a cooperation weight equal to 0 are still possible, as the other cooperation weight is then simply divided by 2.

*i*and

*j*. For instance, the sum of all length-2 paths between

*i*and

*j*can be read out from the squared adjacency matrix:

*l*through which it passes.

Analogously, the term **A**^{k} in the criterion introduces modifications of *k*-length paths in the graph. However, for *k* > *N*, reweighting reduces to modifications of lower length paths. The Cayley-Hamilton theorem implies that for every matrix **A** of size *N* × *N*, the matrix **A**^{N} can be written as a linear combination of matrices **A**^{k} for *k* = 0, 1, … *N* − 1. By induction, it holds that **A**^{k} for every *k* > *N* can also be written as a linear combination of the same set of *N* matrices. Hence, modifications of paths longer than *N* − 1 can be seen as a linear combination of additional modifications of paths of length 0 to *N* − 1.

## MATHEMATICAL CONSIDERATIONS

This section provides mathematical foundations supporting the methods and the results presented in this work. We start by discussing the link between the selection matrix and the eigenspectrum associated with the energy concentration criterion, and the relationship with the modified embedded distance criterion, using full bandwidth. Then, we provide a formal justification of the Taylor series approximation of the square root matrix function used in Equation 6, and discuss the error associated with this approximation.

### Eigenspectrum Associated With the Energy Concentration Criterion

For full bandwidth, the concentration matrix is defined as **C** = **U**^{⊤}**SU**, where **U** is the matrix whose columns are eigenvectors of the graph Laplacian, and **S** is a diagonal selection matrix. Hence, the eigendecomposition of **C** is trivial: Its eigenvectors are the rows of **U**, and the eigenvalues of **C** correspond to the diagonal entries of **S**, as can be seen from Figure 3A for *W* = 279.

### Eigenspectrum Associated With the Modified Embedded Distance Criterion

*z*

_{λ}, is lower bounded by the number of zeros on the diagonal of the selection matrix, denoted

*z*

_{S}. To see this, consider the following decomposition of the modified embedded distance matrix

**C**

_{emb}:

*i*

_{k}is the index of the

*k*

^{th}nonzero entry of the selection matrix

**S**, and

*l*

_{ik}denotes the

*i*

_{k}

^{th}column vector of the matrix

**L**

^{1/2}. From this expression, it can be seen that the rank of

**C**

_{emb}is at most

*N*−

*z*

_{S}and hence,

*z*

_{λ}≥

*z*

_{S}. Equality holds when the set of vectors {

*l*

_{ik}} corresponding to the nonzero entries of

**S**are linearly independent. This is the case for connected graphs, as any subset (with cardinality strictly less than

*N*) of the columns of

**L**

^{1/2}is linearly independent. This relationship is observed in Figure 3B for

*W*= 279.

### Taylor Series of Matrix-Valued Functions

**L**

^{1/2}proposed in Equation 6 is derived using the scalar Taylor series of

*f*(

*x*) = $x$ evaluated around the point

*a*= 1:

*t*

_{k}= $(\u22121)k\u22121(2k)!22k(k!)2(2k\u22121)$ and

*x*∈ ℝ,

*x*> 0. The square root matrix of

**L**then writes

**Λ**

_{L}=

**I**−

**Λ**

_{A}and their eigenvectors are equal (

**U**

_{L}=

**U**

_{A}) when ordered following increasing and decreasing eigenvalues, respectively. The previous equation finally reduces to

*c*

_{k}= $(2k)!22k(k!)2(2k\u22121)$, which is the expression used in Equation 6.

*f*(

*x*) to a finite upper bound on

*k*≤

*K*leads to an approximation error that can be estimated by the Lagrange form of the remainder

*K*+ 1)

^{th}derivative is evaluated at the point

*y*found between

*x*and 1. On the other hand, since the eigenvectors forming

**U**

_{L}are unit-norm vectors, the distance

*d*

_{K}between a finite sum approximation of

**L**

^{1/2}and the true square root of the matrix is bounded as

_{F}denotes the Frobenius norm. In the case of a first-order Taylor approximation (

*K*= 1), we get

The eigenvalues *λ*_{i} range from 0 to 2, and all contribute to the total approximation error *d*_{1}, with eigenvalues further from 1 contributing more. Since the second-order derivative of the square root function increases as its argument approaches 0, the most contributing factors of the error derive from Taylor approximation terms with near-zero eigenvalues. Hence, graphs whose Laplacian spectrum exhibits higher eigengaps in the lower band tend to have lower approximation error.

*d*

_{K,M}between the true proposed criterion

**M**−

**L**

^{1/2}

**ML**

^{1/2}and its approximation using a

*K*

^{th}-order Taylor approximation of

*L*

^{1/2}verifies

**M**||

_{F}corresponds to the Frobenius norm of the cooperation matrix. Hence, the upper bound on

*d*

_{K,M}reduces as the nodes are given less importance; that is, when the cooperation values get closer to 0.

## RESULTS

The *C. elegans* worm is an intensely studied model organism in biology. In particular, the wiring diagram of its 302 neurons has been carefully mapped during a long and effortful study (White, Southgate, Thomson, & Brenner, 1986). Here, we use the graph that summarizes data from 279 somatic neurons (unconnected and pharyngeal neurons were excluded from the full diagram of 302 neurons), and combined connectivity from chemical synapses and gap junctions (Chen, Hall, & Chklovskii, 2006). The binary adjacency matrix **A**_{bin} with edge weights 0 or 1 has been symmetrically normalized with the degree matrix **D** into **A** = **D**^{−1/2}**A**_{bin}**D**^{−1/2}, as described in the Essential Graph Concepts section. We retrieved the type of each neuron (sensory neuron, interneuron, or motoneuron) from the WormAtlas database (http://www.wormatlas.org/).

In their modeling work, Varshney, Chen, Paniagua, Hall, & Chklovskii (2011) studied network properties of the worm connectome using different approaches, including Laplacian embedding. In particular, the topological view generated by mapping nodes on the first two eigenvectors with smallest nonzero eigenvalues already reveals interesting network organization (see Figure 2). The horizontal dimension (**u**_{2}) mainly distinguishes the motoneurons from the head (right green circles) and from the ventral cord (left green circles). The vertical dimension (**u**_{3}) reflects information flow from sensory neurons and interneurons of the animal’s head (top) to the nerve ring and ventral cord circuitries (bottom).

### Eigenvalues of Different Criteria

To illustrate the eigenvalues obtained with the existing Slepian designs, as well as the newly proposed criterion, we considered the 128 motoneurons and “unselected” them by setting their respective entries in **S** to 0. We applied the original, concentration-based Slepian design for different bandwidths *W* = 100, 150, 200, 279, the latter corresponding to full bandwidth. The eigenvalues *μ*_{k}, which reflect energy concentration in the 151 remaining neurons, are shown in Figure 3A. The characteristic behavior of classical Slepians is preserved for the graph variant; that is, eigenvalues cluster around 1 and 0 for well- and poorly concentrated eigenvectors, respectively, and the phase transition occurs more abruptly at higher bandwidth. For full bandwidth, perfect concentration becomes possible, and the problem degenerates in retrieving two linear subspaces of 151 and 128 dimensions spanned by eigenvectors with concentration 1 and 0, respectively (see the Mathematical Considerations section for a proof on the number of distinct eigenvalues). In practical terms, for high but not full bandwidth, the “interesting” eigenvectors with large concentration correspond to the part indicated by the green area on the plot, and become numerically indistinguishable. A few indicative examples of Slepian vectors across bandwidths are displayed in Supplementary Figure S1C (Supporting Information).

Next, we applied the modified Slepian design inspired by the Laplacian embedded distance. As shown in Figure 3B, the eigenvalues *ξ*_{k} reflect the modified embedded distance, which we now want to minimize. For increasing bandwidth (darker curves), its smallest values can be made lower; however, the subset of nodes with *S*_{i,i} entries set to 0 is also described by eigenvectors with small eigenvalues. This becomes even clearer at full bandwidth, a case for which a subspace of 128 dimensions spanned by eigenvectors with a modified embedded distance of 0 is retrieved, as indicated by the green area in Figure 3B and explicitly demonstrated in the Mathematical Considerations section. Some examples of Slepians across bandwidths can be seen in Supplementary Figure S1D (Supporting Information).

The degeneracies of the Slepian designs at full bandwidth are instructive about the opposing effects of maximizing energy concentration and minimizing modified embedded distance; that is, the subspaces indicated by the green areas in Figures 3A and 3B, which are optimal for the corresponding criteria, are actually different ones, representing signals on sensory and interneurons (151 nodes) on the one hand, and on motoneurons (128 nodes) on the other hand (compare Supplementary Figures S1C and S1D, first rows; Supporting Information). This leads us to the eigenvalues *ζ*_{k} of the proposed criterion, as shown in Figure 3C (black curve).

The maximum eigenvalue peaks close to 1, a case reflecting jointly high equivalent *μ*_{k} (blue curve) and low equivalent *ξ*_{k} (purple curve); that is, a high energy concentration at the same time as a low modified embedded distance (low localized graph frequency) within 𝒮. The low amount of such solutions shows that it is difficult to conceal high energy concentration and small modified embedded distance.

As values of *ζ*_{k} decrease, we first observe a rise in modified embedded distance (eigenvectors remain reasonably concentrated within 𝒮, but rapidly exhibit a larger localized graph frequency), and then a decrease of both *μ*_{k} and *ξ*_{k}, which indicates that eigenvectors become less concentrated within the subset of interest. Afterwards, we observe a regime in which both quantities are null at the same time; that is, a subspace spanned by eigenvectors that are fully concentrated outside 𝒮. Notice that this set of eigenvectors is now “pushed away” from the meaningful low *ξ*_{k} ones, and lie in the middle of the spectrum. Finally, the sign of *ζ*_{k} switches, and the right-hand side of Figure 3C denotes eigenvectors of increasing concentration within 𝒮 and localized graph frequency, the latter effect dominating over the former.

Interestingly, computing the eigenspectrum using a linear approximation of the criterion matrix (Figure 3D, light brown curve) leads to very similar results, which only slightly vary for the largest eigenvalues. When the approximation order is increased up to 20 (increasingly dark brown curves), this low error further diminishes, although a mild difference remains with the ground truth. Inspection of the Slepian vectors related to several locations of the eigenspectrum (Supplementary Figure S2, Supporting Information) confirmed that the only salient differences actually involved the first Slepian vector (largest eigenvalue one).

### Topology Revealed by Guided Spectral Analysis

We now guide the spectral analysis to focus on the three different types of neurons. For instance, when focusing on the role of the sensory neurons, we gradually decrease the cooperation weights of interneurons and motoneurons from 1 to 0. For each setting, we then visualize the topology revealed by the guided analysis by projecting the nodes on the eigenvectors with the second and third largest eigenvalues. We build the trajectory of each node through this two-dimensional embedding, after applying the Procrustes transform (Schönemann, 1966) to compensate for any irrelevant global transformations. As a complementary visualization, note that we provide the start, intermediate, and end points of each trajectory as separate figures in Supplementary Figure S3 (Supporting Information). Finally, k-means clustering was performed on the nodes in focus at the end point embedding of trajectories, producing sets of clusters given in Supplementary Figure S4 and Supplementary Tables 1–3 (see Supporting Information for details). Example visualizations when resorting to different Slepian vectors are provided in Supplementary Figure S5 (Supporting Information).

In Figures 4A and 4B, the trajectories are depicted when focusing on the sensory neurons by attributing cooperation weights to the other types of neurons ranging from 1 to 0.5, and from 0.5 to 0, respectively. During the first half (Figure 4A), the network organization is only slightly altered with respect to the initial view of Figure 2; that is, the sensory neurons move slightly more to the periphery, while the interneurons and motoneurons move to the origin. In the second part of the trajectory (Figure 4B), a major split occurs in the bottom right branch of Figure 4A between the left and right versions of a whole series of neurons, while the bottom left branch neurons move back to the center of the coordinate frame. The cell types found in the top branch are amphid neurons, whereas the rest of the sensory neurons split into their left and right counterparts located in the left and right bottom branches. The clusters found by the k-means approach (see Supplementary Table 1, Supporting Information) include a group of five bilateral amphid neurons (AWA, AWC, ASE, ASI, and AFD; cluster C_{3}) and six other clusters, two of which span the bottom left and right subbranches (clusters C_{5} and C_{2}).

As described in the Guiding Spectral Embedding section, since paths through nodes with cooperation weights set to 0 are still considered by the proposed criterion, the embedding focusing on a particular subtype of neurons can still include functionally distinct cells as clearly standing out in the visualization. For instance, in addition to the above clustering of sensory neurons in Figure 4B, we notice the segregation of the bilateral RIP interneurons towards the left and the right branch. This shows that the embedding does not neglect nodes outside the focus, even when their cooperation weight is set to 0.

In Figures 5A and 5B, we then focus on the interneurons by reducing the cooperation weights of sensory neurons and motoneurons in two steps. As expected, the interneurons move towards the periphery. Their organization does not seem to be dominated by left *versus* right variants, as we found for sensory neurons, but rather by a set of well-defined clusters related to their functional involvement in the *C. elegans* neuronal circuitry (see Supplementary Table 2, Supporting Information): In the first quadrant, we find the isolated AIA bilateral pair (cluster C_{4}). Moving clockwise, a larger cluster of neurons includes the bilateral AIY, AIZ, AIN, AIB, RIA, RIB, AUA, and the single neurons RIR and RIH (cluster C_{3}). Next we find a cluster including AVE, AVK, RIG, PVT, DVA, and other neurons located closer to the origin of Figure 5 (cluster C_{5}), before reaching another large ensemble of neurons including the bilateral AVA, AVD, LUA, PVC, PVW, and the single neuron PVR (cluster C_{6}). Moving back upwards, cluster C_{1} contains the bilateral AVB, AVJ, BDU, the single neuron AVG, and PVPR, whose left counterpart PVPL belongs to cluster C_{5}, thus standing as the only bilateral pair of neurons split into different clusters. Finally, we reach the last group of cells containing the bilateral RIF, AVH, AIM, PVQ, and AVF (cluster C_{2}).

Finally, in Figures 6A and 6B, the organization of motoneurons is examined. Already in the first step (Figure 6A), when reducing the cooperation weights of the sensory and interneurons from 1.0 to 0.5, we observe much stronger changes than in the previous cases. In particular, the initial organization completely collapses and the left branch of the motoneurons spreads out. This branch then develops into a peripheral organization when further decreasing the cooperation weights (Figure 6B), with three main subsets of neurons and ambiguous positioning of the cell DVB between the left and the right bottom branches. K-means clustering into optimal cell groups captured this architecture into seven smaller clusters (Supplementary Table 3, Supporting Information): Clusters C_{4} and C_{7} spanned top neurons, clusters C_{2} and C_{3} included the bottom left branch neurons, and clusters C_{5} and C_{6} contained the bottom right branch cells.

## DISCUSSION

### Beyond Original Slepian Designs

The originality of our approach lies in providing a new and simple way to guide graph spectral analysis. Inspired by graph Slepians, we propose a novel criterion that combines energy concentration and modified embedded distance, taking into account cooperation weights that can gradually increase or decrease the importance of selected nodes. The new criterion lets the adjacency matrix emerge as the central graph operator, instead of the Laplacian, and is operational at full bandwidth.

This is surprising at first sight, because neither of the conventional Slepian criteria is practical without the bandlimit constraint. For the energy concentration with binary cooperation weights, as shown in Figure 3A for an illustrative example on the *C. elegans* connectome, full bandwidth leads to two eigenvalues (1 and 0), the dimensionality of the corresponding subspaces being the number of nodes with cooperation weight 1 and 0, respectively. For the modified embedded distance, as shown in Figure 3B, full bandwidth creates a subspace with eigenvalue 0 of dimensionality equal to the number of nodes with cooperation weight 0. Therefore, subtracting both criteria leads to opposing objectives; that is, at full bandwidth, an energy concentration of 1 encodes the subspace for nodes with weight 1, while a modified embedded distance of 0 encodes the subspace for nodes with weight 0.

The obtained eigenspectrum for the new criterion, shown in Figure 3C, illustrates that only a few eigenvectors are able to combine high energy concentration with low modified embedded distance, a counterbalance that can be further revealed by measuring *μ* and *ξ* separately for these new eigenvectors. Such a large eigengap is also good news for numerical computation of the leading eigenvectors for large graphs when relying upon efficient large-scale solvers (Lehoucq & Sorensen, 1996) implemented in widely available software libraries such as ARPACK.

Intriguingly, the approximation error was already low using a linear approximation, and did not noticeably decrease further, except for the first Slepian vector, when resorting to higher order terms (see Figure 3 and Supplementary Figure S2, Supporting Information). Modifying the importance of a node *via* the corresponding cooperation value affects all-length paths through that node according to the series expansion from Equation 8, where the power of **A** in each term corresponds to the affected path length. Once we restrict the criterion to a linear approximation, the only paths whose importance is changed are those of length 1. This does not mean that other paths are not included in the graph analysis, but rather that they are included with their original (unmodified) effect on the topology. Low error of linear approximation suggests that the highest percentage of topological importance of a node falls into the importance of its length-1 paths. Further, a slightly higher error at eigenvectors with the highest *ζ* may be explained similarly: Not modifying higher order paths produces greater error at these eigenvectors because of their increased relative importance, because high *ζ* eigenvectors tend to be very smooth (even approaching a constant signal); thus, in order to even out the values at all nodes in the process, one needs to “reach” far enough.

The proposed criterion should not be confused with the Sobolev norm that is sometimes used to regularize graph signals (Mahadevan & Maggioni, 2006). Specifically, in the case of **M** = **I**, our criterion of Equation 5 applied to **g** reverts to **g**^{⊤}**g** − **g**^{⊤}**Lg**, whereas the Sobolev norm of **g** reads **g**^{⊤}**g** + **g**^{⊤}**Lg**. The difference in the sign of the second term introduces significantly distinct optimization goals regardless of the apparent similarity of the two expressions.

As for future extensions of our approach, one could envisage to dig into the relationship with graph uncertainty principles (Agaskar & Lu, 2013; Teke & Vaidyanathan, in press; Tsitsvero et al., 2016), to consider statistical resampling for graphs (Pirondini, Vybornova, Coscia, & Van De Ville, 2016), or to focus on the discovery of hierarchical graph structure (Arenas, Fernández, & Gómez, 2008; Irion & Saito, 2014) by gradual refinement of the subgraph. The design could also be extended to directed graphs using recent extensions of spectral decompositions in this context (Mhaskar, 2018; Sandryhaila & Moura, 2013).

### Gaining Insights on *C. elegans*

The application of our newly developed approach to the *C. elegans* connectome enabled us to confirm past findings from the literature, and to shed light on additional cellular targets and groupings that may deserve further experimental analyses. At the level of sensory neurons (Figure 4, Supplementary Figure S3A, and Supplementary Figure S4A; Supporting Information), seven clusters were extracted, collectively accounting for the three branches evident in Figure 4: the top branch made of 12 (including the thermosensor AFD) pairs of amphid neurons (at *y*-coordinate greater than 0.04), and other cells split into the left and right bottom branches. Interestingly, one of the clusters found by k-means included five pairs of bilateral amphid neurons: AWA and AWC involved in odortaxis (Bargmann, Hartwieg, & Horvitz, 1993; Li et al., 2012), the thermosensor AFD (Mori & Ohshima, 1995), and ASE and ASI implicated in chemotaxis (Bargmann & Horvitz, 1991; Luo et al., 2014). These neurons act as low-order sensors, whose extraction as a separate cluster inside the amphid group may suggest new information worth further exploration.

The lower branches in Figure 4 split the neurons into their right and left counterparts, thus extracting relevant somatic information. These neurons act as higher order sensing apparatus as compared with amphid neurons: IL1 and OLQ have jointly been implicated in the worm foraging response (Hart, Sims, & Kaplan, 1995); CEP and ADE are involved in the response upon food sensing (Sawin, Ranganathan, & Horvitz, 2000); URX, URY, and OLL are linked to the reproductive drive (Barrios, Ghosh, Fang, Emmons, & Barr, 2012), and so on. The split between low- and high-order sensing is summarized in Figure 7A.

Further inspection of the branches (Supplementary Figure S6A, Supporting Information) showed that the left-right segregation involved chemical synapses, but not gap junctions. Also, Supplementary Figure S5 (second row, Supporting Information) shows that for higher order Slepian vectors (fourth and fifth), additional contributors emerge, such as the bilateral PHA/PHB. This suggests that the approach finds different subgroups of higher order sensory neurons depending on the choice of the embedding eigenvectors. The biological/functional intepretation of the exact clusters asks for a more detailed analysis of the subgroups of neurons. Finally, the emergence of RIP interneurons in the embedding (Figure 4) points towards an important role of the sensory neurons yet to be explained, possibly in connection with their presynaptic inputs from IL1 (White et al., 1986).

Turning to interneurons (Figure 5, Supplementary Figure S3B, and Supplementary Figure S4B; Supporting Information), we notice a trend of grouping neurons at the same command-chain level. Starting from the top of Figure 5, we find AIA, AIB, AIY, and AIZ jointly known for their role on locomotory behavior and acting as a first-relay drives (Gray, Hill, & Bargmann, 2005; Wakabayashi, Kitagawa, & Shingai, 2004). Moving clockwise, we find RIA and RIB acting as second-layer intermediates, and further on, neurons such as AVE, and in the next cluster AVAL and AVD, all being command interneurons (Haspel, O’Donovan, & Hart, 2010; Hobert, 2003; Kawano et al., 2011). The trend of following the locomotory pathway clockwise in the embedding space suggests that the approach targets relevant information about the neural system. However, the exact compact clusters in Supplementary Figure S4B (Supporting Information) need further elaboration. Some of the interesting findings worth exploring would be the unexplained grouping of the scarcely studied RIR neuron (Hobert, Johnston, & Chang, 2002) with the cluster of cells including AIB and AIY, or the grouping of PVR and LUA (Chalfie et al., 1985; Wicks & Rankin, 1995) with locomotion-regulating neurons such as AVD and AVA.

Considering motoneurons (Figure 6, Supplementary Figure S3C, and Supplementary Figure S4C), the embedding positions fit somatic location (see Supplementary Figure S7): a spiral beginning at the origin, turning right, then moving clockwise and ending in the top branch follows the postero-anterior direction (Supporting Information). This confirms that the approach has extracted meaningful information. However, the exact split between the three branches as well as the k-means clustering into the seven ensembles remains unclear, since, from preliminary explorations, we find both A-type and B-type cholinergic motoneurons and the inhibitory D-type motoneurons in all clusters. Finally, DVB deserves further attention (Schuske, Beg, & Jorgensen, 2004) because of its isolated location between the two bottom branches.

In Figure 6B, two sensory nodes stick out the furthest away from the center; that is, towards the lower left and right branches of motoneurons. These are PVD and PHC neurons, responsible for nociceptive mechano- and thermosensation, respectively. The locations of these nodes in the embedding may be linked to the fact that harmful nociceptive stimuli induce a locomotory response. As in the case of RIP interneurons emerging in the focused embedding of sensory neurons, we once again confirm the ability of the proposed approach to extract important nodes even when their cooperation weight was initially set to 0.

In summary, as illustrated in Figure 7, all three types of neurons found in the *C. elegans* nematode could be arranged in a meaningful hierarchy thanks to the introduced guided graph spectral embedding. Sensory neurons were separated between first-order and higher order sensors. Different levels of processing of motor functions were distinguished (see the gradient from white to dark gray tones going clockwise in Figure 7B), with the eventual recruitment of motoneurons, which have been separated on the basis of somatic location. Future analyses will allow the study of different types of neurons through more elaborate combinations of focused nodes. In addition, it will be interesting to see whether future experimental work can shed light on some of the neurons that were extracted here without being yet extensively documented in the literature, such as AVKL or RIR.

### Perspectives for Future Uses

The proposed graph embedding provides a simple, yet powerful approach to visualization and, if combined with clustering techniques, to the extraction of meaningful subgraphs from any graph-modeled dataset. In neuroimaging, focusing on a specific subgraph of interest (by setting the appropriate cooperation values) can direct research onto clinically relevant concepts, such as the medial temporal lobe and limbic structures for human brain imaging studies comparing healthy controls and Alzheimer patients (Krasuski et al., 1998). Be it using the structural or the functional connectome for analyses (Contreras, Goñi, Risacher, Sporns, & Saykin, 2015), features such as cluster size and/or the inclusion of specific nodes (brain regions) in a cluster may become biomarkers for an early diagnosis or prediction of the disease.

Furthermore, graph modeling of the human brain is frequently employed to extract important nodes/brain regions and to identify their topological roles, such as provincial/connector hubs suggesting clinically significant functional roles (van den Heuvel & Sporns, 2013). Doing so requires the use of diverse node centrality measures, such as degree or betweenness centrality. On the other hand, entries of the proposed Slepian eigenvectors may be interpreted as higher order spectral centrality measures relative to the focused subgraph, and for the special case **M** = **I**, the eigenvector corresponding to the highest positive eigenvalue reverts to the eigenvector centrality (M. Newman, 2010). Hence, if clustering of a dataset based on the proposed embedding coordinates reveals nodes distant from the rest of the graph, it is suggested that those nodes exhibit a hub-like role when the focused subgraph is considered more important than the rest of the graph. For example, the AIA pair in the discussed *C. elegans* example emerges as a separate cluster in Figure 5 and Supplementary Figure S4B (Supporting Information), where the focus is set on interneurons. Its role as a hub can be confirmed by the high number of connections to the set of amphid neurons, and a small number of connections to the other cells, as compared with the rest of the interneurons. Identification of hubs and/or peripheral nodes with respect to other similar type nodes may lead to a better understanding of the functional role of both neurons and brain regions, depending on the inspected dataset.

## AUTHOR CONTRIBUTIONS

Miljan Petrovic: Investigation; Methodology; Validation; Writing – Review & Editing. Thomas Bolton: Formal analysis; Visualization; Writing – Review & Editing. Maria Giulia Preti: Writing – Review & Editing. Raphaël Liégeois: Methodology; Validation; Writing – Review & Editing. Dimitri Van De Ville: Conceptualization; Formal analysis; Writing – Original Draft; Writing – Review & Editing.

## FUNDING INFORMATION

Dimitri Van De Ville, CHIST-ERA IVAN project (http://dx.doi.org/10.13039/501100001942), Award ID: 20CH21_174081. Dimitri Van De Ville, Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (http://dx.doi.org/10.13039/501100001711), Award ID: 200021_175506. Dimitri Van De Ville, Bertarelli Foundation. Maria Giulia Preti, Center for Biomedical Imaging (CIBM) of the GenevaLausanne Universities and the EPFL.

## TECHNICAL TERMS

- Graph:
Mathematical model of networks including vertices (nodes) and edges connecting them.

- Eigenvector:
A vector whose multiplication with the given matrix produces a scaled version of itself.

- Laplacian:
Matrix describing a graph and containing information on smoothness of signals defined on that graph.

- Eigenvalue:
The value of the scaling of an eigenvector due to its multiplication with a matrix.

- Embedding:
Procedure of reducing dimensionality of the data; describing a dataset (graph) with less memory requirements while preserving important information.

- Slepians:
Signals with maximally localized structure in both time/vertex and frequency domain.

- Frobenius distance:
Square root of the sum of pointwise squared differences between two matrices.

- Procrustes transform:
Geometric transform that changes only the size and the overall position of an object but not its shape.

## REFERENCES

*C. elegans*

*C. elegans*

*C. elegans*

*Caenorhabditis elegans*

*Caenorhabditis elegans*

*C. elegans*GLR-1 glutamate receptor

*Caenorhabditis elegans*

*C. elegans*: Paradigms, circuits, genes

*Caenorhabditis elegans*model

*C. elegans*for forward locomotion

*C. elegans*feeding regulation

*C. elegans*chemotaxis circuit

*Caenorhabditis elegans*

*C. elegans*locomotory rate is modulated by the environment through a dopaminergic pathway and by experience through a serotonergic pathway

*C. elegans*

*C. elegans*connectome

*Caenorhabditis elegans*neuronal network

*Caenorhabditis elegans*neuronal network

*Caenorhabditis elegans*

*Caenorhabditis elegans*

*Caenorhabditis elegans*

## Author notes

Competing Interests: The authors have declared that no competing interests exist.

Handling Editor: Olaf Sporns