The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Key features:
• Covers both core methods and cutting-edge research
• Algorithmic approach with open-source implementations
• Minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas
• Short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference
• Supplementary website with lecture slides, videos, project ideas, and more

deviations from the mean. The resulting Iris graph has 150 nodes and 753 edges. The nodes in the Iris graph in Figure 4.2 have also been categorized according to their class. The circles correspond to class iris-versicolor, the triangles to iris-virginica, and the squares to iris-setosa. The graph has two big components, one of which is exclusively composed of nodes labeled as iris-setosa.

4.2 Topological Attributes

In this section we study some of the purely topological, i.e., edge-based or
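The component structure described above can be checked directly from an adjacency-list representation of a graph via breadth-first search. The following is a minimal sketch; the small example graph is hypothetical, not the actual Iris graph:

```python
from collections import deque

def connected_components(adj):
    """Find the connected components of an undirected graph.

    adj: dict mapping each node to an iterable of its neighbors.
    Returns a list of components, each a set of nodes.
    """
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(comp)
    return components

# Toy graph with two components (hypothetical example)
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print([sorted(c) for c in connected_components(adj)])  # [[0, 1, 2], [3, 4]]
```

Running the same traversal on the Iris graph's adjacency lists would recover its two big components, including the one made up entirely of iris-setosa nodes.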

authority score converges to the dominant eigenvector of A^T A, whereas the hub score converges to the dominant eigenvector of A A^T. The power iteration method can be used to compute the eigenvector in both cases. Starting with an initial authority vector a = 1_n, the vector of all ones, we can compute the vector h = Aa. To prevent numeric overflows, we scale the vector by dividing by its maximum element. Next, we can compute a = A^T h, and scale it too, which completes one iteration. This process
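The iteration above can be sketched in a few lines of Python. This is a minimal illustration of the power iteration with max-element scaling, not a production implementation; the tiny three-node graph is a hypothetical example:

```python
def hits(A, iters=100):
    """Power iteration for hub and authority scores, following the
    scheme above (max-element scaling). A is an n x n adjacency
    matrix as a list of lists, with A[i][j] = 1 for an edge i -> j."""
    n = len(A)
    a = [1.0] * n                                              # a = 1_n
    for _ in range(iters):
        # hub update: h = A a, then scale by the maximum element
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]
        m = max(h)
        h = [x / m for x in h]
        # authority update: a = A^T h, then scale
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(n)]
        m = max(a)
        a = [x / m for x in a]
    return a, h

# Hypothetical directed graph: 0 -> 1, 0 -> 2, 1 -> 2
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
a, h = hits(A)
# Node 2 attracts the most links (highest authority score);
# node 0 points to the best authorities (highest hub score)
```

The scaled vectors converge, up to normalization, to the dominant eigenvectors of A^T A and A A^T respectively.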

make a similar assumption that the graphs are large and sparse.

CHAPTER 4. GRAPH DATA

Small-world Property

It has been observed that many real graphs exhibit the so-called small-world property, namely that there is a short path between any pair of nodes. We say that a graph G exhibits small-world behavior if the average path length µ_L scales logarithmically with the number of nodes in the graph, that is, if µ_L ∝ log n, where n is the number of nodes in the graph. A graph is said to have ultra
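For small unweighted graphs, the average path length µ_L can be computed by running breadth-first search from every node and averaging the shortest-path distances over all connected pairs. A minimal sketch (the four-node path graph is a hypothetical example):

```python
from collections import deque

def avg_path_length(adj):
    """Average shortest-path length over all connected node pairs,
    via BFS from every node of an undirected, unweighted graph."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1        # connected pairs, excluding s itself
    return total / pairs if pairs else 0.0

# Hypothetical path graph 0 - 1 - 2 - 3
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(avg_path_length(adj))  # 10/6 ≈ 1.667
```

For a small-world graph, this quantity grows only like log n as n increases.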

p = m / (n(n−1)/2)

In other words, we are interested in the asymptotic behavior of the graph as n → ∞ and p → 0. Under these two trends, notice that the expected value and variance of X can be rewritten as

E[X] = (n − 1)p ≃ np as n → ∞
var(X) = (n − 1)p(1 − p) ≃ np as n → ∞ and p → 0

In other words, for large and sparse random graphs the expectation and variance of X are the same, E[X] = var(X) = np, and the Binomial distribution can be approximated by a Poisson distribution with parameter λ = np, given as

f(k) = (λ^k e^{−λ}) / k!
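The quality of the Poisson approximation in the large, sparse regime can be checked numerically. The sketch below compares the Binomial pmf for the degree X of a node (n − 1 independent trials with success probability p) against the Poisson pmf with λ = (n − 1)p; the particular values n = 10000 and np ≈ 3 are chosen purely for illustration:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Binomial pmf: probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson pmf with parameter lam: lam^k e^{-lam} / k!."""
    return lam**k * exp(-lam) / factorial(k)

# Large sparse regime (illustrative values): n nodes, expected degree ~ 3
n, p = 10_000, 3 / 10_000
lam = (n - 1) * p
for k in range(6):
    b = binom_pmf(k, n - 1, p)   # degree distribution: n - 1 trials
    q = poisson_pmf(k, lam)
    print(f"k={k}  binomial={b:.6f}  poisson={q:.6f}")
```

The two columns agree to several decimal places, as the approximation predicts.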

1. DATA MINING AND ANALYSIS

Using (1.1), the Euclidean distance between them is given as

δ(a, b) = √((5 − 1)² + (3 − 4)²) = √(16 + 1) = √17 = 4.12

The distance can also be computed as the magnitude of the vector

a − b = (5, 3)^T − (1, 4)^T = (4, −1)^T

since ‖a − b‖ = √(4² + (−1)²) = √17 = 4.12.

The unit vector in the direction of a is given as

u_a = a / ‖a‖ = (1/√(5² + 3²)) (5, 3)^T = (1/√34) (5, 3)^T = (0.86, 0.51)^T

The unit vector in the direction of b can be computed similarly: u_b = (0.24, 0.97)^T. These unit vectors are also
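The arithmetic in this example is easy to verify in a few lines of Python; the helper names `euclidean` and `unit` below are ours, not the book's:

```python
from math import hypot, sqrt

def euclidean(a, b):
    """Euclidean distance between two 2D points."""
    return hypot(a[0] - b[0], a[1] - b[1])

def unit(v):
    """Unit vector in the direction of v: v divided by its magnitude."""
    norm = sqrt(v[0] ** 2 + v[1] ** 2)
    return (v[0] / norm, v[1] / norm)

a, b = (5, 3), (1, 4)
print(round(euclidean(a, b), 2))             # 4.12
print(tuple(round(x, 2) for x in unit(a)))   # (0.86, 0.51)
print(tuple(round(x, 2) for x in unit(b)))   # (0.24, 0.97)
```

The printed values match the worked example: δ(a, b) = √17 ≈ 4.12, u_a ≈ (0.86, 0.51)^T, and u_b ≈ (0.24, 0.97)^T.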