The ability to learn is one of the most fundamental attributes of intelligent behavior. Consequently, progress in the theory and computer modeling of learning processes is of great significance to fields concerned with understanding intelligence. Such fields include cognitive science, artificial intelligence, information science, pattern recognition, psychology, education, epistemology, philosophy, and related disciplines. The recent observance of the silver anniversary of artificial intelligence has been heralded by a surge of interest in machine learning, both in building models of human learning and in understanding how machines might be endowed with the ability to learn. This renewed interest has spawned many new research projects and resulted in an increase in related scientific activities. In the summer of 1980, the First Machine Learning Workshop was held at Carnegie-Mellon University in Pittsburgh. In the same year, three consecutive issues of the International Journal of Policy Analysis and Information Systems were specially devoted to machine learning (No. 2, 3 and 4, 1980). In the spring of 1981, a special issue of the SIGART Newsletter No. 76 reviewed current research projects in the field.

This book contains tutorial overviews and research papers representative of contemporary trends in the area of machine learning as viewed from an artificial intelligence perspective. As the first available text on this subject, it is intended to fulfill several needs.

boundary. bounded by S4 and G4, is shown in Figure 2.7. This learned version space is independent of the sequence in which the training examples are presented (because in the end it contains all hypotheses consistent with the set of examples). As further training data is encountered, the S and G boundaries will move monotonically closer to each other, delimiting a smaller and smaller version space of candidate hypotheses.
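The order-independence claim can be checked on a toy problem. The following is a minimal sketch, not the boundary-set algorithm discussed in the text: it enumerates a small hypothesis space by brute force and filters it, which is only feasible for tiny domains. The attribute values, the training examples, and all function names are hypothetical, and the maximally specific "empty" hypothesis is omitted for brevity.

```python
from itertools import permutations, product

# Hypothetical toy domain: two attributes, each with two possible values.
VALUES = [("Sunny", "Rainy"), ("Warm", "Cold")]


def all_hypotheses():
    """Every conjunction of constraints; "?" means any value is acceptable."""
    return list(product(*[vals + ("?",) for vals in VALUES]))


def matches(hypothesis, instance):
    """True if the hypothesis classifies the instance as positive."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))


def version_space(examples):
    """Keep only the hypotheses consistent with every (instance, label) pair."""
    candidates = all_hypotheses()
    for instance, label in examples:
        candidates = [h for h in candidates if matches(h, instance) == label]
    return set(candidates)


# Hypothetical training data (instance, is_positive).
examples = [
    (("Sunny", "Warm"), True),
    (("Rainy", "Cold"), False),
    (("Sunny", "Cold"), True),
]

# Collect the version space produced by every possible presentation order.
results = {frozenset(version_space(order)) for order in permutations(examples)}
print(len(results))             # number of distinct version spaces found
print(version_space(examples))  # the surviving hypotheses
```

Because each example only removes inconsistent hypotheses, the final set is the intersection of the per-example filters, so every presentation order yields the same version space.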

decision tree and thus exhibits precisely the bias "shorter trees are preferred over longer trees." ID3 can be viewed as an efficient approximation to BFS-ID3, using a greedy heuristic search to attempt to find the shortest tree without conducting the entire breadth-first search through the hypothesis space. Because ID3 uses the information gain heuristic and a hill climbing strategy, it exhibits a more complex bias than BFS-ID3. In particular, it does not always find the shortest consistent

n. In contrast, a boolean attribute B that splits the same n examples exactly in half will have SplitInformation of 1. If attributes A and B produce the same information gain, then clearly B will score higher according to the GainRatio measure. One practical issue that arises in using GainRatio in place of Gain to select attributes is that the denominator can be zero or very small when |S_i| ≈ |S| for one of the S_i. This either makes the GainRatio undefined or very large for attributes that
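The comparison between attributes A and B can be made concrete with a small calculation. This is a minimal sketch that assumes the usual definitions: SplitInformation is the entropy of S with respect to the values of the attribute, and GainRatio is Gain divided by SplitInformation. The function names and the eight-example data set are hypothetical. The sketch returns `None` when the denominator is zero, which is one simple way to flag the undefined case; it is not a remedy prescribed by the text.

```python
import math
from collections import Counter


def entropy(items):
    """Entropy (in bits) of the empirical distribution over the items."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def split_information(attr_values):
    """Entropy of the examples with respect to the attribute's values."""
    return entropy(attr_values)


def information_gain(attr_values, labels):
    """Expected reduction in label entropy from partitioning on the attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attr_values):
        subset = [l for a, l in zip(attr_values, labels) if a == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(attr_values, labels):
    """Gain / SplitInformation, or None when the denominator is zero."""
    denominator = split_information(attr_values)
    if denominator == 0:
        return None  # undefined: the attribute has the same value everywhere
    return information_gain(attr_values, labels) / denominator


# Hypothetical data: n = 8 examples, half positive and half negative.
labels = ["+", "+", "+", "+", "-", "-", "-", "-"]
attr_a = list(range(8))                 # a distinct value for every example
attr_b = [True] * 4 + [False] * 4       # boolean split exactly in half
attr_c = ["same"] * 8                   # constant attribute

print(split_information(attr_a), gain_ratio(attr_a, labels))
print(split_information(attr_b), gain_ratio(attr_b, labels))
print(gain_ratio(attr_c, labels))
```

In this data both A and B achieve the same information gain, but A's larger SplitInformation gives it the lower GainRatio, matching the argument above.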

classifies them correctly. (b) A set of training examples that is not linearly separable (i.e., that cannot be correctly classified by any straight line). x1 and x2 are the perceptron inputs. Positive examples are indicated by "+", negative by "-".

the inputs are fed to multiple units, and the outputs of these units are then input to a second, final stage. One way is to represent the boolean function in disjunctive normal form (i.e., as the disjunction (OR) of a set of conjunctions (ANDs) of
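The two-stage construction can be sketched for XOR, a function that no single perceptron can represent. The sketch below makes several assumptions that are not stated in this passage: inputs and outputs are encoded as +1 (true) and -1 (false), a unit outputs +1 when its weighted sum is greater than zero, and the weights are set by hand rather than learned. All function names are hypothetical.

```python
from itertools import product


def perceptron(weights, bias, inputs):
    """Threshold unit: +1 if bias + w . x > 0, otherwise -1."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else -1


def and_unit(literals, inputs):
    """Conjunction of literals.

    literals has one entry per input: +1 for the variable itself,
    -1 for its negation, 0 if the variable does not appear in the term.
    """
    k = sum(1 for lit in literals if lit != 0)
    # The weighted sum equals k only when every literal is satisfied;
    # any violated literal lowers it by at least 2.
    return perceptron(literals, -(k - 1), inputs)


def or_unit(inputs):
    """Disjunction: fires unless every input is -1."""
    m = len(inputs)
    return perceptron([1] * m, m - 1, inputs)


def dnf_network(terms, inputs):
    """First stage: one AND unit per term. Second stage: a single OR unit."""
    hidden = [and_unit(term, inputs) for term in terms]
    return or_unit(hidden)


# XOR in disjunctive normal form: (x1 AND NOT x2) OR (NOT x1 AND x2).
XOR_TERMS = [(1, -1), (-1, 1)]

table = {x: dnf_network(XOR_TERMS, x) for x in product((-1, 1), repeat=2)}
for inputs, output in sorted(table.items()):
    print(inputs, output)
```

Each first-stage unit recognizes one conjunction, and the final unit combines them, so the network depth stays at two regardless of how many terms the formula has.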

tested on separate data samples, differences in the two sample errors might be partially attributable to differences in the makeup of the two samples.

Confidence level N: 90%, 95%, 98%, 99%
TABLE 5.6 Values of t_{N,ν} for two-sided confidence intervals. As ν → ∞, t_{N,ν} approaches z_N.

5.6.1 Paired t Tests

Above we described one procedure for comparing two learning methods given a fixed set of data. This section discusses the statistical justification for this procedure, and for the confidence