
Introduction to Information and Data Systems Research
This course introduces students to research areas in IDS through weekly overview talks by Caltech faculty. It is aimed at first-year undergraduates; others may wish to take the course to gain an understanding of the scope of research in computer science. Graded pass/fail. Not offered 2023-24.
Methods of Applied Mathematics
First term: Brief review of the elements of complex analysis and complex-variable methods. Asymptotic expansions, asymptotic evaluation of integrals (Laplace method, stationary phase, steepest descents), perturbation methods, WKB theory, boundary-layer theory, matched asymptotic expansions with first-order and high-order matching. Method of multiple scales for oscillatory systems. Second term: Applied spectral theory, special functions, generalized eigenfunction expansions, convergence theory. Gibbs and Runge phenomena and their resolution. Chebyshev expansion and Fourier Continuation methods. Review of numerical stability theory for time evolution. Fast spectrally-accurate PDE solvers for linear and nonlinear partial differential equations in general domains. Integral-equation methods for linear partial differential equations in general domains (Laplace, Helmholtz, Schroedinger, Maxwell, Stokes). Homework problems in both 101 a and 101 b include theoretical questions as well as programming implementations of the mathematical and numerical methods studied in class.
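For a flavor of the programming component, here is a minimal Python/NumPy sketch (not from the course materials) of one second-term topic, the Runge phenomenon and its resolution by Chebyshev expansion; the test function and polynomial degree are illustrative choices.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Runge's function: smooth, yet equispaced polynomial interpolation diverges on it.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)

n = 20                                    # polynomial degree (illustrative)
x_fine = np.linspace(-1.0, 1.0, 2001)     # dense grid for measuring the error

# Degree-n interpolation at equispaced nodes: exhibits the Runge phenomenon.
x_equi = np.linspace(-1.0, 1.0, n + 1)
p_equi = Chebyshev.fit(x_equi, f(x_equi), n)

# Degree-n interpolation at Chebyshev nodes: rapid, uniform convergence.
x_cheb = np.cos(np.pi * np.arange(n + 1) / n)
p_cheb = Chebyshev.fit(x_cheb, f(x_cheb), n)

print("max error, equispaced nodes:", np.max(np.abs(p_equi(x_fine) - f(x_fine))))
print("max error, Chebyshev nodes :", np.max(np.abs(p_cheb(x_fine) - f(x_fine))))
```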
Applied Linear Algebra
This is an intermediate linear algebra course aimed at a diverse group of students, including junior and senior majors in applied mathematics, science, and engineering. The focus is on applications, and matrix factorizations play a central role. Topics covered include linear systems, vector spaces and bases, inner products, norms, minimization, the Cholesky factorization, least squares approximation, data fitting, interpolation, orthogonality, the QR factorization, ill-conditioned systems, discrete Fourier series and the fast Fourier transform, eigenvalues and eigenvectors, the spectral theorem, optimization principles for eigenvalues, singular value decomposition, condition number, principal component analysis, the Schur decomposition, methods for computing eigenvalues, non-negative matrices, graphs, networks, random walks, the Perron-Frobenius theorem, and the PageRank algorithm.
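As an illustration of the final topics, a minimal Python sketch of PageRank computed by power iteration on a toy directed graph; the graph, damping factor, and tolerance are illustrative and not taken from the course.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on the Google matrix of a small directed graph.

    adj[i, j] = 1 if there is a link from page i to page j.
    """
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Row-stochastic transition matrix; dangling pages link uniformly to all pages.
    P = np.where(out_deg[:, None] > 0, adj / np.maximum(out_deg[:, None], 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = damping * (r @ P) + (1.0 - damping) / n
        if np.linalg.norm(r_new - r, 1) < tol:
            break
        r = r_new
    return r_new

# Tiny 4-page web graph (illustrative).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(pagerank(A))   # scores sum to 1; the most linked-to page ranks highest
```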
Linear Analysis with Applications
Introduction to Probability Models
This course introduces students to the fundamental concepts, methods, and models of applied probability and stochastic processes. The course is application oriented and focuses on the development of probabilistic thinking and an intuitive feel for the subject rather than on a more traditional formal approach based on measure theory. The main goal is to equip science and engineering students with the probabilistic tools they will need in future studies and research. Topics covered include sample spaces, events, probabilities of events, discrete and continuous random variables, expectation, variance, correlation, joint and marginal distributions, independence, moment generating functions, the law of large numbers, the central limit theorem, random vectors and matrices, random graphs, Gaussian vectors, branching, Poisson, and counting processes, general discrete- and continuous-time processes, auto- and cross-correlation functions, stationary processes, and power spectral densities.
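A small Monte Carlo sketch, in Python, of two of the topics above, the law of large numbers and the central limit theorem; the distribution and sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10_000, 5_000            # sample size and number of repeated experiments

# Exponential(1) samples: mean 1, variance 1 (illustrative choice).
x = rng.exponential(scale=1.0, size=(trials, n))

# Law of large numbers: sample means concentrate around the true mean 1.
sample_means = x.mean(axis=1)
print("average sample mean:", sample_means.mean())

# Central limit theorem: sqrt(n) * (sample mean - 1) is approximately N(0, 1),
# so about 95% of the standardized means should fall within +/- 1.96.
z = np.sqrt(n) * (sample_means - 1.0)
print("fraction within 1.96:", np.mean(np.abs(z) < 1.96))
```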
Relational Databases
Introduction to the basic theory and usage of relational database systems. It covers the relational data model, relational algebra, and the Structured Query Language (SQL). The course introduces the basics of database schema design and covers the entity-relationship model, functional dependency analysis, and normal forms. Additional topics include other query languages based on the relational calculi, data-warehousing and dimensional analysis, writing and using stored procedures, working with hierarchies and graphs within relational databases, and an overview of transaction processing and query evaluation. Extensive hands-on work with SQL databases.
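To give a flavor of the hands-on SQL work, a minimal sketch using Python's built-in sqlite3 module; the toy schema, table names, and data are invented purely for illustration.

```python
import sqlite3

# In-memory database with a toy schema (names and data are illustrative).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER REFERENCES student(id),
                             course TEXT, grade REAL);
    INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO enrollment VALUES (1, 'CS/IDS 121', 3.7),
                                  (1, 'ACM/IDS 104', 4.0),
                                  (2, 'CS/IDS 121', 3.3);
""")

# Join and aggregate: average grade per student, expressed in SQL.
query = """
    SELECT s.name, AVG(e.grade) AS avg_grade
    FROM student s JOIN enrollment e ON e.student_id = s.id
    GROUP BY s.name
    ORDER BY avg_grade DESC;
"""
for name, avg_grade in con.execute(query):
    print(name, round(avg_grade, 2))
con.close()
```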
Applied Data Analysis
Fundamentally, this course is about making arguments with numbers and data. Data analysis for its own sake is often quite boring, but it becomes crucial when it supports claims about the world. A convincing data analysis involves collecting and cleaning data, analyzing it thoughtfully and reproducibly, and presenting the results graphically. This course will provide students with the necessary practical skills, chiefly revolving around statistical computing, to conduct their own data analyses. This course is not an introduction to statistics or computer science; students are assumed to be familiar with at least basic probability and statistical concepts up to and including regression.
Error-Correcting Codes
Information Theory and Applications
Analysis and Design of Algorithms
This course develops core principles for the analysis and design of algorithms. Basic material includes mathematical techniques for analyzing performance in terms of resources, such as time, space, and randomness. The course introduces the major paradigms for algorithm design, including greedy methods, divide-and-conquer, dynamic programming, linear and semidefinite programming, randomized algorithms, and online learning.
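As a small illustration of one design paradigm named above (dynamic programming), a Python sketch of the classic 0/1 knapsack recurrence; the instance is illustrative.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming: O(n * capacity) time.

    dp[c] = best total value achievable with total weight <= c,
    filled in item by item (capacity scanned downward so that each
    item is used at most once).
    """
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

# Illustrative instance: the optimum is 110 (take the items of weight 5 and 3).
print(knapsack(values=[60, 50, 40], weights=[5, 3, 4], capacity=8))
```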
Probability
Overview of measure theory. Random walks and the strong law of large numbers via the theory of martingales and Markov chains. Characteristic functions and the central limit theorem. Poisson process and Brownian motion. Topics in statistics. Part b not offered 2023-24.
Distributed Computing
Programming distributed systems. Mechanics for cooperation among concurrent agents. Programming sensor networks and cloud computing applications. Applications of machine learning and statistics by using parallel computers to aggregate and analyze data streams from sensors. Not offered 2023-24.
Networks: Algorithms & Architecture
Networks: Structure & Economics
Probability and Algorithms
Part a: The probabilistic method and randomized algorithms. Deviation bounds, k-wise independence, graph problems, identity testing, derandomization and parallelization, metric space embeddings, local lemma. Part b: Further topics such as weighted sampling, epsilon-biased sample spaces, advanced deviation inequalities, rapidly mixing Markov chains, analysis of boolean functions, expander graphs, and other gems in the design and analysis of probabilistic algorithms. Parts a & b are given in alternate years. Not offered 2023-24.
Current Topics in Theoretical Computer Science
May be repeated for credit, with permission of the instructor. Students in this course will study an area of current interest in theoretical computer science. The lectures will cover relevant background material at an advanced level and present results from selected recent papers within that year's chosen theme. Students will be expected to read and present a research paper.
Inverse Problems and Data Assimilation
Models in applied mathematics often have input parameters that are uncertain; observed data can be used to learn about these parameters and thereby to improve predictive capability. The purpose of the course is to describe the mathematical and algorithmic principles of this area. The topic lies at the intersection of fields including inverse problems, differential equations, machine learning and uncertainty quantification. Applications will be drawn from the physical, biological and data sciences.
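A minimal sketch of this idea in the simplest setting, a linear forward model with Gaussian noise and a Gaussian prior, where the posterior mean is a regularized least-squares solution; all dimensions, noise levels, and names are illustrative, not course material.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear-Gaussian inverse problem (illustrative): y = G u + noise,
# with a Gaussian prior on the unknown parameter vector u.
n_param, n_obs, noise_std, prior_std = 5, 20, 0.1, 1.0
G = rng.normal(size=(n_obs, n_param))            # forward model (assumed known)
u_true = rng.normal(scale=prior_std, size=n_param)
y = G @ u_true + noise_std * rng.normal(size=n_obs)

# Posterior mean = Tikhonov-regularized least squares:
# minimize ||G u - y||^2 / noise_var + ||u||^2 / prior_var.
A = G.T @ G / noise_std**2 + np.eye(n_param) / prior_std**2
u_map = np.linalg.solve(A, G.T @ y / noise_std**2)

print("true parameters     :", np.round(u_true, 2))
print("posterior mean (MAP):", np.round(u_map, 2))
```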
Machine Learning & Data Mining
Statistical Inference
Statistical Inference is a branch of mathematical engineering that studies ways of extracting reliable information from limited data for learning, prediction, and decision making in the presence of uncertainty. This is an introductory course on statistical inference. The main goals are: develop statistical thinking and intuitive feel for the subject; introduce the most fundamental ideas, concepts, and methods of statistical inference; and explain how and why they work, and when they don't. Topics covered include summarizing data, fundamentals of survey sampling, statistical functionals, jackknife, bootstrap, methods of moments and maximum likelihood, hypothesis testing, p-values, the Wald, Student's t-, permutation, and likelihood ratio tests, multiple testing, scatterplots, simple linear regression, ordinary least squares, interval estimation, prediction, graphical residual analysis.
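A short Python sketch of one topic above, the bootstrap: resample the data with replacement to estimate the sampling variability of a statistic. The data-generating distribution and the statistic (the median) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data (illustrative): 40 measurements from an unknown distribution.
data = rng.gamma(shape=2.0, scale=1.5, size=40)

# Nonparametric bootstrap: resample with replacement and recompute the statistic.
B = 10_000
boot_medians = np.median(
    rng.choice(data, size=(B, data.size), replace=True), axis=1)

estimate = np.median(data)
std_err = boot_medians.std(ddof=1)
ci = np.percentile(boot_medians, [2.5, 97.5])     # simple percentile interval
print(f"median = {estimate:.2f}, bootstrap SE = {std_err:.2f}, 95% CI = {ci.round(2)}")
```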
Fundamentals of Statistical Learning
Advanced Topics in Machine Learning
This course focuses on current topics in machine learning research. This is a paper reading course, and students are expected to understand material directly from research articles. Students are also expected to present in class, and to do a final project.
Fundamentals of Information Transmission and Storage
Basics of information theory: entropy, mutual information, source and channel coding theorems. Basics of coding theory: error-correcting codes for information transmission and storage, block codes, algebraic codes, sparse graph codes. Basics of digital communications: sampling, quantization, digital modulation, matched filters, equalization.
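A tiny Python sketch of the entropy concept above: the entropy of a biased coin, and the resulting capacity of the binary symmetric channel. The numerical values are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Entropy of a biased coin: maximal (1 bit) only when the coin is fair.
for q in (0.5, 0.9, 0.99):
    print(f"H(Bernoulli({q})) = {entropy([q, 1 - q]):.3f} bits")

# Capacity of the binary symmetric channel with crossover probability eps is
# 1 - H(eps): the maximal rate of reliable transmission (channel coding theorem).
eps = 0.11
print("BSC capacity:", 1 - entropy([eps, 1 - eps]))
```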
Data, Algorithms and Society
This course examines algorithms and data practices in fields such as machine learning, privacy, and communication networks through a social lens. We will draw upon theory and practices from art, media, computer science and technology studies to critically analyze algorithms and their implementations within society. The course includes projects, lectures, readings, and discussions. Students will learn mathematical formalisms, critical thinking and creative problem solving to connect algorithms to their practical implementations within social, cultural, economic, legal and political contexts. Enrollment by application. Taught concurrently with VC 72 and can only be taken once as CS/IDS 162 or VC 72.
Foundations of Machine Learning and Statistical Inference
This course will cover core concepts in machine learning and statistical inference; it assumes students are comfortable with analysis, probability, statistics, and basic programming. The ML concepts covered are spectral methods (matrices and tensors), non-convex optimization, probabilistic models, neural networks, representation theory, and generalization. In statistical inference, the topics covered are detection and estimation, sufficient statistics, Cramer-Rao bounds, Rao-Blackwell theory, variational inference, and multiple testing. In addition to covering the core concepts, the course encourages students to ask critical questions such as: How relevant is theory in the age of deep learning? What are the outstanding open problems? Assignments will include exploring failure modes of popular algorithms, in addition to traditional problem-solving questions.
Computational Cameras
Computational cameras overcome the limitations of traditional cameras by moving part of the image formation process from hardware to software. In this course, we will study this emerging multi-disciplinary field at the intersection of signal processing, applied optics, computer graphics, and vision. At the start of the course, we will study modern image processing and image editing pipelines, including those encountered on DSLR cameras and mobile phones. Then we will study the physical and computational aspects of tasks such as coded photography, light-field imaging, astronomical imaging, medical imaging, and time-of-flight cameras. The course has a strong hands-on component, in the form of homework assignments and a final project. In the homework assignments, students will have the opportunity to implement many of the techniques covered in the class. Example homework assignments include building an end-to-end HDR (High Dynamic Range) imaging pipeline, implementing Poisson image editing, refocusing a light-field image, and making your own lensless "scotch-tape" camera.
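A minimal sketch of one such step, merging multiple exposures into an HDR radiance map, assuming a linear sensor response and synthetic data; a real assignment would involve raw images and response-curve calibration, so this is only a toy illustration.

```python
import numpy as np

def merge_hdr(images, exposure_times):
    """Merge differently exposed images into one radiance map.

    Assumes a linear sensor with pixel values in [0, 1]; each pixel's radiance
    is a weighted average of (value / exposure time), with a "hat" weight that
    downweights under- and over-exposed pixels.
    """
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)        # peaks at mid-gray, 0 at 0 and 1
        num += w * img / t
        den += w
    return num / np.maximum(den, 1e-8)

# Synthetic scene with a wide dynamic range, captured at three exposure times;
# the longer exposures clip the bright pixels, yet the merge recovers the scene.
scene = np.logspace(-3, -0.5, 256).reshape(16, 16)       # "true" radiance (toy)
times = [1.0, 10.0, 100.0]
shots = [np.clip(scene * t, 0.0, 1.0) for t in times]
hdr = merge_hdr(shots, times)
print("max relative error:", np.max(np.abs(hdr - scene) / scene))
```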
Introduction to Data Compression and Storage
The course will introduce students to the basic principles and techniques of codes for data compression and storage. Students will master the basic algorithms used for lossless and lossy compression of digital and analog data, and the major ideas behind coding for flash memories. Topics include the Huffman code, the arithmetic code, Lempel-Ziv dictionary techniques, scalar and vector quantizers, transform coding, and codes for constrained storage systems. Given in alternate years; not offered 2023-24.
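A compact Python sketch of the Huffman code named above, built with a binary heap; the input string is illustrative, and tie-breaking (hence the exact codewords) may vary.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a binary Huffman code for the symbols in `text`.

    Repeatedly merge the two least-frequent subtrees; the resulting
    prefix-free code gives shorter codewords to more frequent symbols.
    """
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, i1, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, [f0 + f1, i1, merged])
    return heap[0][2]

text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[s] for s in text)
print(code)                           # frequent symbols get the short codewords
print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```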
Mathematics of Signal Processing
This course covers classical and modern approaches to problems in signal processing. Problems may include denoising, deconvolution, spectral estimation, direction-of-arrival estimation, array processing, independent component analysis, system identification, filter design, and transform coding. Methods rely heavily on linear algebra, convex optimization, and stochastic modeling. In particular, the class will cover techniques based on least-squares and on sparse modeling. Throughout the course, a computational viewpoint will be emphasized. Not offered 2023-24.
Numerical Algorithms and their Implementation
This course gives students the understanding necessary to choose and implement basic numerical algorithms as needed in everyday programming practice. Concepts include: sources of numerical error, stability, convergence, ill-conditioning, and efficiency. Algorithms covered include solution of linear systems (direct and iterative methods), orthogonalization, SVD, interpolation and approximation, numerical integration, solution of ODEs and PDEs, transform methods (Fourier, Wavelet), and low rank approximation such as multipole expansions. Not offered 2023-24.
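A short Python demonstration of one core concept listed above, ill-conditioning, using the Hilbert matrix as the standard illustrative example: the computed solution loses accuracy as the condition number grows.

```python
import numpy as np

# Solve H x = b where H is the notoriously ill-conditioned Hilbert matrix
# and b is chosen so that the exact solution is the all-ones vector.
for n in (5, 10, 15):
    H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
    x_exact = np.ones(n)
    b = H @ x_exact
    x = np.linalg.solve(H, b)                   # direct solve (LU with pivoting)
    print(f"n = {n:2d}  cond(H) = {np.linalg.cond(H):9.2e}  "
          f"relative error = {np.linalg.norm(x - x_exact) / np.sqrt(n):.2e}")
```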
Multiscale Modeling
Part a: Multiscale methodology for partial differential equations (PDEs) and for stochastic differential equations (SDEs). Basic theory of underlying PDEs; basic theory of Gaussian processes; basic theory of SDEs; multiscale expansions. Part b: Transition from quantum to continuum modeling of materials. Schrodinger equation and semi-classical limit; molecular dynamics and kinetic theory; Boltzmann equation and continuum mechanics. Not offered 2023-24.
Computational Tools for Decoding Microbial Ecosystems
Undergraduate Reading in the Information and Data Sciences
Supervised reading in the information and data sciences by undergraduates. The topic must be approved by the reading supervisor and a formal final report must be presented on completion of the term. Graded pass/fail.
Undergraduate Projects in Information and Data Sciences
Supervised research in the information and data sciences. The topic must be approved by the project supervisor and a formal report must be presented upon completion of the research. Graded pass/fail.
Undergraduate Thesis in the Information and Data Sciences
Individual research project, carried out under the supervision of a faculty member and approved by the option representative. Projects must include significant design effort, and a written report is required. Open only to upperclass students. Not offered on a pass/fail basis.
Topics in Linear Algebra and Convexity
The content of this course varies from year to year among advanced subjects in linear algebra, convex analysis, and related fields. Specific topics for the class include matrix analysis, operator theory, convex geometry, or convex algebraic geometry. Lectures and homework will require the ability to understand and produce mathematical proofs. Not offered 2023-24.
Topics in Optimization
Material varies year-to-year. Example topics include discrete optimization, convex and computational algebraic geometry, numerical methods for large-scale optimization, and convex geometry. Not offered 2023-24.
Markov Chains, Discrete Stochastic Processes and Applications
Stable laws, Markov chains, classification of states, ergodicity, von Neumann ergodic theorem, mixing rate, stationary/equilibrium distributions and convergence of Markov chains, Markov chain Monte Carlo and its applications to scientific computing, the Metropolis-Hastings algorithm, coupling from the past, martingale theory and discrete-time martingales, rare events, law of large deviations, Chernoff bounds.
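A minimal Python sketch of the Metropolis-Hastings algorithm listed above, sampling a standard normal target specified only up to a normalizing constant; the step size and chain length are illustrative.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target density.

    Proposes x' = x + N(0, step^2) and accepts with probability
    min(1, target(x') / target(x)); the chain's stationary distribution is
    the target, so long-run averages approximate expectations under it.
    """
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        samples[i] = x
    return samples

# Illustrative target: standard normal, known only up to a constant.
log_target = lambda x: -0.5 * x**2
chain = metropolis_hastings(log_target, x0=3.0, n_samples=50_000)
burned = chain[5_000:]                        # discard burn-in
print("mean ~ 0:", burned.mean(), "  variance ~ 1:", burned.var())
```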