Contact Information

Instructor: Jonathan “Nate” Wells

Email:

Classroom: Library 389

Office: Library 392

In-person Office Hours: M 3-4pm, W 10-11am, F 10-11am

Virtual Office Hours: T 2-3pm

Zoom Link: https://zoom.us/my/wellsj392


Course Information

Course Description

This course is an overview of modern approaches to analyzing large and complex data sets that arise in a variety of fields from biology to marketing to astrophysics. The most important modeling and predictive techniques will be covered, including regression, classification, clustering, resampling, and tree-based methods. This course will make extensive use of the R programming language.

Prerequisites: MATH141, or Instructor Consent.

Distribution Requirements

This course can be used towards your Group III, “Natural, Mathematical, and Psychological Science,” requirement. It accomplishes the following learning goals for the group:

  1. Use and evaluate quantitative data or modeling, or use logical/mathematical reasoning to evaluate, test or prove statements.
  2. Given a problem or question, formulate a hypothesis or conjecture, and design an experiment, collect data or use mathematical reasoning to test or validate it.
  3. Collect, interpret and analyze data.

This course does not satisfy the “primary data collection and analysis” requirement.

Textbook

Students do not need to purchase paper copies of texts.

  • (Primary) An Introduction to Statistical Learning, 2nd Edition (2021) by James, Witten, Hastie, and Tibshirani. A free PDF can be obtained through SpringerLink (log in using your Reed credentials).
  • (Secondary) Applied Predictive Modeling, 1st Edition by Kuhn and Johnson. We will use select content from this text to supplement the primary text. A free PDF can be obtained through SpringerLink (log in using your Reed credentials).

Course Resources

The following web-based resources will be used for communicating class information:

Technology

You are encouraged to bring a personal computer to class each day for notetaking and live coding. Access to a computer with a web browser will be required for completing and submitting homework. Computing & Information Services offers programs for long-term laptop loans: https://www.reed.edu/cis/facilities/student-technology-equipment-program.html

We will make frequent use of the R programming language to create statistical models, run simulations, and implement statistical learning algorithms. All homework will be completed using the RStudio IDE. R and RStudio are free and can either be installed locally on your computer or accessed through the Reed RStudio Server: https://rstudio.reed.edu/

Throughout the term, we will use GitHub to manage and submit assignments. GitHub is a hosting service for Git-based projects and is designed to support version control and collaboration on large projects: https://github.com/

Communication

If you would like to contact me, I can most easily be reached via Slack message on weekdays between 8am and 6pm. While I try to answer messages as soon as possible, in some cases I may not be able to respond until the following school day. If you’d prefer to talk live, send me a message and we can schedule a time to chat on Zoom.


Course Outcomes

By the end of the course, a student should be able to:

  • Articulate and compare the different philosophical approaches to prediction, statistical inference, classification, and clustering.

  • Create valid statistical models, perform data analysis using software, and communicate results in non-technical language using reproducible methods in order to answer a particular research question.

  • Implement simulation and randomization algorithms in order to demonstrate and assess properties of statistical models.

  • Assess and compare the performance of a variety of statistical models, and select appropriate models according to suitable criteria.

  • Apply statistical learning techniques to real-world data and problems.

  • Justify and describe properties of particular statistical learning methods by appealing to mathematical theory.


Course Format

A typical class day will involve the following:

  • Reading Assignment. Every class will have an assigned reading which you are strongly encouraged to review prior to the start of class.
  • Active Lectures. Our 50-minute class meetings will include an interactive lecture by the instructor, with some time devoted to discussion either class-wide or in small groups.
  • Group Work. At least once each week, a majority of class time will be reserved for collaborative coding and group work with your peers. Students should bring a personal laptop computer to class on these days.

Workload

A prepared student will attend class for 50 minutes per day, three days each week, and spend about two to three hours per class day on work outside the classroom (reading, doing homework, working on projects, discussing, studying, etc.). Together, this represents a commitment of roughly 9 to 12 hours per week.


Grading Criteria

Your grade in the class will be determined by your proficiency in each of the Course Outcomes, as demonstrated in the following assessments:

  1. Daily Reading
  2. Homework
  3. Participation
  4. Midterm Exams
  5. Final Project

Daily Reading

Statistical knowledge takes time to develop, and understanding deepens upon revisiting a concept a \(2^\text{nd}\), \(3^\text{rd}\), or \(n^\text{th}\) time. Studying basic terminology and elementary examples in the textbook before class means that class can be spent clarifying and expanding ideas, rather than introducing them. Daily reading assignments will be posted on our course webpage and will list the specific section(s) to read for each day, along with a response question to be completed by 8am each day of class (to give me time to review responses before class). Responses should be posted in the #daily-reading channel of our Slack workspace. The questions are not intended to be overly difficult, but should help both you and me identify topics that need further review. Responses will be assessed primarily on the basis of completion. No extensions on daily readings will be given, but up to three assignments may be missed without penalty.

Homework

A weekly problem set will be made available after class on Wednesday, due by 5pm on the following Wednesday. Some time on Monday or Friday of each week will be devoted to collaborative coding components of the assignment. Problem sets must be completed as an .Rmd file in RStudio and submitted via GitHub. Detailed submission instructions can be found on the course webpage. Up to two times during the term, you may request an extension of up to 5 days on an assignment. Except in extraordinary circumstances, requests must be made before the assignment’s due date.

In-class Participation

It is important to attend each class when able and to actively participate via notetaking and discussion. If you are unable to attend class for any reason, please notify me promptly so that we can make appropriate arrangements for you to make up missed work. Frequent absences for which make-up work is not completed will be reflected in your final course grade.

Please do not come to class if you are ill, or if you’ve had close exposure to someone else who has been ill.

Midterm Exams

Two take-home exams will be given during the term. Each will be made available on a Friday and must be completed before class the following Monday. Tentatively, the first is scheduled for Friday, October 8 (Week 6) and the second for Friday, November 19 (Week 11). Each exam is intended to take between 3 and 4 hours to complete.

Final Project

Throughout the second half of the term, you will work in groups of 3 or 4 on a project that answers a significant research question using real-world data by implementing the fundamental techniques developed in our class, as well as some more advanced methods from supplementary sources. The project will culminate in a 20-minute presentation during finals week and a 5- to 10-page reproducible technical report.


Community Information

Accessibility

Reed College is committed to creating inclusive and accommodating learning environments. Please notify me as soon as possible if there are aspects of the instruction or design of this course that result in barriers to your participation. I also encourage you to contact Disability & Accessibility Resources (DAR) at https://www.reed.edu/disability-resources/ for additional support, including official accommodations. If you have already been approved for accommodations, please have DAR provide a letter during the first week of classes, or as soon as possible after approval. I will then contact you to schedule a meeting during which we can discuss the particular implementation of your accommodations.

Academic Integrity

Students are allowed and encouraged to collaborate on most in-class and homework assignments. However, any work that you turn in for grading must be your own. You are welcome to use other paper or internet resources to supplement content we cover in this course, with the exception of solutions to homework problems. Copying solutions from the internet or other sources is an Honor Principle violation. Exams will explicitly mention what resources may be consulted. All written work that references material outside of the textbook or lecture should be accompanied by an appropriate citation.

Code of Conduct

I expect all members of Math 243 to make participation a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

I expect everyone to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community of learners. Examples of unacceptable behavior include using sexualized language or imagery, making insulting or derogatory comments, harassing someone publicly or privately, or other unprofessional conduct.

Instead, you can contribute to a positive learning environment by demonstrating empathy and kindness, being respectful of differing viewpoints and experiences, and giving and gracefully accepting constructive feedback.

Assignment Feedback

You will receive timely feedback on your homework via GitHub, either from me or the course grader (another mathematics undergraduate). Each homework problem can earn up to five points for statistical content, and two points for the quality of writing and clarity of code. You are strongly encouraged to review comments on your solutions and rework missed problems. You are welcome to post questions about past homework problems on our Slack channel and to talk to me about them during office hours.

Help

I strongly encourage you to attend my office hours each week. You are welcome to come either with specific questions or just with general uncertainties about content we’ve discussed. If you are unable to attend scheduled office hours, please message me on Slack to schedule an alternative appointment (either in-person or virtual).

Our course assistant (a Reed undergraduate) will also be holding several help sessions each week to provide assistance with and facilitate collaboration on homework.

Finally, every Reed student is entitled to one hour of free individual tutoring per week. Use the tutoring app in IRIS to schedule meetings with a student tutor.


Tentative Schedule

This is the schedule as of Day 1. A more up-to-date schedule is maintained on the course webpage.

Week    Sections Covered
1       Foundations of Stat Learning
2       Linear Models
3       Resampling Methods
4       Model Selection
5       Beyond Linear Models
6       Classification (Exam 1)
7       Classification
        Fall Break
8       Tree-Based Models
9       Tree-Based Models
10      Support Vector Machines
11      Unsupervised Learning (Exam 2)
12      Unsupervised Learning
13      Special Topics
14      Reading Period
15      Presentations