Instructor: Jonathan “Nate” Wells
Email: wellsj@reed.edu
Classroom: Library 389
Office: Library 392
In-person Office Hours: M 3-4pm, W 10-11am, F 10-11am
Virtual Office Hours: T 2-3pm
Zoom Link: https://zoom.us/my/wellsj392
This course is an overview of modern approaches to analyzing large and complex data sets that arise in a variety of fields from biology to marketing to astrophysics. The most important modeling and predictive techniques will be covered, including regression, classification, clustering, resampling, and tree-based methods. This course will make extensive use of the R programming language.
Prerequisites: MATH141, or Instructor Consent.
This course can be used towards your Group III, “Natural, Mathematical, and Psychological Science,” requirement. It accomplishes the following learning goals for the group:
This course does not satisfy the “primary data collection and analysis” requirement.
Students do not need to purchase paper copies of texts.
The following web-based resources will be used for communicating class information:
Slack https://reedmath243fall2021.slack.com (announcements, discussions, direct messaging).
Course Website https://Reed-Stat-Learning-Fall-2021.github.io (documents, a daily schedule, assignments).
GitHub Classroom https://classroom.github.com/classrooms/88862064-reed-stat-learning-fall-2021-classroom (homework submission).
You are encouraged to bring a personal computer to class each day for notetaking and live coding. Access to a computer with a web browser will be required for homework completion and submission. Computing & Information Services offers programs for long-term laptop loan: https://www.reed.edu/cis/facilities/student-technology-equipment-program.html
We will make very frequent use of the R programming language to create statistical models, run simulations, and implement stat learning algorithms. All homework will be completed using the RStudio IDE. R and RStudio are free to use, and can either be installed locally on your computer, or can be accessed using the Reed RStudio Server: https://rstudio.reed.edu/
Throughout the term, we will use GitHub to manage and submit assignments. GitHub is a hosting service to house Git-based projects online, and is designed to assist with version control and collaboration on big projects. https://github.com/
If you would like to contact me, I can most easily be reached via Slack message on weekdays between 8am and 6pm. While I try to answer messages as soon as possible, in some cases I may not be able to respond until the following school day. If you’d prefer to talk live, send me a message and we can schedule a time to chat on Zoom.
By the end of the course, a student should be able to:
Articulate and compare the different philosophical approaches to prediction, statistical inference, classification, and clustering.
Create valid statistical models, perform data analysis using software, and communicate results in non-technical language using reproducible methods in order to answer a particular research question.
Implement simulation and randomization algorithms in order to demonstrate and assess properties of statistical models.
Assess and compare the performance of a variety of statistical models, and select appropriate models according to suitable criteria.
Apply statistical learning techniques to real-world data and problems.
Justify and describe properties of particular statistical learning methods by appealing to mathematical theory.
A typical class day will involve the following:
A prepared student will attend class for 50 minutes per day, three days each week, and spend about two to three hours per day of class on work outside the classroom (reading, doing homework, working on projects, discussing, studying, etc.). Together, this represents a 9 - 12 hour per week commitment.
Your grade in the class will be determined by your proficiency in each of the Course Outcomes, as demonstrated in the following assessments:
Statistical knowledge takes time to develop, and understanding deepens upon revisiting a concept a \(2^\text{nd}\), \(3^\text{rd}\), or \(n^\text{th}\) time. Studying basic terminology and elementary examples in the textbook before class means that class can be spent clarifying and expanding ideas, rather than introducing them. Daily reading assignments will be posted on our course webpage, and will list the specific section(s) to read for each day, along with a response question to be completed by 8am each day of class (to give me time to review them before class). Responses should be posted in the #daily-reading channel of our Slack workspace. The questions are not intended to be overly difficult, but should help both you and me highlight topics that need further review. Responses will be assessed primarily on the basis of completion. No extensions on daily readings will be given, but up to three assignments may be missed without penalty.
A weekly problem set will be made available after class on Wednesday, due by 5pm on the following Wednesday. Some time on Monday or Friday of each week will be devoted to collaborative coding components of the assignment. Problem sets must be completed as a .Rmd file in RStudio and submitted via GitHub. Detailed submission instructions can be found on the course webpage. Up to two times throughout the term, you may request an extension of up to 5 days on your assignment. Except in extraordinary circumstances, requests must be made prior to an assignment’s due date.
It is important to attend each class when able and to actively participate via notetaking and discussion. If you are unable to attend class for any reason, please notify me promptly so that we can make appropriate arrangements for you to make up missed work. Frequent absences for which make-up work is not completed will be reflected in your final course grade.
Please do not come to class if you are ill, or if you’ve had close exposure to someone else who has been ill.
Two take-home exams will be given during the term. Each will be made available on a Friday, to be completed before class the following Monday. Tentatively, the first is scheduled for Friday, October 8 (Week 6) and the second for Friday, November 19 (Week 11). The exams are intended to take between 3 and 4 hours to complete.
Throughout the second half of the term, you will work in groups of 3 or 4 on a project that answers a significant research question using real-world data, by implementing the fundamental techniques developed in our class, as well as some more advanced methods from supplementary sources. The project will culminate in a 20-minute presentation during finals week and a 5-10 page reproducible technical report.
Reed College is committed to creating inclusive and accommodating learning environments. Please notify me as soon as possible if there are aspects of the instruction or design of this course that result in barriers to your participation. I also encourage you to contact Disability & Accessibility Resources (DAR) at https://www.reed.edu/disability-resources/ for additional support, including official accommodations. If you have already been approved for accommodations, please have DAR provide a letter during the first week of classes, or as soon as possible after approval. I will then contact you to schedule a meeting during which we can discuss the particular implementation of your accommodations.
Students are allowed and encouraged to collaborate on most in-class and homework assignments. However, any work that you turn in for grading must be your own. You are welcome to use other paper or internet resources to supplement content we cover in this course, with the exception of solutions to homework problems. Copying solutions from the internet or other sources is an Honor Principle violation. Exams will explicitly mention what resources may be consulted. All written work that references material outside of the textbook or lecture should be accompanied by an appropriate citation.
I expect all members of Math 243 to make participation a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
I expect everyone to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community of learners. Examples of unacceptable behavior include: using sexualized language or imagery, making insulting or derogatory comments, harassing someone publicly or privately, or other unprofessional conduct.
Instead, you can contribute to a positive learning environment by demonstrating empathy and kindness, being respectful of differing viewpoints and experiences, and giving and gracefully accepting constructive feedback.
You will receive timely feedback on your homework via GitHub, either from me or the course grader (another mathematics undergraduate). Each homework problem can earn up to five points for statistical content, and two points for the quality of writing and clarity of code. You are strongly encouraged to review comments on your solutions and rework missed problems. You are welcome to post questions about past homework problems on our Slack channel and to talk to me about them during office hours.
I strongly encourage you to attend my office hours each week. You are welcome to come either with specific questions, or just with general uncertainties about content we’ve discussed. If you are unable to attend scheduled office hours, please message me on Slack to schedule an alternative appointment (either in-person or virtual).
Our course assistant (a Reed undergraduate) will also be holding several help sessions each week to provide assistance with and facilitate collaboration on homework.
Finally, every Reed student is entitled to one hour of free individual tutoring per week. Use the tutoring app in IRIS to schedule meetings with a student tutor.
This is the schedule as of Day 1. A more up-to-date schedule can be found on the course website.
Week | Sections Covered | Week | Sections Covered |
---|---|---|---|
1 | Foundations of Stat Learning | 8 | Tree-Based Models |
2 | Linear Models | 9 | Tree-Based Models |
3 | Resampling Methods | 10 | Support Vector Machines |
4 | Model Selection | 11 | Unsupervised Learning (Exam 2) |
5 | Beyond Linear Models | 12 | Unsupervised Learning |
6 | Classification (Exam 1) | 13 | Special Topics |
7 | Classification | 14 | Reading Period |
Fall Break | | 15 | Presentations |