Workshop in Data Science

About this course

Welcome to DSC96 at UCSD! This class is titled "Workshop in Data Science", and is an optional, 2-unit project course in data science. It is meant to be taken concurrently with DSC10, and has no prerequisites.

This class is about using data to answer questions. This is in contrast with most of your other classes this year, which are about fundamentals and theoretical underpinnings. The questions you get to answer are the big important ones, including “What happened?”, “Why did it happen?”, and “What will happen?”. The data used to answer the questions will range from real-world government data to tweets about UC San Diego to sound recordings.

By the time you finish this class, you will be able to:

  • Identify problems that are good candidates for data science,

  • Re-frame the problem in a way that can be answered with the available data,

  • Evaluate the limitations and quirks of the data,

  • Manipulate the data to answer relevant questions, and

  • Communicate the results clearly.

COVID-19 Notes

First, your top priority right now should be making sure you stay safe and healthy. Your second priority should be making sure your family, friends, and community are safe and healthy. I hope that together we can learn some data science too, but never at the cost of health or safety.

Second, this is the first time this class has been adapted for online learning. It is going to be a bit rough, and we will all make mistakes. I encourage you to be generous and forgiving, but also to give clear and frequent feedback. I will try to do the same.

Third, I want to recognize that this is going to be hard for you. Students in this class are spread all over the world. Some don't have access to good computers or reliable internet, and many are suddenly working in less-than-optimal conditions. There is also the stress of living through a global pandemic, social unrest, and a major election! Personally, I know I am not doing my best work right now because I am distracted and worried. Please reach out if there is anything I can do to help you in this class. I want you all to succeed.

Here are a few of the changes for this quarter:

  • All instruction, discussion, etc. will be remote, using Zoom. There will be no in-person anything.

  • There will be no attendance-based grading. You can do this class totally asynchronously (though I encourage live attendance!).

Important Links

Instructor: Colin Jemmott, cjemmott@ucsd.edu

The required book for this class is “Confident Data Skills” by Kirill Eremenko. The book is meant for a wide audience, is very recently published, and is inexpensive. We will only be reading chapters 1-5, after which the book goes into techniques that will be covered in more detail in your other classes.

Many of the readings for this course are taken from popular publications that do a good job of communicating arguments using data. Links are on the schedule page: https://www.dsc96.com/schedule

How Class Works

This class will be completely remote, and it is possible to complete everything asynchronously. I strongly encourage attending class if you can - doing that will make the class easier and more fun for you (and me!).

We will be using a flipped classroom model, meaning that you will watch a lecture or do some reading before class, and the part you usually think of as "homework" will be done together in class.

Before Class

Each time we meet there will be activities you need to complete before class. Typically that will be watching a lecture to prepare for our in-class activity, or doing some reading to prepare for a discussion. The schedule is here: https://www.dsc96.com/schedule

You should be prepared to discuss the reading or lecture. One way to do this is to keep a running list of ideas and questions. Some prompts that might be helpful include:

  • Relating a topic in the reading to something in your life or in the news,

  • Asking a thoughtful question about the reading, or

  • Picking a quote from the reading that you agree or disagree with and explaining why.

During Class

The majority of class time will be doing activities together - usually making plots or writing code.

Joining the Zoom call with a microphone will allow you to participate in class discussion and ask questions more easily, and I recommend it. I also recommend you have access to a computer with a keyboard during class - it is hard to write code on a tablet or phone. I also like it when people are able to join with video (especially for the discussion parts). But it is also totally fine if you are unable or unwilling to do those things - we will make it work!

Classes will not be recorded, mainly because I want people to be comfortable speaking openly.

Instead of Class

I understand that some of you will not attend a few or even all of the classes. That's ok! You don't need to let me know or get permission, though I am available if you want to talk.

If you can't make it to class, you are still responsible for making a good effort at the in-class work: discussion and exercise.

How to make up in-class discussion

If you miss class, you should also email me about the reading or lecture. A short note is fine - the goal is to have a discussion like you would with a colleague. If you are stuck on what to say, look at the prompts in the "Before Class" section above.

How to make up in-class exercise

If you miss class, you should spend an hour or so on the exercise, making a good effort.

Without the collaboration and explanations that happen in class, this will be much more difficult, so I recommend finding a buddy to work with and using the Slack channel to ask lots of questions. Questions might not be answered immediately, so if you get stuck it is probably better to wait for an answer than to get frustrated.

Important note: You do not need to completely finish the activities! Many of the in-class activities are designed to be much longer than can be completed in a reasonable time - I like to include the extra material in case someone wants to learn more.

In-class assignments from a missed class will not be accepted more than a week late unless you ask for and receive special permission. Only some assignments are collected - for the others, you are on the honor system to make a good effort. I will note on the schedule which ones get turned in.


The activities in this class are collaborative. You are encouraged to ask for help from the instructor or from other students. This means that for this class it is totally ok to show your work to other students and discuss it openly.

However, even in this collaborative environment, the work you do must be your own. Specifically, you must do the actual work of completing the assignment (i.e. typing out the code, moving the mouse) and understand what your code or analysis is doing.

Please ask for clarification if that distinction is not clear. This is important to me.


Below is a formal grading rubric, but here is the honest truth: this is an optional pass/no pass class. If you make a serious effort to do the class, you will pass. I am only going to calculate grades in detail if you really don't do the class. That said, there is a wait list, so if you are planning on not putting effort in, I would ask you to drop the class to make room for someone else.

Class discussion on reading = 20%

SDPD traffic stops project = 20%

Unstructured data project = 30%

Audio classification project = 30%

Final Grade

70%-100% = Pass

0%-69% = No Pass
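If you want to see how the rubric above combines into a final grade, here is a minimal sketch of the arithmetic. The weights are copied from the rubric. The function name, the example scores, and the choice to treat anything below 70% as No Pass are assumptions made for illustration, not an official grading script.

```python
# Weights (in percent) copied from the rubric above; they sum to 100.
WEIGHTS = {
    "Class discussion on reading": 20,
    "SDPD traffic stops project": 20,
    "Unstructured data project": 30,
    "Audio classification project": 30,
}

# Assumption: the rubric lists 70%-100% as Pass and 0%-69% as No Pass,
# so this sketch treats anything below 70 as No Pass.
PASS_THRESHOLD = 70


def final_grade(scores):
    """Return (weighted percent, "Pass" or "No Pass").

    `scores` maps each rubric component to a score from 0 to 100.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    result = "Pass" if total >= PASS_THRESHOLD else "No Pass"
    return total, result


# Hypothetical scores, purely for illustration.
example_scores = {
    "Class discussion on reading": 80,
    "SDPD traffic stops project": 70,
    "Unstructured data project": 60,
    "Audio classification project": 90,
}

percent, result = final_grade(example_scores)
print(f"{percent:.1f}% -> {result}")
```

Because the two later projects carry 30% each, a weaker score on one of them moves the total more than the same score on the discussion or the traffic stops project.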

Academic Integrity

For this class, the key to academic integrity is accurately representing the status and authorship of your work. I strongly encourage you to read the official UCSD policy on integrity of scholarship.

Diversity and Inclusion

I am committed to an inclusive learning environment that respects our diversity of perspectives, experiences and identities. You, as a student in this course, are also responsible for maintaining an environment where your fellow students feel safe and respected.

In my opinion, the key to this is recognizing the inherent worth and dignity of every person. If there is a way you could feel more included please let me know via email.

Accommodations for Students with Disabilities

Students requesting accommodations for this course due to a disability or current functional limitation must provide a current Authorization for Accommodation (AFA) letter issued by the Office for Students with Disabilities (OSD), which is located in University Center 202 behind Center Hall. If you have an AFA letter, please make arrangements to meet with the instructor and with the Data Science OSD Liaison by the end of Week 2 to ensure that reasonable accommodations for the quarter can be arranged. The Data Science OSD Liaison can be reached at dscstudent@ucsd.edu and is located in Atkinson Hall #2010.

Additional Information