
 

Introduction to Statistics and Applications within Data Science

 

Basic information

 

 

Course Title: Introduction to Statistics and Applications within Data Science

 

 

Instructor

Manuel González Canché, Associate Professor, Quantitative Methods and Policy Analysis, University of Pennsylvania, USA.

 

 

Prerequisites

No specific preparation is required before the start of the programme, although students are encouraged to familiarize themselves with the basic concepts of data science analytic tools.

Required Text & Tools

Textbook:

Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O'Reilly Media, Inc.

https://r4ds.had.co.nz/index.html

 

Silge, J., & Robinson, D. (2017). Welcome to text mining with R.

https://www.tidytextmining.com/

 

Beck, M. W. (2018). NeuralNetTools: Visualization and Analysis Tools for Neural Networks. Journal of Statistical Software, 85(11), 1–20. https://doi.org/10.18637/jss.v085.i11

 

Hvitfeldt, E., & Silge, J. (2022). Supervised machine learning for text analysis in R. Chapman and Hall/CRC. https://smltar.com/

 

Tools:

Students will use R (or Python) statistical software.

Grading Criteria

Class participation and attendance 60%

Two exercises 20%

Final Exam 20%

Course Keywords

Data science, statistical modeling, data analytics, data mining, data retrieval, advanced data visualizations

 

Schedule

 

Session | Topic | Beijing Time
Day 1 | Introduction: Relevance of data science | 6/26/2023 19:20-20:55
Day 2 | Introduction to R | 6/27/2023 19:20-20:55
Day 3 | Data visualization and processing (Chapter 3, Wickham & Grolemund: https://r4ds.had.co.nz/data-visualisation.html) | 6/28/2023 19:20-20:55
Day 4 | Feature engineering (Chapter 5, Wickham & Grolemund: https://r4ds.had.co.nz/transform.html) | 6/29/2023 19:20-20:55
Day 5 | Text Mining, Natural Language Processing (Chapter 1, Silge & Robinson: https://www.tidytextmining.com/tidytext.html) | 7/3/2023 19:20-20:55
Day 6 | Machine learning and unsupervised learning (Chapter 6, Silge & Robinson: https://www.tidytextmining.com/topicmodeling.html) | 7/4/2023 19:20-20:55
Day 7 | Supervised learning (Chapter 7, Hvitfeldt & Silge: https://smltar.com/mlclassification.html#classfirstattemptlookatdata) | 7/5/2023 19:20-20:55
Day 8 | Deep Learning, Neural networks (Beck 2018: https://www.jstatsoft.org/article/view/v085i11) | 7/6/2023 19:20-20:55

 

 

Course description

 

This course is an introduction to Statistics and Data Science Applications. It is a highly applied course: we will devote time each week to understanding the topics under discussion and then showcase how to implement the analyses in R. All data and code will be provided to class participants, and our class discussions will highlight best practices employed by data scientists and academics alike.

Objectives

 

1. This course will cover detailed explanations and examples of statistical modeling with data science and visualization.

2. This course will provide students an opportunity to develop research skills, including different methods for collecting, analyzing, and exploring data.

3. This course will engage students in applying what they learn to real practice, so that they can accomplish the learning goals and develop research proposals that apply the concepts and methods discussed in this course.

4. Participants are expected to become well positioned as competitive applicants to undergraduate or graduate programs where data science tools are valued and highly sought. They can also seek employment as data scientists.