Many of the world’s greatest discoveries, most consequential decisions, and personal or collective endeavors are informed by the analysis of data from a myriad of sources.

Why Study Data Science?

The ability to collect and contextualize data will benefit you whether you’re pursuing the sciences, social sciences, humanities, business, medicine, politics, leadership, or beyond.

Data Science Across Disciplines

The interdisciplinary nature of data science is reflected in the many ways that faculty and students from across the college engage with data in their work. Here is a small sample of those connections.

  • Statistics contributes to data science by providing the foundational theories and methodologies for data collection, analysis, and interpretation, all essential aspects of making data-driven decisions. Careful design of experiments and surveys helps ensure that the data gathered is representative and reliable.

    The field has also developed methods crucial for understanding data distributions, making inferences about populations based on sample data and identifying patterns and trends. Research in statistics continuously enhances the toolkit available to data scientists, offering robust methods for dealing with uncertainty, improving predictive models, and developing new ways to handle complex, high-dimensional data. Additionally, statistical approaches are integral in validating machine learning models, ensuring their accuracy and understanding their limitations.

  • Computer science contributes to the computational aspects of data science by developing the computer hardware, algorithms and software tools that enable scalable and efficient data processing and analysis. Computer science has also contributed to modern methods in machine learning and artificial intelligence (AI), which enable learning patterns from data in flexible and inductive ways.

    The computer science faculty’s research (1) expands the set of tools and methods available to data scientists, including improved programming and software engineering tools, more accurate causal inference, and better generalization performance of machine learning and natural language processing algorithms, and (2) applies computation to study large-scale real-world datasets across many collaborating disciplines.

Quantitative social science is at the core of many data science applications, particularly those focused on answering causal questions about social policy, politics, and human welfare.

  • Economics provides a unique set of tools for answering causal questions empirically. Economics also provides domain expertise relevant to many of the most widely used sources of data, including data on wages and income, employment and labor supply, market transactions and trade, public spending and government decision-making, education, health, savings and credit, and time use. Williams’ economics faculty also have specific methodological expertise in policy evaluation and causal inference, survey design and data collection, randomized experiments, and the development of new sources of data.

  • Psychologists use the tools and strategies of data science in all phases of research including study design, data acquisition and analysis, and data visualization. With advances in smartphones and wearable sensors, some researchers can collect vast datasets. Others examine massive datasets generated by social media and other online sources. Analyzing such data relies on cutting-edge data science techniques, including machine learning models.

  • One of the fundamental goals of neuroscience research is to uncover the causal structure of the brain. The major approaches employed in identifying causal structures are to record and manipulate the activation of brain cells, and to monitor their effects on behavior or physiology. While this is easily stated, it is far from trivial in practice. For example, the human brain consists of about 100 billion neurons and about 100 trillion synaptic connections. Typical recordings of brain activity generate terabytes of data, and data science is needed to wrangle, analyze, and interpret this information in relation to behavior, as well as internal and external states.

  • In the public health domain, data science enables the analysis and prediction of health trends, disease outbreaks and population-level health risks. Through epidemiological analysis and predictive modeling, data science helps track disease patterns, forecast outbreaks, and optimize responses. It also uncovers health disparities across demographics, guiding policies that promote equity. By analyzing large datasets, data science assists in resource allocation, ensuring that interventions and healthcare services are targeted where they are needed most. Data science is also crucial for evaluating the effectiveness of public health policies and programs, which drives evidence-based decision-making. The integration of big data, including electronic health records, allows for more comprehensive insights into public health. Finally, data visualization helps communicate complex health information to the public, improving awareness and engagement.

Data collection and visualization, statistical inference, and machine learning are critical tools to the physical and natural sciences as well.

  • Geosciences use data science in virtually all sub-disciplines. For example, geochemical databases of volcanic and plutonic rocks help discriminate between different tectonic environments in which the rocks formed. The recognition of mass extinctions was driven by pioneers who crafted databases of species extant at different times in geologic history.

    Climate science is also data-driven as we reconstruct past climate and attempt to project our climate future. Permanent Global Navigation Satellite System installations are able to track plate motions, but the amount of data generated requires careful processing and analysis. Geographic Information Systems and Remote Sensing are increasingly important methods for understanding our environment, and both rely on careful application of data science.

  • Data science techniques are essential for analyzing genomic data in bioinformatics, where algorithms and statistical models applied to large-scale DNA sequences can be used to identify genes linked to diseases and understand genetic variations.

  • Data science and machine learning facilitate computational modeling to understand complex chemical reactions, predict the properties of new materials and explore reaction pathways. Physicists and astronomers analyze enormous datasets, including those related to detecting rare particle interactions and to understanding astronomical phenomena.

Faculty in the humanities also increasingly use data science for a range of applications.

  • Literary scholars use topic modeling and other tools to analyze large corpora of such sources as poems, novels and letters, whose volume precludes reading by a single individual. These forms of “distant reading” can supplement and deepen the “close reading” that has long been the practice in the field. Network analysis showing the ties between disparate literary figures has also become an important aspect of some forms of literary history in recent years.

  • In addition to literary tools, many historians increasingly make use of Geographic Information Systems (GIS) to analyze historical trends geographically, be they patterns in trade or the places of origin of government officials.

Advances in data science not only contribute to scholarship and teaching in the arts and humanities but are themselves a subject of study.

  • Science and Technology Studies (STS) explores historical, social, cultural, ethical and political dimensions of science and technology, providing a lens through which to examine how data science influences and is influenced by societal values, policies and practices.

  • Philosophy examines foundational questions about knowledge, ethics and reasoning, including ethical implications of data collection and usage and the nature of algorithmic decision-making. With the rise of artificial intelligence, philosophical inquiry into AI ethics becomes crucial, examining issues such as algorithmic bias, the moral responsibilities of AI creators, the impact of AI on human autonomy and the broader societal consequences of deploying AI technologies.

Pathways for Studying Data Science

While Williams has no formal academic program in data science, there are many ways to integrate the subject into your studies. Below are a few illustrative pathways for data science, each tailored to achieve a particular outcome.

  • If you’re curious about the subject but do not intend to continue studies in a data-driven discipline, we encourage you to take one or two classes exposing you to the core topics and central themes of data science.

    If you have little or no computing or statistics background, we recommend starting with an introductory course covering either the central concepts of data science or a detailed look at either the statistical or computing aspects of data science.

     

    One or two of:

    CSCI 104: Data Science and Computing for All
    CSCI 134: Introduction to Computer Science
    INTR 150: Data for Justice (also STS, AMST, SOC, WGSS 150)
    GEOS 290: Data Analysis in Earth Science (also ENVI 290)
    POEC 253: Empirical Methods in Political Economy
    STAT 101: Elementary Statistics and Data Analysis
    STAT 161: Introductory Statistics for Social Science
    STAT 201: Statistics and Data Analysis

    If you have some experience in computer science, CSCI 136 or a complementary statistics class may be a better starting point. Similarly, those with some statistics experience may wish to explore STAT 202 or complement that experience with a class on the computing side. Those with stronger backgrounds may wish to dive into a class that examines data in a different context, such as economics.

     

    More advanced alternatives:

    CSCI 136: Data Structures and Advanced Programming
    ECON 255: Econometrics
    STAT 202: Introduction to Statistical Modeling

  • If you wish to apply data science techniques to problems in your chosen primary field of study, we encourage you to develop a plan to complete classes and projects that align with your interests and emphasize data analysis. One such pathway emphasizes a strong foundation in computer science, mathematics, and statistics, and a complementary collection of classes aligned with your particular interests, so that you may apply data science techniques within your chosen domain.

    Here is one possible foundation:

     

    Data science foundation

    CSCI 134: Introduction to Computer Science
    CSCI 136: Data Structures and Advanced Programming
    STAT 201: Statistics and Data Analysis
    STAT 300+: Any upper-level offering

    Below are several example areas, each focusing on either a single discipline or concentration or a collection of related classes across multiple disciplines:

     

    Economics focus (2-3)

    ECON 255: Econometrics
    ECON 370: Data Science for Economic Analysis
    ECON 371: Time Series Econometrics and Empirical Methods for Macro
    ECON 379: Program Evaluation for International Development
    ECON 474: Advanced Methods for Causal Inference
    ECON 460: Women, Work, and the World Economy from 5,000 BC to the Present
    ECON 462: Topics in African Development
    ECON 524: Advanced Microeconometrics
    ECON 571: Global Health Policy Challenges

     

    Neuroscience focus (2-3)

    PSYC 201: Experimentation and Statistics
    PSYC 212: Neuroscience (also BIOL 212, NSCI 201)
    NSCI 322: From Order to Disorder(s): The Role of Genes & the Environment in Psychopathology
    NSCI 324: Neuroethology (also BIOL 324)
    NSCI 337: Neural Flexibility: plasticity, modulation and evolution (also BIOL 437)

     

    Life sciences focus (2-3)

    PSYC 212: Neuroscience (also BIOL 212, NSCI 201)
    NSCI 322: From Order to Disorder(s): The Role of Genes & the Environment in Psychopathology
    BIOL 319: Integrative Bioinformatics, Genomics, and Proteomics Lab (also CHEM, CSCI, PHYS, MATH 319)
    CHEM 368: Computational Chemistry and Molecular Spectroscopy
    PHYS 315: Computational Biology (also CSCI 315)

    The path you follow will be highly dependent on your interests, but in general, we encourage you to identify two or three classes or projects exposing you to intermediate or advanced topics within a domain and also providing the opportunity for quantitative study of those topics.

  • If you view data science as a central component of your future career—whether in commercial, government, nonprofit or academic pursuits—you may wish to pursue a course of study that emphasizes statistics, computation and data science, along with additional courses providing a broad foundation for working in other domains. Your exact course of study will depend on what you are most interested in pursuing.

    If your interests lie primarily in statistics and the mathematical foundations of data science, an excellent option would be to major in statistics while gaining sufficient depth in computing to manipulate complex data and build sophisticated models, as well as to understand the key benefits and limitations of different inference techniques. 

    • That depth could be obtained, for example, by taking CSCI 134 and CSCI 136, plus either CSCI 256 or one of CSCI 374, 375, or 381.
    • We also recommend taking several other classes that examine domain-focused applications of data science.

    If you are more interested in the computational aspects of data science, a more appropriate track may be to major in computer science while also diving deeply into statistics through STAT 201 plus two or three additional statistics classes. Again, be sure to take classes in other domains as well.

    Endeavors in computational social science could begin with an economics major coupled with comparable depth in statistics and computing, as in the previous two paths. Of course, there are no hard and fast rules.

    Complementing a degree in any STEM or social sciences field with a strong background in the technical aspects of data science will position you well for further studies in your area of interest or for pursuing careers deeply connected to gathering, interpreting, and communicating about data.