Standards in this Framework
Standard | Description
---|---
1.1.1 | Identify the key stages of a data science project lifecycle. |
1.1.2 | Identify key roles and their responsibilities in a data science team (e.g., business stakeholders define objectives; data engineers build pipelines; data scientists develop models; domain experts provide expertise).
1.1.3 | Define and create project goals and deliverables (e.g., problem statements, success metrics, expected outcomes, final reports, summary presentations). |
1.1.4 | Create and manage project timelines (e.g., milestones, deadlines, task dependencies, resource allocation).
1.1.5 | Create a student portfolio including completed data science projects, reports, and other student-driven accomplishments. |
1.2.1 | Collaborate in team-based projects (e.g., team discussions, maintaining project logs, following protocols, code review, documentation). |
1.2.2 | Communicate technical findings to non-technical audiences (e.g., creating data visualizations, presenting key insights, explaining complex concepts).
1.2.3 | Make data-driven decisions and recommendations by proposing solutions and evaluating alternatives. |
1.3.1 | Identify ethical considerations in data collection, storage and usage (e.g., data privacy, bias, transparency, consent). |
1.3.2 | Demonstrate responsible data handling practices (e.g., protecting sensitive information, citing data sources, maintaining data integrity). |
1.3.3 | Report results responsibly (e.g., addressing limitations, acknowledging uncertainties, preventing misinterpretation).
2.1.1 | Differentiate between discrete and continuous probability distributions. |
2.1.2 | Calculate probabilities using discrete distributions (e.g., Uniform, Binomial, Poisson).
2.1.3 | Calculate probabilities using continuous distributions (e.g., Uniform, Normal, Student's t, Exponential).
2.1.4 | Apply Bayes’ Theorem to calculate posterior probabilities. |
2.2.1 | Calculate p-values using a programming library and interpret the significance of the results. |
2.2.2 | Perform hypothesis testing. |
2.2.3 | Identify and explain Type I and Type II errors (e.g., false positives, false negatives).
2.2.4 | Calculate and interpret confidence intervals. |
2.2.5 | Design and analyze experiments to compare outcomes (e.g., identifying control/treatment groups, selecting sample sizes, determining variables, implementing A/B tests). |
2.3.1 | Perform basic matrix operations including addition, subtraction and scalar multiplication. |
2.3.2 | Calculate dot products and interpret their geometric meaning. |
2.3.3 | Apply matrix transformations to data sets. |
2.3.4 | Compute and interpret distances between vectors. |
3.1.1 | Create and manipulate (e.g., sort, filter, aggregate, reshape, merge, extract, clean, transform, subset) one-dimensional data structures for computational analysis (e.g., lists, arrays, series).
3.1.2 | Create and manipulate (e.g., transpose, join, slice, pivot, reshape) two-dimensional data structures for organizing structured datasets (e.g., matrices, dataframes).
3.1.3 | Utilize operations (e.g., arithmetic, aggregations, transformations) across data structures based on analytical needs. |
3.1.4 | Apply indexing methods to select and filter data based on position, labels, and conditions. |
3.2.1 | Import data into a DataFrame from common spreadsheet formats (e.g., csv, xlsx).
3.2.2 | Import data into a DataFrame directly from a database (e.g., using the SQLAlchemy library).
3.2.3 | Import data into a DataFrame using web scraping libraries (e.g., Beautiful Soup, Selenium).
3.2.4 | Import data into a DataFrame leveraging API requests (e.g., Requests, urllib). |
3.3.1 | Convert between data types as needed for analysis (e.g., strings to numeric values, dates to timestamps, categorical to numeric encoding). |
3.3.2 | Convert between structures as needed for analysis (e.g., lists to arrays, arrays to data frames). |
3.3.3 | Standardize and clean text data (e.g., remove whitespace, correct typos, standardize formats). |
3.3.4 | Identify and remove duplicate or irrelevant rows/records. |
3.3.5 | Restructure columns/fields for analysis (e.g., splitting, combining, renaming, removing irrelevant data). |
3.3.6 | Apply masking operations to filter and select data. |
3.3.7 | Handle missing and invalid data values using appropriate methods (e.g., removal, imputation, interpolation). |
3.3.8 | Identify and handle outliers using statistical methods. |
3.4.1 | Examine data structures using preview and summary methods (e.g., head, info, shape, describe). |
3.4.2 | Create new data frames by merging or joining two data frames. |
3.4.3 | Sort and group records based on conditions and/or attributes. |
3.4.4 | Create functions to synthesize features from existing variables (e.g., mathematical operations, scaling, normalization). |
4.1.1 | Generate histograms and density plots to display data distributions. |
4.1.2 | Create box plots and violin plots to show data spread and quartiles. |
4.1.3 | Construct Q-Q plots to assess data normality. |
4.2.1 | Generate scatter plots and pair plots to show relationships between variables. |
4.2.2 | Generate correlation heatmaps to display feature relationships. |
4.2.3 | Plot decision boundaries to visualize class separation in the data.
4.3.1 | Generate bar charts and line plots to compare categorical data. |
4.3.2 | Create heat maps to display confusion matrices and tabular comparisons. |
4.3.3 | Plot ROC curves and precision-recall curves to evaluate classifications. |
4.4.1 | Generate line plots to show trends over time. |
4.4.2 | Create residual plots to analyze prediction errors. |
4.4.3 | Plot moving averages and trend lines. |
4.5.1 | Draw conclusions by interpreting statistical measures (e.g., p-values, confidence intervals, hypothesis test results). |
4.5.2 | Evaluate model performance using appropriate metrics and visualizations (e.g., R-squared, confusion matrix, residual plots). |
4.5.3 | Identify patterns, trends, and relationships in data visualizations (e.g., correlation strength, outliers, clusters). |
4.5.4 | Draw actionable insights from analysis results. |
5.1.1 | Describe the key characteristics of Big Data (e.g., Volume, Velocity, Variety, Veracity). |
5.1.2 | Identify real-world applications of Big Data across industries (e.g., healthcare, finance, retail, social media). |
5.1.3 | Analyze case studies of successful and unsuccessful Big Data implementations across industries (e.g., recommendation systems, fraud detection, predictive maintenance). |
5.1.4 | Identify common Big Data platforms and tools (e.g., Hadoop for distributed storage, Spark for data processing, Tableau for visualization, MongoDB for unstructured data). |
5.2.1 | Describe how organizations store structured and unstructured data. |
5.2.2 | Compare different types of data storage systems (e.g., data warehouse, data lakes, databases). |
6.1.1 | Contrast supervised and unsupervised learning. |
6.1.2 | Differentiate between classification and regression problems. |
6.1.3 | Evaluate model performance using appropriate metrics (e.g., Accuracy, Precision/Recall, Mean Squared Error, R-squared).
6.2.1 | Perform linear regression for prediction problems. |
6.2.2 | Perform multiple regression for prediction problems. |
6.2.3 | Perform logistic regression for classification tasks. |
6.2.4 | Implement Naive Bayes Classification using probability concepts. |
6.2.5 | Perform k-means clustering using distance metrics. |
6.3.1 | Apply standard methods to split data into training and testing sets. |
6.3.2 | Apply cross-validation techniques (e.g., k-fold, leave-one-out, stratified k-fold).
6.3.3 | Identify and address overfitting/underfitting. |
6.3.4 | Select appropriate models based on data characteristics and problem requirements. |
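
The minimal Python sketches below illustrate a selection of the standards above, labeled by standard number. The library choices (SciPy, NumPy, pandas, Matplotlib, scikit-learn) and all data values, file names, and URLs are illustrative assumptions, not requirements of the framework.

Standards 2.1.2 and 2.1.3 ask for probability calculations from discrete and continuous distributions; one possible approach uses SciPy's distribution objects.

```python
# Probabilities from discrete (Binomial, Poisson) and continuous (Normal)
# distributions; the parameter values are arbitrary examples.
from scipy import stats

# P(exactly 3 heads in 10 fair coin flips) -- Binomial PMF.
print(stats.binom.pmf(3, n=10, p=0.5))

# P(at most 2 arrivals) when arrivals follow a Poisson with mean 4 -- CDF.
print(stats.poisson.cdf(2, mu=4))

# P(X <= 1.96) for a standard Normal variable -- CDF of a continuous distribution.
print(stats.norm.cdf(1.96))
```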
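For Standard 2.1.4, a posterior probability can be computed directly from the definition of Bayes' Theorem; the disease-screening numbers below are made up purely for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' Theorem: P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|~A)P(~A)]."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# P(disease) = 0.01, P(positive | disease) = 0.95, P(positive | no disease) = 0.05.
print(posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05))
# Roughly 0.16: with a low prior, even an accurate test gives a modest posterior.
```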
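Standards 2.2.1, 2.2.2, and 2.2.4 cover p-values, hypothesis testing, and confidence intervals; a sketch using SciPy on synthetic control/treatment samples might look like this.

```python
import numpy as np
from scipy import stats

# Synthetic control and treatment samples with a small true difference in means.
rng = np.random.default_rng(42)
control = rng.normal(loc=50, scale=5, size=100)
treatment = rng.normal(loc=52, scale=5, size=100)

# Two-sample t-test: is the difference in group means statistically significant?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of equal means at the 5% level.")

# 95% confidence interval for the treatment-group mean (t-distribution).
low, high = stats.t.interval(0.95, df=len(treatment) - 1,
                             loc=treatment.mean(), scale=stats.sem(treatment))
print(f"95% CI for treatment mean: ({low:.2f}, {high:.2f})")
```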
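Standards 2.3.1 through 2.3.4 cover matrix arithmetic, dot products, transformations, and vector distances; NumPy is one natural tool for all four.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

print(A + B)          # element-wise addition (2.3.1)
print(A - B)          # element-wise subtraction
print(2 * A)          # scalar multiplication

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
dot = u @ v           # dot product (2.3.2)
cos_angle = dot / (np.linalg.norm(u) * np.linalg.norm(v))
print(dot, cos_angle) # geometric meaning: cosine of the angle between u and v

# Apply a 90-degree rotation matrix to a small set of 2-D points, one per row (2.3.3).
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
points = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(points @ R.T)   # each row is the transformed point

# Euclidean distance between two vectors (2.3.4).
print(np.linalg.norm(u - v))
```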
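Standards 3.2.1 through 3.2.4 describe loading data into a DataFrame from different sources. The sketch below uses pandas with SQLAlchemy, Requests, and Beautiful Soup; the file name, connection string, and URLs are placeholders, not real resources.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine

# From a CSV spreadsheet file (3.2.1).
df_csv = pd.read_csv("sales.csv")

# From a database via SQLAlchemy (3.2.2).
engine = create_engine("sqlite:///sales.db")
df_sql = pd.read_sql("SELECT * FROM orders", engine)

# From an HTML table via web scraping (3.2.3) -- hypothetical page.
html = requests.get("https://example.com/prices").text
soup = BeautifulSoup(html, "html.parser")
rows = [[cell.get_text(strip=True) for cell in tr.find_all("td")]
        for tr in soup.find_all("tr")]
df_scraped = pd.DataFrame(rows)

# From a JSON API using the Requests library (3.2.4) -- hypothetical endpoint.
response = requests.get("https://example.com/api/orders")
df_api = pd.DataFrame(response.json())
```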
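Standards 3.3.1 through 3.3.8 cover type conversion, text cleanup, duplicates, masking, missing values, and outliers; one possible pandas workflow on a small invented DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "city":  [" Boston", "boston ", "Austin", "Austin", "Denver", "Denver", None],
    "price": ["100", "110", "250", "250", "9000", "180", "140"],
})

df["price"] = pd.to_numeric(df["price"])          # strings -> numbers (3.3.1)
df["city"] = df["city"].str.strip().str.title()   # standardize text (3.3.3)
df = df.drop_duplicates()                         # remove duplicate rows (3.3.4)
df = df.dropna(subset=["city"])                   # drop rows missing a city (3.3.7)

mask = df["price"] < 300                          # boolean mask (3.3.6)
print(df[mask])

# Flag outliers with the 1.5 * IQR rule (3.3.8).
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
print(outliers)   # the 9000 row stands out as an outlier
```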
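Standards 3.4.1 through 3.4.4 cover previewing, merging, grouping, and feature synthesis; a compact pandas sketch with made-up order data:

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 1, 2, 3], "amount": [20.0, 35.0, 15.0, 50.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["East", "West", "East"]})

print(orders.head())       # preview rows (3.4.1)
print(orders.shape)        # dimensions
orders.info()              # column types and non-null counts
print(orders.describe())   # summary statistics

# Merge the two frames on the shared key (3.4.2).
merged = orders.merge(customers, on="customer_id", how="left")

# Group and aggregate: total spend per region (3.4.3).
print(merged.groupby("region")["amount"].sum())

# Derive a new feature with a function: min-max scaled amount (3.4.4).
def min_max_scale(series):
    return (series - series.min()) / (series.max() - series.min())

merged["amount_scaled"] = min_max_scale(merged["amount"])
print(merged)
```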
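Standards 4.1.1, 4.1.2, 4.2.1, and 4.2.2 call for distribution, spread, relationship, and correlation plots; Matplotlib on synthetic data is one way to produce all four in a single figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two related synthetic variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].hist(x, bins=20)                  # distribution of x (4.1.1)
axes[0, 0].set_title("Histogram")

axes[0, 1].boxplot([x, y])                   # spread and quartiles (4.1.2)
axes[0, 1].set_title("Box plot")

axes[1, 0].scatter(x, y, s=10)               # relationship between x and y (4.2.1)
axes[1, 0].set_title("Scatter plot")

corr = np.corrcoef(np.vstack([x, y]))        # 2x2 correlation matrix
im = axes[1, 1].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
axes[1, 1].set_title("Correlation heatmap")  # (4.2.2)
fig.colorbar(im, ax=axes[1, 1])

plt.tight_layout()
plt.show()
```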
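Standards 6.2.1 through 6.2.3, together with 6.1.3, cover fitting regression and classification models and scoring them with common metrics; a scikit-learn sketch on synthetic features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))

# Regression target: a noisy linear combination of the two features (6.2.1/6.2.2).
y_reg = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.3, size=200)
reg = LinearRegression().fit(X, y_reg)
pred = reg.predict(X)
print("MSE:", mean_squared_error(y_reg, pred), "R^2:", r2_score(y_reg, pred))

# Classification target: class 1 when the noiseless signal is positive (6.2.3).
y_clf = (3 * X[:, 0] - 2 * X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_clf)
print("Accuracy:", accuracy_score(y_clf, clf.predict(X)))
```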
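Standard 6.2.5 asks for k-means clustering based on distance metrics; scikit-learn's KMeans (which uses Euclidean distance) applied to two synthetic blobs of points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated clusters of 2-D points.
rng = np.random.default_rng(3)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster centers:\n", kmeans.cluster_centers_)
print("First ten labels:", kmeans.labels_[:10])
```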
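Standards 6.3.1 and 6.3.2 cover train/test splitting and cross-validation; one common scikit-learn pattern:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% of the data for testing (6.3.1).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation on the training set (6.3.2).
scores = cross_val_score(LogisticRegression(), X_train, y_train,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("CV accuracy per fold:", scores)
```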