Top 5 Data Science Projects for a Beginner

While the intellectuals keep saying “it’s not a race to be productive”, I thought I’d put together a list of five data science projects to try in your spare time, for anyone interested in data analytics, data science or anything else related to data. They are in no particular order.

1. Speech Emotion Recognition

A lot of what humans do is shaped by speech and by the emotions attached to a scene, a product or an experience.

Speech Emotion Recognition (SER) can be a compelling Data Science project to do this summer. It attempts to recognize human emotions from speech (voice samples), using a collection of sound files as the dataset. SER essentially relies on feature extraction to identify emotion in audio recordings.

While working on the project in Python, you would also pick up knowledge of Librosa, a package for analyzing music and audio.
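To make "feature extraction" a bit more concrete, here is a minimal sketch in plain Python that computes two classic per-frame audio features, RMS energy and zero-crossing rate. It is only an illustration: the function names, frame sizes and the two synthetic sine tones (stand-ins for a quiet and a loud, higher-pitched voice) are my own assumptions, not real emotional speech. In a real project you would load recordings and compute richer features with a library such as Librosa.

```python
import math

def frame_features(signal, frame_size=400, hop=200):
    """Return a list of (rms_energy, zero_crossing_rate) tuples, one per frame."""
    features = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # Root-mean-square energy: a rough measure of loudness.
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, crossings / (frame_size - 1)))
    return features

def sine_wave(freq_hz, amplitude, n_samples=1600, sample_rate=8000):
    """Synthetic tone used here instead of a real recording (assumption)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# Hypothetical stand-ins: a quiet low tone and a loud higher tone.
calm = frame_features(sine_wave(120, 0.2))
excited = frame_features(sine_wave(440, 0.8))

for name, feats in (("calm", calm), ("excited", excited)):
    mean_rms = sum(f[0] for f in feats) / len(feats)
    mean_zcr = sum(f[1] for f in feats) / len(feats)
    print(name, round(mean_rms, 3), round(mean_zcr, 3))
```

Feature vectors like these are what you would then feed into one of the classifiers listed in this section.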

The Vox Celebrity (VoxCeleb) dataset can be a good starting point for Speech Emotion Recognition.

Algorithms to be used:

  1. Convolutional Neural Network (CNN)
  2. Recurrent neural networks (RNN)
  3. Neural Network (NN)
  4. Gaussian mixture model (GMM)
  5. Support Vector Machine (SVM)

Speech Emotion Recognition Topics on GitHub

2. Predictive Analytics

The purpose of predictive analytics is to make predictions about unknown future events.

It encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to identify risks and opportunities.


  1. Loan Prediction Data: Predict if a loan will get approved or not
  2. Forecasting HVAC needs: Combine weather forecast with building system
  3. Customer Relationship Management
  4. Clinical decision support systems
  5. Customer and Employee Retention: churn rates
  6. Project Risk Management
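As an illustration of the loan prediction idea from the list above, here is a small sketch that fits a logistic regression with full-batch gradient descent in plain Python. Everything in it is an assumption made for the example: the two features (income and existing debt), the six made-up applicants, the learning rate and the function names. It is not the actual Loan Prediction dataset.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Estimated probability that the label is 1 for feature tuple x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.1, epochs=5000):
    """Fit weights and bias with full-batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        # Prediction error for every row under the current parameters.
        errors = [predict_proba(weights, bias, x) - y for x, y in zip(rows, labels)]
        bias -= lr * sum(errors) / n
        weights = [
            w - lr * sum(e * x[j] for e, x in zip(errors, rows)) / n
            for j, w in enumerate(weights)
        ]
    return weights, bias

# Made-up toy data: (income, existing debt), both in tens of thousands.
applicants = [(2.0, 3.0), (3.0, 2.5), (2.5, 4.0), (6.0, 1.0), (7.0, 2.0), (8.0, 0.5)]
approved = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(applicants, approved)

# Score two hypothetical new applicants.
print(predict_proba(weights, bias, (7.5, 1.0)) > 0.5)   # high income, low debt
print(predict_proba(weights, bias, (2.25, 3.5)) > 0.5)  # low income, high debt
```

On a real dataset you would hold out a test set and measure accuracy there, rather than checking a couple of hand-picked points.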

3. Regression Analysis

The purpose of regression analysis is to predict an outcome based on historical data.

Regression analysis is a robust statistical method that allows examination of the relationship between two or more variables of interest. While there are many types of regression analysis, at the core, all examine the influence of one or more independent variables on a target (dependent) variable.


  1. Walmart sales data: Predict the sales of a store
  2. Boston housing data: Predict the median value of owner-occupied homes
  3. Wine Quality prediction: Predict the quality of the wine
  4. Black Friday Sales prediction : Predict purchase amount for a household

Algorithms to be used:

The choice depends on the nature of the target variable: numeric or categorical (factor).

  1. CART — Factor target
  2. Decision Trees — Factor target
  3. Linear Regression — Numeric target
  4. Logistic Regression — Factor target
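For a numeric target, the simplest case is a straight line fitted by ordinary least squares. The sketch below does this for a single independent variable in plain Python and also reports R², the share of variance the line explains. The data (spend versus sales) and the function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up toy data: advertising spend vs. weekly sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, sales)
print(round(slope, 2), round(intercept, 2))  # 1.97 1.09
print(round(r_squared(spend, sales, slope, intercept), 3))
```

With several independent variables the same idea applies, but you would normally reach for a library instead of writing the algebra by hand.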

4. Customer Segmentation

Customer Segmentation is the process of splitting a customer base into groups of individuals who are similar in ways that matter for how a product is or can be marketed to them, such as gender, age, interests, demographics, economic status, geography, behavioral patterns and spending habits.

Customer Segmentation is one of the most important applications of unsupervised learning. Using clustering techniques, companies can identify distinct segments of customers, allowing them to target their potential user base.

Companies use the clustering process to foresee or map customer segments with similar behavior, so they can identify and target their potential user base.

Algorithms to be used:

K-means clustering and hierarchical clustering are the top clustering methods. Some of the other clustering algorithms are:

  1. Partitioning method
  2. Fuzzy clustering
  3. Density-based clustering
  4. Model-based clustering

Furthermore, once the data is collected, companies can gain a deeper understanding of customer preferences and requirements and discover the valuable segments that would bring them the most profit. This way, they can plan their marketing more strategically and efficiently and minimize the risk to their investment.
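To show what K-means, mentioned above, actually does, here is a minimal sketch in plain Python: assign each customer to the nearest centroid, move each centroid to the mean of its members, and repeat until nothing changes. The six customers (age, monthly spend) are made up, and the function names and the fixed random seed are my own choices for the example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]

def kmeans(points, k, max_iterations=100, seed=0):
    """Plain K-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Made-up toy customers: (age, monthly spend in hundreds).
customers = [(22, 2.0), (25, 3.0), (27, 2.5), (48, 9.0), (52, 10.0), (55, 8.5)]

centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

In practice you would scale the features first, since a column with large values (age here) otherwise dominates the distance, and you would try several values of k.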

5. Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the first step in a data analysis process. Here, you make sense of the data you have, then figure out what questions you want to ask, how to frame them, and how best to manipulate the data to get the answers you need.

EDA gives a broad view of patterns, trends, outliers, unexpected results and so on in existing data, using visual and quantitative methods. There are tons of projects that can be done with Exploratory Data Analysis. Here I’ve listed a few datasets for reference or as a good starting point.


  1. Global Suicide Rates (dataset)
  2. Summer Olympic Medals (dataset)
  3. World Happiness Report (dataset)
  4. Nutrition Facts for McDonald’s Menu (dataset)
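On the quantitative side, a first pass over any numeric column usually means summary statistics plus a check for outliers. Below is a small sketch using only Python's statistics module and the common 1.5 × IQR rule. The calorie values are made up for the example (they are not taken from any of the datasets listed) and the summarize function is a hypothetical helper.

```python
import statistics

def summarize(values):
    """Basic summary statistics plus outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Made-up calorie counts for ten hypothetical menu items.
calories = [250, 300, 320, 340, 360, 380, 400, 420, 450, 1150]

summary = summarize(calories)
for key, value in summary.items():
    print(key, value)
```

Notice how the single large value pulls the mean well above the median. Spotting that kind of gap is exactly what EDA is for, and a histogram or box plot would make it visible at a glance.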

Thank you for reading! I hope you enjoyed the article. Do let me know which projects you are looking forward to learning or doing over the summer in your Data Science journey.
