While the intellectuals keep saying “it’s not a race to be productive”, I thought I'd make a list of the top 5 data science projects to do in your spare time, for anyone interested in data analytics, data science or anything related to data. They are in no particular order.
1. Speech Emotion Recognition
A lot of what humans do is governed by speech and by the emotions attached to a scene, a product or an experience.
SER, an acronym for Speech Emotion Recognition, can be a compelling data science project to do this summer. It attempts to perceive human emotions from speech (voice samples), using a collection of sound files as the dataset. SER essentially focuses on feature extraction: pulling out the characteristics of an audio recording that carry emotion.
While working on the project in Python, you would also build up knowledge of Librosa, a package used for analyzing music and audio.
The VoxCeleb dataset can be a good starting point for Speech Emotion Recognition.
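To give a feel for what "feature extraction" means here, below is a minimal sketch in plain Python. It is not the Librosa workflow itself: it computes two very simple per-frame features (RMS energy and zero-crossing rate) on synthetic tones that stand in for real recordings. The function names, frame sizes and the "calm" and "excited" signals are all assumptions made up for illustration. In a real project you would load actual audio and most likely use richer features, such as MFCCs, from a library like Librosa.

```python
import math

def frame_features(samples, frame_size=400, hop=200):
    """Return a list of (rms_energy, zero_crossing_rate) tuples, one per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # RMS energy: a rough measure of loudness within the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Zero-crossing rate: how often the signal changes sign,
        # a rough proxy for how "high" the sound is.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcr = crossings / (frame_size - 1)
        features.append((rms, zcr))
    return features

def make_tone(freq_hz, amplitude, sample_rate=8000, seconds=0.5):
    """Synthetic sine tone, used here only as a stand-in for a voice sample."""
    n = int(sample_rate * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]

# Hypothetical signals: a quiet low tone versus a loud high tone.
calm = make_tone(freq_hz=120, amplitude=0.2)
excited = make_tone(freq_hz=400, amplitude=0.8)

calm_feats = frame_features(calm)
excited_feats = frame_features(excited)

print("calm    (rms, zcr):", calm_feats[0])
print("excited (rms, zcr):", excited_feats[0])
```

The per-frame feature tuples are what you would then feed into a classifier, with one emotion label per recording.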
2. Predictive Analytics
The purpose of predictive analytics is to make predictions about unknown future events.
It encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to identify risks and opportunities.
Loan Prediction Data: Predict if a loan will get approved or not
Forecasting HVAC needs: Combine weather forecast with building system
Customer Relationship Management
Clinical decision support systems
Customer and Employee Retention: churn rates
Project Risk Management
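As an illustration of the loan prediction idea from the list above, here is a small sketch of a logistic regression trained with plain stochastic gradient descent. Everything about the data is hypothetical: the two features (income in units of $10k and a prior-default flag), the eight applicants and their labels are invented for the example, and the learning rate and epoch count are arbitrary choices. A real project would use an actual loan dataset and probably a library implementation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Estimated probability that applicant x is approved."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.05, epochs=3000):
    """Fit weights and bias by per-sample gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical applicants: (income in $10k, has_prior_default as 0 or 1).
applicants = [
    (2.0, 1), (3.0, 1), (2.5, 0), (1.5, 0),
    (6.0, 0), (7.5, 0), (8.0, 1), (9.0, 0),
]
approved = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(applicants, approved)

p_high = predict_proba(weights, bias, (9.0, 0))
p_low = predict_proba(weights, bias, (1.5, 0))
print("approval probability, high income:", round(p_high, 3))
print("approval probability, low income: ", round(p_low, 3))
```

The same pattern (historical rows with a known outcome, a model fitted to them, a probability for a new case) carries over to churn and risk problems.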
3. Regression Analysis
The purpose of regression analysis is to predict an outcome based on historical data.
Regression analysis is a robust statistical method that allows you to examine the relationship between two or more variables of interest. While there are many types of regression analysis, at their core they all examine the influence of one or more independent variables on a target (dependent) variable.
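For the simplest case, one independent variable and one numeric target, the relationship can be estimated with ordinary least squares. The sketch below fits a straight line by hand so you can see what is going on. The advertising and sales numbers are hypothetical and only there to make the example runnable.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical weekly data: advertising spend vs store sales (both in $k).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.1, 15.9, 18.2, 19.8]

slope, intercept = fit_line(ad_spend, sales)
r2 = r_squared(ad_spend, sales, slope, intercept)

# Predict sales for a spend level that was not observed.
predicted = intercept + slope * 6.0
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.3f}")
print(f"predicted sales at spend 6.0: {predicted:.2f}")
```

Here the slope is the estimated change in the target for a one-unit change in the independent variable, which is exactly the "influence" the paragraph above talks about.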
Walmart sales data: Predict the sales of a store
Boston housing data: Predict the median value of owner-occupied homes
Wine Quality prediction: Predict the quality of the wine
Black Friday sales prediction: Predict purchase amount for a household
Algorithms to be used:
The choice depends on the nature of the target variable: numeric or categorical (factor)
CART — Factor target
Decision Trees — Factor target
Linear Regression — Numeric target
Logistic Regression — Factor target
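For a factor target, tree-based methods such as CART work by repeatedly picking the split that makes the resulting groups as pure as possible. The sketch below shows just one such split (a "decision stump") using Gini impurity. It is a simplified illustration rather than a full CART implementation, and the wine data is hypothetical.

```python
def gini(labels):
    """Gini impurity for binary labels (0 or 1); 0.0 means a pure group."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # share of class 1
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        impurity = (
            len(left) * gini(left) + len(right) * gini(right)
        ) / len(pairs)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical wines: alcohol percentage and whether the wine was rated "good" (1).
alcohol = [9.0, 9.5, 10.0, 10.5, 11.5, 12.0, 12.5, 13.0]
is_good = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(alcohol, is_good)
print("best threshold:", threshold, "weighted impurity:", impurity)
```

A full tree applies the same search again inside each of the two groups, across all features, until a stopping rule is met.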
4. Customer Segmentation
Customer segmentation is the process of splitting a customer base into groups of individuals who are similar in ways that matter for marketing, such as gender, age, interests, demographics, economic status, geography, behavioral patterns, spending habits and much more.
Customer segmentation is one of the most important applications of unsupervised learning. Using clustering techniques, companies can group customers with similar behavior into segments and then identify and target the potential user base in each one.
Algorithms to be used:
K-means clustering and hierarchical clustering are the top clustering methods, though other clustering algorithms can be used as well.
Furthermore, once the data is collected, companies can gain a deeper understanding of customer preferences and requirements, and discover the valuable segments that would bring them the most profit. This way, they can plan their marketing strategy more efficiently and minimize the risk to their investment.
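To make the K-means idea concrete, here is a bare-bones sketch in plain Python. The customers (age, annual spend in $k) are hypothetical, and the initialization simply takes the first k points, which only works nicely here because the sample data is ordered so that those points come from different groups. Real implementations use smarter initialization and usually scale the features first.

```python
import math

def kmeans(points, k, iterations=10):
    """Very small K-means: returns (centroids, assignments)."""
    # Assumption for this sketch: start from the first k points.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments

# Hypothetical customers: (age, annual spend in $k).
customers = [
    (22, 15), (40, 80), (60, 40),
    (25, 18), (42, 85), (65, 45),
    (23, 20), (38, 78), (62, 38),
]

centroids, segments = kmeans(customers, k=3)
for customer, segment in zip(customers, segments):
    print(customer, "-> segment", segment)
print("segment centers:", centroids)
```

Each centroid can then be read as a "typical customer" for its segment, which is what a marketing team would actually look at.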
5. Exploratory Data Analysis
Exploratory Data Analysis (EDA) is actually the first step in a data analysis process. Here, you make sense of the data you have, then figure out what questions you want to ask, how to frame them, and how best to manipulate the data to get the answers you need.
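A first pass over a dataset often starts with a handful of summary statistics and a simple outlier check. The sketch below does this with Python's built-in statistics module, using the common 1.5 × IQR rule to flag outliers. The daily order counts are hypothetical, and the 1.5 multiplier is a convention rather than a fixed rule.

```python
import statistics

def summarize(values):
    """Basic numeric summary plus outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical daily order counts with one unusual day.
daily_orders = [52, 48, 50, 55, 47, 51, 49, 53, 150, 50]

summary = summarize(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Notice how far the mean sits from the median here. That gap is a quick hint that something unusual is in the data, and it is the kind of observation that shapes the questions you ask next.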
EDA gives a broad view of patterns, trends, outliers, unexpected results and so on in existing data, using visual and quantitative methods. There are tons of projects that can be done with Exploratory Data Analysis. Here I’ve listed a few for reference or as a good starting point.