Predicting Learner Engagement
This project uses supervised machine learning to predict whether learners will engage with an educational video. A video counts as engaging when the median percentage of it watched by users is at least 30%. The project also examines which features extracted from the video, such as its transcript, audio track, and hosting site, contribute to learner engagement. The complete project is available at https://github.com/Yossranour1996/Predictive-Modeling/tree/main.
Dataset:
The data are extracted from the VLE Dataset compiled by researcher Sahan Bulathwela at University College London and are split into a training file and a test file (train.csv and test.csv). Each row in these files represents a single educational video. The features include title word count, document entropy, freshness, easiness, fraction of stopword presence, speaker speed, and silent period rate. The target variable is engagement, which indicates whether the median percentage of the video watched is at least 30%.
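As a minimal illustration of the 30% rule, the sketch below turns median watch fractions into binary labels. The sample values and the names ENGAGEMENT_THRESHOLD and is_engaging are invented for this example; the engagement target described above already encodes the result.

```python
# Hypothetical helper: the threshold value comes from the project description,
# the names and sample values are assumptions made for illustration only.
ENGAGEMENT_THRESHOLD = 0.30  # engaging means at least 30% of the video watched


def is_engaging(median_fraction_watched: float) -> bool:
    """Return True when the median watched fraction meets the threshold."""
    return median_fraction_watched >= ENGAGEMENT_THRESHOLD


examples = [0.12, 0.30, 0.55]  # made-up median watch fractions
labels = [is_engaging(value) for value in examples]
print(labels)  # [False, True, True]
```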
Approach:
The project builds supervised classification models for learner engagement, using features derived from the resources attached to each video in the original data. Because the dataset is small, it can be explored on a wide range of computing platforms. The fitted models are then used to examine which features are associated with videos that succeed with viewers.
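One possible way to examine feature contributions is permutation importance, sketched below. This is an assumption about method, not necessarily what the project does: the function name rank_features is hypothetical, and a support vector classifier is used only because SVMs appear among the listed skills. The sketch holds out a validation split, fits a scaled SVM, and measures how much the area under the ROC curve drops when each feature is shuffled.

```python
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def rank_features(X: pd.DataFrame, y: pd.Series, random_state: int = 0) -> pd.Series:
    """Rank the columns of X by permutation importance (hypothetical helper)."""
    # Hold out a stratified validation split so importances reflect unseen data.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # SVMs are sensitive to feature scale, so standardize before fitting.
    model = make_pipeline(StandardScaler(), SVC(random_state=random_state))
    model.fit(X_fit, y_fit)

    # Shuffle each feature in turn and record the mean drop in ROC AUC.
    result = permutation_importance(
        model, X_val, y_val, scoring="roc_auc", n_repeats=10, random_state=random_state
    )
    importances = pd.Series(result.importances_mean, index=X.columns)
    return importances.sort_values(ascending=False)
```

Features whose shuffling barely changes the score contribute little to this particular model; that does not by itself show they are unrelated to engagement.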
Evaluation:
Model performance is measured by the area under the ROC curve (AUC). Beyond predictive performance, the project also examines which features contribute most to it.
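To make the metric concrete, the sketch below computes AUC with scikit-learn's roc_auc_score. The labels and scores are toy values invented for the example, not project results.

```python
from sklearn.metrics import roc_auc_score

# Toy data: 1 marks an engaging video, scores are predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC is the fraction of (engaging, non-engaging) pairs ranked correctly.
# Here 3 of the 4 pairs are ordered correctly.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

Because AUC depends only on how the scores rank the videos, it is computed from predicted probabilities rather than from hard class labels.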
Dataset Features:
title_word_count
document_entropy
freshness
easiness
fraction_stopword_presence
speaker_speed
silent_period_rate
Deliverables:
The project delivers a function that trains a model on the provided training set (train.csv) and returns a pandas Series containing, for each video in the test set (test.csv), the predicted probability that it will be engaging. The Series is indexed by video ID.
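A function of this shape could look like the sketch below. It is not the repository's implementation: the name predict_engagement, the id column name, and the choice of a scaled support vector classifier are assumptions, while the feature names and the engagement target are taken from the description above.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Feature columns as listed in the dataset description.
FEATURES = [
    "title_word_count",
    "document_entropy",
    "freshness",
    "easiness",
    "fraction_stopword_presence",
    "speaker_speed",
    "silent_period_rate",
]


def predict_engagement(train_df, test_df, id_col="id", target_col="engagement"):
    """Train on train_df and return P(engaging) for each row of test_df.

    id_col is an assumed column name for the video ID.
    """
    X_train = train_df[FEATURES]
    y_train = train_df[target_col].astype(int)

    # Standardize features, then fit an SVM that can output probabilities.
    model = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
    model.fit(X_train, y_train)

    # Pick the predict_proba column that corresponds to the positive class.
    positive = list(model.classes_).index(1)
    probabilities = model.predict_proba(test_df[FEATURES])[:, positive]

    return pd.Series(probabilities, index=test_df[id_col], name=target_col)
```

Under these assumptions it would be called as predict_engagement(pd.read_csv("train.csv"), pd.read_csv("test.csv")).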
This project demonstrates the practical application of machine learning in the context of online education, providing valuable insights into the factors influencing learner engagement with educational videos.
Skills:
#Scikitlearn #Support Vector Machines #Machine learning #Data science