Sentiment Analysis on Movie Reviews
This project applies the NLTK VADER sentiment analysis tool to a dataset of movie reviews. The dataset comprises 2,000 records from the IMDb movie review database and is provided as a tab-delimited file. The project is available at https://github.com/Yossranour1996/Sentiment-Analysis.
Dataset:
The movie reviews dataset is loaded into a Pandas DataFrame from the "moviereviews.tsv" file. Each record in the dataset contains a movie review along with its sentiment label.
Techniques Used:
Sentiment Analysis with NLTK VADER:
Sentiment analysis uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) tool from the NLTK library. VADER is a lexicon- and rule-based tool designed for social media text. Alongside negative, neutral, and positive proportions, it reports a compound score normalized between -1 (most negative) and +1 (most positive).
Data Preprocessing:
The movie reviews are processed to extract sentiment scores using the SentimentIntensityAnalyzer from NLTK. The compound score is then used to categorize each review as positive or negative.
Implementation Steps:
Data Loading:
The movie reviews dataset is loaded into a Pandas DataFrame from the provided tab-delimited file.
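The loading step might look roughly like the sketch below. It reads a tiny in-memory sample instead of the real "moviereviews.tsv" so that it runs on its own. The column names "label" and "review", the sample rows, and the helper name load_reviews are assumptions made for illustration, not details taken from the project.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for moviereviews.tsv. The column names
# "label" and "review" and both rows are assumptions for illustration.
SAMPLE_TSV = (
    "label\treview\n"
    "pos\tA warm, funny and moving film.\n"
    "neg\tDull plot and wooden acting.\n"
)


def load_reviews(source):
    """Read a tab-delimited file (path or file-like object) into a DataFrame."""
    df = pd.read_csv(source, sep="\t")
    # Drop rows with a missing label or review so later steps only see text.
    return df.dropna().reset_index(drop=True)


df = load_reviews(io.StringIO(SAMPLE_TSV))
print(df.shape)
```

Passing the path "moviereviews.tsv" instead of the StringIO object would read the actual file, provided it has a header row.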
Sentiment Analysis:
NLTK VADER sentiment analysis is applied to each movie review in the dataset. The compound sentiment score is calculated, and reviews are categorized as 'pos' (positive) or 'neg' (negative) based on the score.
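A minimal sketch of this step follows. The text does not state which cut-off separates 'pos' from 'neg', so the threshold of 0.0 used here is an assumption, as are the helper names label_from_compound, score_reviews, and predict_labels. NLTK is imported inside the scoring function so that the labelling helper can be tried without the library or the lexicon download.

```python
def label_from_compound(compound, threshold=0.0):
    """Map a VADER compound score in [-1, 1] to 'pos' or 'neg'.

    The default cut-off of 0.0 is an assumption; the project may use another.
    """
    return "pos" if compound >= threshold else "neg"


def score_reviews(reviews):
    """Return one compound score per review using NLTK's VADER analyzer."""
    # Imported lazily so label_from_compound works without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # fetches the lexicon if missing
    sia = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
    return [sia.polarity_scores(text)["compound"] for text in reviews]


def predict_labels(reviews, threshold=0.0):
    """Score every review and convert each compound score to a label."""
    return [label_from_compound(c, threshold) for c in score_reviews(reviews)]
```

With a DataFrame that holds the review text in a column (hypothetically named "review"), the predictions could be stored with df["pred"] = predict_labels(df["review"]).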
Classification and Evaluation:
The predicted labels are compared with the actual sentiment labels to assess how well the sentiment analysis performs. The accuracy, a classification report, and a confusion matrix are computed.
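To make these quantities concrete, here is a dependency-free sketch that computes accuracy, a confusion matrix, and per-class precision and recall from two label lists. The function name evaluate and the example labels are hypothetical. In practice the accuracy_score, classification_report, and confusion_matrix functions in scikit-learn's sklearn.metrics module are a common choice; whether the project uses them is an assumption.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels=("neg", "pos")):
    """Return (accuracy, confusion matrix, per-class precision and recall).

    Rows of the matrix are true labels, columns are predicted labels,
    both in the order given by `labels`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = Counter(zip(y_true, y_pred))
    matrix = [[pairs[(t, p)] for p in labels] for t in labels]

    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / n if n else 0.0

    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, matrix, report


# Made-up labels for illustration only, not results from the project.
accuracy, matrix, report = evaluate(
    ["pos", "pos", "neg", "neg"],
    ["pos", "neg", "neg", "neg"],
)
print(accuracy, matrix, report)
```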
Outcome:
The project yields a VADER-based classifier that categorizes movie reviews as positive or negative. The evaluation metrics show how accurately it labels the reviews in the dataset.
Skills:
#NLTK #Machine learning #Classification