Today, advances in information technology have changed the ways people communicate, making it easier for customers to access information and exchange ideas about products or services on a large scale and in real time. Social networks and online review websites (such as Agoda, TripAdvisor, Yelp, and Amazon) allow customers to give their opinions on products or services through reviews. With the explosion of big data, it is necessary to collect and exploit these online reviews automatically so that businesses can more easily understand customer purchase behavior, as well as customers' interests and their level of satisfaction with product or service quality. Opinion mining has become a subject of study in different areas, including market research, e-business, and political polls. The research community has produced many studies on opinion mining methods and on the application of opinion mining at different levels.
This study applies supervised machine learning methods to opinion mining of online customer reviews. First, the study automatically collected 39,976 traveler reviews of hotels in Vietnam from the Agoda.com website. It then trained several machine learning models to find out which model best fits the training dataset and applied that model to predict opinions for the collected dataset. The results showed that the Logistic Regression (LR), Support Vector Machines (SVM) and Neural Network (NN) methods have the best performance in opinion mining in the Vietnamese language.
Opinion mining, also known as sentiment analysis, is a field of research aimed at analyzing and assessing people's perceptions of objects such as products, services, organizations, individuals, events, topics and their attributes. Opinion mining uses classification techniques and summarizes opinion documents along three polarities: positive, negative, and neutral. Opinion mining is divided into three levels: (i) document level, in which each document is assumed to express opinions on a single entity, so the analysis is not applicable to documents that cover several entities; (ii) sentence level, in which each sentence is assumed to express an opinion about one object, so the analysis ignores sentences whose clauses express opinions about different objects; and (iii) entity level, in which opinions are analyzed according to their target rather than the linguistic unit (document, sentence, clause, etc.). The target can be the object itself or aspects (attributes) of the object.
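The difference between the three levels can be illustrated with a small sketch. The review below, its annotations, the `Opinion` class, and the tie-breaking rule are all hypothetical and chosen only for illustration; they are not taken from the study's data or method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    target: str    # entity or aspect the opinion is about, e.g. "room"
    polarity: str  # "positive", "negative", or "neutral"


# Hypothetical hand-annotated review: each sentence with the opinions it holds.
review = [
    ("The room was clean but the staff were rude.",
     [Opinion("room", "positive"), Opinion("staff", "negative")]),
    ("Breakfast was excellent.",
     [Opinion("breakfast", "positive")]),
]


def majority(polarities):
    """Most frequent polarity; ties are reported as 'neutral' (an assumption)."""
    counts = Counter(polarities).most_common()
    if not counts:
        return "neutral"
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "neutral"
    return counts[0][0]


def entity_level(review):
    """One polarity per target (entity or aspect)."""
    return {op.target: op.polarity for _, ops in review for op in ops}


def sentence_level(review):
    """One polarity per sentence; mixed sentences lose their detail."""
    return [majority([op.polarity for op in ops]) for _, ops in review]


def document_level(review):
    """One polarity for the whole review."""
    return majority([op.polarity for _, ops in review for op in ops])


print(entity_level(review))
print(sentence_level(review))
print(document_level(review))
```

In this toy case the first sentence holds two opinions with opposite polarity, so a single sentence-level label cannot represent it, while the entity-level view keeps the room and the staff apart.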
This study reviews prior work on opinion mining and proposes applying machine learning methods to opinion mining of customer reviews in Vietnamese. It follows the knowledge discovery in databases (KDD) process: 39,976 tourists' reviews of hotels in Vietnam are collected from Agoda.com. The study then performs data pre-processing and trains machine learning models to find the model best suited to the training dataset, and applies this model to predict opinions for the entire dataset.
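The study does not list its pre-processing steps, so the following is only a minimal sketch of what cleaning Vietnamese review text might involve. The function name `preprocess` and the chosen steps (Unicode normalization, lowercasing, URL and punctuation removal) are assumptions, not the authors' procedure.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Hypothetical cleaning step for a single review."""
    # Compose diacritics first: Vietnamese text may arrive with combining
    # marks, which the punctuation filter below would otherwise strip.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Drop URLs, then replace anything that is not a word character or space.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Phòng  RẤT sạch!!!  http://example.com "))
```

Word segmentation, which is often applied to Vietnamese text before feature extraction, is deliberately left out here because it would require an external tool that the source does not name.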
Logistic Regression (LR), Support Vector Machines (SVM) and Neural Network (NN)
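The model-selection step described above (train several supervised models, keep the one that fits best, then apply it to the remaining reviews) might look roughly like the following sketch. It assumes scikit-learn is available; the twelve toy phrases, the TF-IDF features, the hyperparameters, and the 3-fold cross-validation are placeholders for illustration and do not reproduce the study's dataset or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up placeholder reviews (NOT the Agoda data): 1 = positive, 0 = negative.
texts = [
    "phòng sạch sẽ", "nhân viên thân thiện", "vị trí thuận tiện",
    "bữa sáng ngon", "giá hợp lý", "rất hài lòng",
    "phòng bẩn", "nhân viên thô lỗ", "quá ồn ào",
    "bữa sáng tệ", "giá quá đắt", "rất thất vọng",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Candidate classifiers; hyperparameters are arbitrary assumptions.
candidates = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "NN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
}

# Score each candidate with cross-validated accuracy on the labeled set.
scores = {}
for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(
        pipeline, texts, labels, cv=3, scoring="accuracy"
    ).mean()

# Refit the best-scoring candidate on all labeled data, then predict new text.
best_name = max(scores, key=scores.get)
best_model = make_pipeline(TfidfVectorizer(), candidates[best_name])
best_model.fit(texts, labels)
predictions = best_model.predict(["phòng sạch, nhân viên thân thiện"])

print(scores, best_name, predictions)
```

With a dataset this small the scores are not meaningful; the sketch only shows the shape of the comparison. On a real corpus, other metrics such as precision, recall, or F1 could be passed through the `scoring` argument.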
- Operating system: Windows.
- Coding language: Python.
- System: Pentium IV 2.4 GHz or Intel
- Hard disk: 40 GB.
- Floppy drive: 1.44 MB.
- Mouse: optical mouse.
- RAM: 512 MB.
This study has reviewed the theoretical background on opinion mining methods and opinion classification techniques, and has proposed applying supervised machine learning methods to automatic opinion mining. Experimental results show that LR, SVM and NN perform best among the trained methods. This study is valuable as a reference for applications of opinion mining in socioeconomic fields. However, it still has some limitations that can be addressed in future studies. First, in terms of data collection, this study only collects customer reviews of hotels on Agoda.com; future work may expand to reviews of other products or services on e-commerce websites or social networks. Second, in terms of scale, this study only classifies customer reviews on a two-level scale: positive and negative.