Indonesian Sentiment Analysis towards MyPertamina Application Reviews by Utilizing Machine Learning Algorithms

This paper reports an experimental sentiment analysis of application reviews, covering both the methods and the data. Application reviews contain a large amount of raw data published by users in the form of text, images, audio, and video, and sentiment analysis can convert this data into valuable information. In this work, around 5000 Indonesian reviews of the MyPertamina application on Google Play are analyzed. The goal of this study was to investigate how effectively sentiment analysis extracts valuable insights from application reviews. The work comprises data collection, pre-processing, feature extraction, TF-IDF text representation, machine learning modelling, and evaluation. The machine learning algorithms used are Linear Support Vector Classification (Linear SVC) and Multinomial Naïve Bayes (Multinomial NB). Both models perform well on this data: Multinomial NB reaches 95% accuracy, while Linear SVC obtains 96%. These results suggest that both Linear SVC and Multinomial NB are well suited to sentiment analysis of Indonesian-language data. Future work could expand the dataset to include reviews from a broader range of applications, or evaluate additional machine learning algorithms. In addition, a word cloud analysis was performed, showing the popular words that appear in positive and negative reviews. A deeper analysis of the word cloud results, to identify common themes and trends in the positive and negative sentiments expressed in the reviews, would also be interesting.


I. INTRODUCTION
In the last decade, information technology has redefined social norms. Mobile applications can change our daily lives and behavior [1]. Many kinds of applications are available, including desktop applications, web-based applications, and mobile applications. Mobile applications in particular are simple, compact, and easy to access, since smartphones have already become part of our lives. With this technology, everything is simple: users only need to find the appropriate app and read through its user reviews and ratings.
From another perspective, for a business or product owner, listening to customer feedback is important for improving the experience of a mobile app [2]. Relying on review ratings alone is no longer enough; the comments in reviews also play a vital role in understanding the user's perspective. Using sentiment analysis approaches, we can determine whether a review expresses a positive, negative, or neutral emotion. This helps in understanding customer feedback and in uncovering the underlying sentiment of a text, and many companies use it to analyze application reviews.
Natural language processing (NLP) is theoretically defined as a range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications [3]. One of the main goals of NLP is to allow computers to understand, interpret, and generate human language, which is challenging because of the complexity and variability of natural language. NLP can also serve as a tool for processing a document to find positive, negative, or neutral sentiment. This is useful for identifying trends and users' sentiments towards a product or service, so that business objectives can be adjusted to address customers' concerns. Many applications are built to make things easier for end users, such as MyPertamina. MyPertamina is an online financial services platform provided by Pertamina Indonesia; at Pertamina's public fueling stations, it is used to process non-cash fuel payments [4]. However, many MyPertamina users complain about the application. In this research, NLP is used to analyze the sentiment of reviews left by MyPertamina users in order to identify common complaints and areas for improvement. The purpose is to analyze and elaborate on the application reviews from the end user's perspective. By understanding users' sentiments, the development team behind MyPertamina can make informed decisions about how to improve the application and better meet the needs of its users. The results of the NLP analysis may also be useful for marketing and customer service teams, who can use the insights to address user concerns and improve the overall user experience.
The contributions of this paper are: (1) two machine learning algorithms are applied for sentiment analysis on 5000 Indonesian MyPertamina reviews; (2) Term Frequency-Inverse Document Frequency (TF-IDF) is used as the text representation; (3) the supervised machine learning algorithms Multinomial Naïve Bayes and Linear Support Vector Classification are implemented and evaluated using accuracy as the performance criterion; and (4) word cloud visualization is implemented and analyzed to extract the most popular words in the application reviews.
This paper is organized into five sections. Section 2 reviews previous work on the implementation of sentiment analysis for Indonesian application reviews. Section 3 introduces the experimental research methodology. Section 4 presents the experimental results and discussion. Section 5 contains the study's conclusions and recommendations for further work.

II. LITERATURE REVIEW
There are several previous works related to sentiment analysis, application reviews, machine learning, and big data. A comprehensive analysis of Twitter data using machine learning was presented by Kawade [5]. The collected data consisted of 5000 tweets, and the results show that even length-limited text such as tweets can be used to analyze users' perspectives. Handani [6] analyzed reviews of the Go-Jek application, an online transportation service, using a Naïve Bayes classifier to build the data model. The results show that this algorithm can be used for text mining, especially for analyzing application reviews. Fransiska [7] analyzed reviews of the mobile telecommunication service by.U, collected with the Google Play Scraper library. They labelled the collected reviews manually: scores 1 and 2 were converted to negative sentiment, while scores 4 and 5 were converted to positive sentiment. Score 3 was excluded because it was considered less informative. The reviews were transformed into vectors using the TF-IDF technique, and the results show that TF-IDF and Support Vector Machine (SVM) methods can be applied to the classification process with good results. However, further analysis of Indonesian application reviews is still needed. This research analyzes MyPertamina application reviews, using the word cloud method to find the most popular words in positive and negative reviews. In addition, this research compares two machine learning algorithms that perform well in sentiment analysis: Linear SVC and Multinomial NB.

III. RESEARCH METHOD
Fig. 1 presents the experimental design. The experiment is divided into several phases: data collection, pre-processing, feature extraction, term weighting using TF-IDF, building two classification models from the training data, and evaluation. The pre-processing phase applies several techniques to the data: data normalization, case folding, data filtering, character repetition removal, tokenization, and stop word removal. Each phase is elaborated in the following subsections.

A. Data Collection and Analysis
In this phase, we collected 5000 application reviews using the Google Play Scraper library. Each collected review has the following attributes: review id, username, user image, content, score, thumbs-up count, review created version, published time, reply content, and replied at. A sample of the reviews is shown in Table I. The distribution of scores is shown in Fig. 2: 93% of the reviews have score 1, while the least frequent score is 4 (0.4%). Most end users therefore give MyPertamina a bad review, and the gap between the number of score 1 and score 5 reviews is large. The scores sorted by frequency are shown in Table II. In this work, we only use two attributes, content and score, since the other attributes give less information for classifying a review. A sample of these data is shown in Table III. Furthermore, we only use reviews with score 1 or 5, and convert these scores into labels 0 (negative) and 1 (positive), respectively. Label 0 contains 4650 reviews and label 1 contains 256 reviews.
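The attribute selection and relabelling described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes each scraped review is a dict with "content" and "score" keys (the field names used by the google-play-scraper package), and the sample reviews are hypothetical.

```python
# Map star ratings to binary sentiment labels, keeping only 1- and 5-star reviews.
# Reviews could be fetched with google-play-scraper, e.g.
#   reviews(app_id, lang="id", country="id", count=5000)
# (app_id is not given in the paper and is left unspecified here).
SCORE_TO_LABEL = {1: 0, 5: 1}  # 1 star -> 0 (negative), 5 stars -> 1 (positive)


def select_and_label(raw_reviews):
    """Keep only the content and score attributes and relabel scores 1/5 as 0/1."""
    rows = []
    for review in raw_reviews:
        label = SCORE_TO_LABEL.get(review["score"])
        if label is not None:  # scores 2, 3 and 4 are dropped
            rows.append((review["content"], label))
    return rows


# Hypothetical scraped reviews; attributes such as userName are ignored.
raw = [
    {"content": "aplikasi error terus", "score": 1, "userName": "a"},
    {"content": "lumayan", "score": 3, "userName": "b"},
    {"content": "mantap membantu", "score": 5, "userName": "c"},
]
data = select_and_label(raw)
print(data)  # [('aplikasi error terus', 0), ('mantap membantu', 1)]
```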

B. Pre-processing
1) Data Normalization:
After data collection, the next step is data normalization. The purpose of this phase is to normalize reviews that contain slang words and to clean the data of noise [8]. Normalization uses a collection of Indonesian colloquial words [9]. The result is shown in Table IV.
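Dictionary-based slang normalization can be sketched as below. The small lexicon is hypothetical and only stands in for the Indonesian colloquial word collection [9]; the lookup simply replaces each word found in the lexicon with its standard form.

```python
import re

# Hypothetical mini-lexicon standing in for the colloquial word collection [9].
SLANG_LEXICON = {
    "gak": "tidak",
    "gk": "tidak",
    "tdk": "tidak",
    "yg": "yang",
    "dgn": "dengan",
    "sdh": "sudah",
    "apk": "aplikasi",
}


def normalize(text, lexicon=SLANG_LEXICON):
    """Replace every word found in the lexicon with its standard form."""
    return re.sub(
        r"\w+",
        lambda match: lexicon.get(match.group(0).lower(), match.group(0)),
        text,
    )


print(normalize("apk ini gak bisa dibuka"))  # aplikasi ini tidak bisa dibuka
```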

2) Case Folding:
All letters are transformed into lowercase [10].The sample of the text before and after case folding is shown in Table V.

3) Data Filtering:
The purpose of this step is to separate important words from less important ones [11]: only important words are kept, while less important words are discarded. A sample of the filtering result is shown in Table VI.
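The paper does not list which tokens count as "less important". The sketch below assumes they are URLs, mentions, hashtags, digits, punctuation, and emoji, which is a common choice for review text; the exact rules used in the experiment may differ.

```python
import re


def filter_text(text):
    """Remove URLs, mentions/hashtags and every non-letter character (assumed rules)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)  # mentions and hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # digits, punctuation, emoji
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace


print(filter_text("aplikasi error 100% 😡 cek https://x.co"))  # aplikasi error cek
```

Note that this rule also splits hyphenated reduplications such as "kata-kata" into two tokens.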

4) Character Repetition Removal:
In this step, repeated characters are removed from words. A sample of the character repetition removal phase is shown in Table VII.
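One way to implement character repetition removal is a regular expression that collapses runs of a repeated character. This sketch assumes that only runs of three or more identical characters are collapsed, so that legitimate double letters in Indonesian words such as "maaf" survive; the threshold used in the paper is not stated.

```python
import re


def remove_char_repetition(text):
    """Collapse runs of 3+ identical characters into one (assumed threshold)."""
    return re.sub(r"(.)\1{2,}", r"\1", text)


print(remove_char_repetition("mantaaap bangettt"))  # mantap banget
print(remove_char_repetition("maaf"))  # maaf
```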

5) Tokenization:
Tokenization is the step of splitting a piece of text into smaller units called tokens. Tokens include words, symbols, punctuation marks, numbers, and other important entities [12]. The reviews after the tokenization phase are shown in Table VIII.
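The paper does not name its tokenizer. A simple regular-expression tokenizer that keeps words and punctuation marks as separate tokens can serve as an illustration; NLTK's tokenizers are an alternative, but some of them need extra data downloads.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation/symbol tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("aplikasi error, tidak bisa login!"))
# ['aplikasi', 'error', ',', 'tidak', 'bisa', 'login', '!']
```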

6) Stop Word Removal:
The purpose of this stage is to exclude words that appear frequently across all documents in the corpus [13]. In this work, we use the Indonesian stop word corpus provided by the Natural Language Toolkit (NLTK) library. The stop word removal result is shown in Table IX.
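Stop word removal with NLTK's Indonesian list can be sketched as follows. Loading the list requires a one-time `nltk.download("stopwords")`, so the runnable example uses a tiny hand-picked set instead; the filtering logic is the same either way.

```python
def load_indonesian_stopwords():
    """Load NLTK's Indonesian stop word list (run nltk.download("stopwords") once first)."""
    from nltk.corpus import stopwords

    return set(stopwords.words("indonesian"))


def remove_stopwords(tokens, stop_words):
    """Drop every token that appears in the stop word set."""
    return [token for token in tokens if token not in stop_words]


# A tiny hand-picked set keeps this example runnable without downloading NLTK data.
example_stop_words = {"yang", "dan", "ini", "di"}
print(remove_stopwords(["aplikasi", "ini", "yang", "error"], example_stop_words))
# ['aplikasi', 'error']
```

For sentiment tasks it may be worth checking whether negation words such as "tidak" are in the stop word list, since removing them can flip a review's meaning.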

C. Feature Extraction and Data Weighting
Next, feature extraction is applied to the data using the TF-IDF approach, one of the quickest and most effective text mining techniques [14]. The output of this step is the set of extracted features and their weights, ready to be used for training the machine learning algorithms.

D. Machine Learning Modelling
The machine learning algorithms used are Linear Support Vector Classification (Linear SVC) and Multinomial Naïve Bayes (Multinomial NB). In this work, we only use the default parameters from scikit-learn, without any tuning. The output of this step is two classification models, the Linear SVC model and the Multinomial NB model.
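A minimal sketch of the two models with scikit-learn defaults is shown below, each chained after a TF-IDF vectorizer. The six training reviews are hypothetical toy data; the paper's actual train/test split is not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data: label 0 = negative (score 1), 1 = positive (score 5).
train_texts = ["aplikasi bagus", "mantap membantu", "bagus mantap",
               "aplikasi error", "ribet susah", "error bug"]
train_labels = [1, 1, 1, 0, 0, 0]

# Both classifiers keep scikit-learn's default hyperparameters.
models = {
    "Multinomial NB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "Linear SVC": make_pipeline(TfidfVectorizer(), LinearSVC()),
}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    print(name, model.predict(["mantap membantu", "ribet susah"]).tolist())
```

Wrapping the vectorizer and classifier in one pipeline ensures that TF-IDF statistics are learned only from the training data.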

E. Evaluation
The evaluation step measures how accurate the model predictions are. The output of this step is the accuracy percentage of each classification model (Linear SVC and Multinomial NB). The accuracy equation is shown in Eq. 1, and the confusion matrix is shown in Table X.
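The sketch below assumes Eq. 1 is the standard definition, accuracy = (TP + TN) / (TP + TN + FP + FN), and computes it both from a scikit-learn confusion matrix and directly from counts. The short label lists are hypothetical. The last two lines apply the formula to the counts given in the results section. Accuracy depends only on the number of correct predictions versus incorrect ones, so it is unaffected by which correct cell the 928 Linear SVC reviews belong to. For Multinomial NB, only false negatives (50) and true negatives (932) are reported, so the other two cells are assumed to be zero.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical labels: 0 = negative, 1 = positive.
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]

# For labels [0, 1], scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(accuracy_from_counts(tp, tn, fp, fn), accuracy_score(y_true, y_pred))  # 0.6 0.6

# Counts reported in the results: 942 correct and 40 incorrect for Linear SVC.
print(round(accuracy_from_counts(tp=14, tn=928, fp=4, fn=36), 3))  # 0.959
# Multinomial NB, assuming TP = FP = 0 (not reported).
print(round(accuracy_from_counts(tp=0, tn=932, fp=0, fn=50), 3))  # 0.949
```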

IV. RESULTS AND DISCUSSION
This section reports the experimental results. For instance, a sample of the bag of words is shown in Table XI.

A. Naïve Bayes for Sentiment Analysis
The Naïve Bayes equation for sentiment analysis is typically used to predict the probability that a given text document belongs to a particular class, such as "positive" or "negative" sentiment. First, the prior probability of each class is calculated; samples are shown in Eq. 2 and Eq. 3.
Next, the likelihood of each feature (e.g., aplikasi, bagus, bensin) is calculated, repeating this for all words. This is done by counting the occurrences of each feature in each class and dividing by the total number of word occurrences in that class. Laplace smoothing is used to prevent zero probabilities when training a Naïve Bayes model: a small constant, often 1, is added to each count. Samples are shown in Eq. 4 and Eq. 5.
Finally, these prior probabilities and likelihoods are used to classify new observations by calculating the posterior probability of each class. In this example, we test the text "aplikasi error". Based on Eq. 6 and Eq. 7, the class with the highest probability is the predicted class for the new observation.
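The three steps above can be traced by hand with a small from-scratch multinomial Naïve Bayes. This is an illustration on a hypothetical five-review corpus, not the paper's Eq. 2 to Eq. 7. It assumes the usual smoothed likelihood P(w | c) = (count(w, c) + 1) / (total words in c + |V|) and uses exact fractions so the arithmetic can be checked.

```python
from collections import Counter
from fractions import Fraction


def train(docs, labels):
    """Count class priors and per-class word frequencies (bag of words)."""
    vocab = {word for doc in docs for word in doc.split()}
    priors = {c: Fraction(labels.count(c), len(labels)) for c in set(labels)}
    word_counts = {c: Counter() for c in priors}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.split())
    return priors, word_counts, vocab


def posterior(text, priors, word_counts, vocab, alpha=1):
    """Posterior P(class | text) with Laplace (add-alpha) smoothing."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())  # word occurrences in class c
        score = prior
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score *= Fraction(word_counts[c][word] + alpha, total + alpha * len(vocab))
        scores[c] = score
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy corpus.
docs = ["aplikasi bagus", "mantap membantu", "aplikasi error", "bensin susah", "error ribet"]
labels = ["positive", "positive", "negative", "negative", "negative"]

post = posterior("aplikasi error", *train(docs, labels))
prediction = max(post, key=post.get)
print(prediction, round(float(post[prediction]), 3))  # negative 0.768
```

With |V| = 8, the unnormalized scores are 2/5 × 2/12 × 1/12 = 1/180 for positive and 3/5 × 2/14 × 3/14 = 9/490 for negative. Normalizing gives a negative posterior of 162/211 ≈ 0.768.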

B. Support Vector Classification for Sentiment Analysis
In sentiment analysis, an SVM can be used to separate text documents into sentiment classes, such as "positive" and "negative". In this example, we use the sentence "aplikasi error" to test the sample data. The first step is to tokenize the sentence, that is, to break it into individual words; here, "aplikasi error" is tokenized into "aplikasi" and "error". Then we create a vector with the same number of columns as the training data, where each column represents a word of the training vocabulary. Each column is set to 1 if the corresponding word occurs in the sentence and to 0 otherwise.

C. Word Cloud Analysis
A word cloud is a data visualization method based on word frequency [15][16]. The word clouds are shown in Fig. 3 and Fig. 4. These figures show that positive reviews contain a lot of praise for the MyPertamina app, whereas the negative word cloud reveals some aspects that make end users give lower scores in their reviews.
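Both the word clouds and the popular-word lists come from per-class word frequencies. The sketch below counts them with `collections.Counter` on hypothetical pre-processed reviews. The optional rendering function assumes the third-party `wordcloud` package, whose `WordCloud.generate_from_frequencies` accepts a word-to-count mapping.

```python
from collections import Counter

# Hypothetical pre-processed reviews: (tokens, label) with 1 = positive, 0 = negative.
reviews = [
    (["aplikasi", "bagus", "mantap"], 1),
    (["mantap", "membantu"], 1),
    (["aplikasi", "ribet", "susah"], 0),
    (["ribet", "bug"], 0),
]


def word_frequencies(reviews, label):
    """Count how often each word occurs in reviews with the given label."""
    freq = Counter()
    for tokens, y in reviews:
        if y == label:
            freq.update(tokens)
    return freq


positive_freq = word_frequencies(reviews, 1)
negative_freq = word_frequencies(reviews, 0)
print(positive_freq.most_common(1))  # [('mantap', 2)]
print(negative_freq.most_common(1))  # [('ribet', 2)]


def save_word_cloud(freq, path):
    """Render a word cloud image; requires the third-party `wordcloud` package."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freq).to_file(path)
```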

D. Popular Word
The popular words are shown in Fig. 5 and Fig. 6. Examples of popular words in positive sentiment are "bagus" (good), "mantap" (great), and "membantu" (helpful). On the other hand, the most popular words in negative sentiment are "ribet" (complicated), "susah" (difficult), and "bug". Based on this result, positive reviews express satisfaction with the MyPertamina application, while negative reviews point to aspects that the related stakeholders can improve. More specific popular words could help speed up development of the next version of the application and make it more robust.
Multinomial NB yields 50 false negatives and 932 true negatives. For Linear SVC, 36 reviews are false negatives, 14 are true positives, 928 are true negatives, and 4 are false positives. Overall, the accuracy of Multinomial NB is 95% and that of Linear SVC is 96%, so the Linear SVC model performs slightly better than Multinomial NB. The calculation is shown in Eq. 10 and E.