
Sentiment Comparison between #IELTS and #TOEFL


By Rachel Yixin Yan 3035586235

IELTS and TOEFL are two of the best-known tests for English learners, especially those who want to study or settle in English-speaking countries. Because a test score is now a compulsory requirement when applying for degree programs or immigration, and because the number of students at schools in English-speaking countries keeps growing, more and more people are taking these two tests. This assignment uses Twitter data to see which of the two tests people prefer and, more specifically, what sentiment people express towards each.

Using the Twitter API, I first collected 500 tweets for each of the hashtags “#IELTS” and “#TOEFL”. I then matched the words in those tweets against the positive and negative keywords defined in a sentiment lexicon (the Bing lexicon), in order to show the proportion of positive and negative words for each test.
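The matching step can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions: the three sample tweets and the six-word lexicon are made up for the example, and `toy_lexicon` and `sample_tweets` are hypothetical names. The real analysis uses tweets from the Twitter API and the Bing lexicon.

```r
# Toy stand-ins (assumptions for illustration only, not real data)
toy_lexicon <- data.frame(
  word      = c("good", "great", "love", "bad", "hard", "fail"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)
sample_tweets <- c("great score on the test",
                   "the writing part was hard",
                   "love this good study guide")

# Split each tweet into words on single spaces
words <- strsplit(sample_tweets, " ")

# For each tweet, collect the sentiment label of every lexicon word it contains
labels <- unlist(lapply(words, function(z) {
  toy_lexicon$sentiment[toy_lexicon$word %in% z]
}))

# Count the labels and convert the counts to proportions
counts <- table(labels)
props  <- prop.table(counts)
print(props)
```

Because `%in%` is evaluated over the lexicon words, a keyword that appears several times in one tweet is counted once for that tweet.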

Results show that, on the one hand, Twitter users hold positive attitudes towards both IELTS and TOEFL: for both hashtags, the proportion of positive words is much higher than the proportion of negative words. This suggests that people broadly recognize and accept the authority of these two language tests.

On the other hand, a closer look shows that the proportion of positive words for #IELTS is higher than for #TOEFL (0.84 versus 0.79), which suggests that Twitter users may prefer IELTS to TOEFL. Possible reasons are that TOEFL focuses more on academic knowledge, whereas IELTS also covers practical, everyday topics. IELTS is also more flexible in its test settings and is recognized by more countries than TOEFL.

To find out which specific emotions Twitter users express towards these two tests, I ran a second analysis, collecting 1,000 tweets for each hashtag and scoring them on 10 categories: positive, trust, anticipation, joy, negative, sadness, disgust, anger, surprise and fear.
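The idea of counting emotion categories can be sketched with a toy lexicon in base R. This is an illustration under stated assumptions, not the analysis itself: the words, categories and sample tweets are invented, and `toy_emotions` is a hypothetical name. The actual analysis scores tweets with the NRC lexicon through the syuzhet package.

```r
# Toy emotion lexicon (assumption for illustration): a word may belong to
# several categories, as in NRC-style lexicons
toy_emotions <- data.frame(
  word    = c("pass", "pass", "hope", "hope", "fail", "fail", "worry"),
  emotion = c("positive", "joy", "positive", "anticipation",
              "negative", "sadness", "fear"),
  stringsAsFactors = FALSE
)
sample_tweets <- c("hope to pass", "worry i will fail", "pass at last")

# Lower-case the tweets, split on spaces and flatten to one vector of words
tokens <- unlist(strsplit(tolower(sample_tweets), " "))

# Inner join: one row per (word occurrence, category) pair
hits <- merge(data.frame(word = tokens, stringsAsFactors = FALSE),
              toy_emotions, by = "word")

# Count mentions per category and sort from most to least frequent
emo_counts <- sort(table(hits$emotion), decreasing = TRUE)
print(emo_counts)
```

Sorting the counts in decreasing order gives the kind of ranking that the comparison between the two hashtags relies on.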

Overall, the emotion rankings are similar: for both hashtags, the three most frequently mentioned categories are “positive”, “trust” and “anticipation”, while “joy” is mentioned less often than “negative” but more often than “sadness”. People therefore appear to hold a positive attitude towards the two tests, with a good deal of trust and anticipation, although the test experience itself may not be especially joyful.

A closer look reveals some differences. First, the total number of positive-emotion mentions is larger for #IELTS than for #TOEFL. For example, the most prominent category, “positive”, has 1,517 mentions for #IELTS, nearly 400 more than in the #TOEFL analysis. Second, in terms of negative emotions, tweets about TOEFL seem to show more disgust and less anger than tweets about IELTS. This interesting point may be attributable to the fact that the TOEFL test is more academic, and thus more difficult for test takers, while the recent low scoring in IELTS writing and speaking has annoyed many people.

In conclusion, IELTS and TOEFL are English language tests that have both won people's support and trust, while IELTS seems to be slightly preferred for its practical content, its flexible test settings and its wider recognition around the world.

R source code:

#% of Sentiment Towards #IELTS and #TOEFL on Twitter

if (!require("rtweet")) install.packages("rtweet", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("tidyverse")) install.packages("tidyverse", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("tidytext")) install.packages("tidytext", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("stringr")) install.packages("stringr", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("SnowballC")) install.packages("SnowballC", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("plotly")) install.packages("plotly", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)

require(tidyverse) 
require(tidytext) 
require(rtweet) 
require(stringr) 
require(plotly) 
require(SnowballC)

# Load the Bing lexicon: one row per word, labelled "positive" or "negative"
sentiment_term <- get_sentiments("bing")
head(sentiment_term)

# Collect up to 500 English tweets per hashtag, excluding retweets
search_term <- "#IELTS"

IELTS <- search_tweets(search_term, n=500, include_rts = FALSE, lang="en")

search_term <- "#TOEFL"

TOEFL <- search_tweets(search_term, n=500, include_rts = FALSE, lang="en")

# Split each tweet into words and look up the sentiment of every lexicon word it contains
IELTS_words <- strsplit(IELTS$text,' ')
IELTS_words_sent <- sapply(IELTS_words,function(z){
 sentiment_term$sentiment[sentiment_term$word %in% z]
})
table(unlist(IELTS_words_sent))

TOEFL_words <- strsplit(TOEFL$text,' ')
table(unlist(sapply(TOEFL_words,function(z){
 sentiment_term$sentiment[sentiment_term$word %in% z]
})))

# Stem both the lexicon and the tweet words so that inflected forms also match
sentiment_term$word <- wordStem(sentiment_term$word,"english")
sentiment_term <- sentiment_term[!duplicated(sentiment_term$word),] # remove duplicated terms

IELTS_tb <- table(unlist(sapply(IELTS_words,function(z){
 sentiment_term$sentiment[sentiment_term$word %in% wordStem(z,"english")]
})))

TOEFL_tb <- table(unlist(sapply(TOEFL_words,function(z){
 sentiment_term$sentiment[sentiment_term$word %in% wordStem(z,"english")]
})))

# Proportion of positive and negative matches for each hashtag
prop_IELTS <- prop.table(IELTS_tb)
prop_TOEFL <- prop.table(TOEFL_tb)
prop_IELTS
prop_TOEFL

# Bar chart comparing the two hashtags; the x axis shows the sentiment labels
p <- plot_ly(x = names(prop_IELTS), y = as.numeric(prop_IELTS), name = "#IELTS", type = 'bar')
p <- add_trace(p, x = names(prop_TOEFL), y = as.numeric(prop_TOEFL), name = "#TOEFL", type = 'bar')
# Assign the result so that the titles are kept when the chart is uploaded
p <- layout(p, title = "% of Sentiment Towards #IELTS and #TOEFL on Twitter", xaxis = list(title = "Sentiment"), yaxis = list(title = "Proportion"))

# Plotly credentials: use your own API key and keep it out of published code
Sys.setenv("plotly_username"="RachelYan")
Sys.setenv("plotly_api_key"="YOUR_PLOTLY_API_KEY")
api_create(p, filename = "% of Sentiment Towards #IELTS and #TOEFL on Twitter")

#Sentiment Type for Hashtag #IELTS

library(twitteR)
library(RCurl)
library(httr)
library(tm)
library(wordcloud)
library(syuzhet)

if (!require("rtweet")) install.packages("rtweet", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)
if (!require("httpuv")) install.packages("httpuv", repos="https://cran.cnr.berkeley.edu/", dependencies = TRUE)

require("rtweet")
require("httpuv")
require("plotly")

# Collect up to 1,000 English tweets with the hashtag
tweets_IELTS <- search_tweets("#IELTS", n = 1000, lang = "en")
tweets.df_IELTS <- as.data.frame(tweets_IELTS)

# Clean the tweet text before scoring
tweets.df_IELTS$text <- gsub("&amp", "", tweets.df_IELTS$text)                         # HTML-escaped ampersands
tweets.df_IELTS$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets.df_IELTS$text)  # "RT @user" / "via @user"
tweets.df_IELTS$text <- gsub("@\\w+", "", tweets.df_IELTS$text)                        # remaining @mentions
tweets.df_IELTS$text <- gsub("[[:punct:]]", "", tweets.df_IELTS$text)                  # punctuation
tweets.df_IELTS$text <- gsub("[[:digit:]]", "", tweets.df_IELTS$text)                  # digits
tweets.df_IELTS$text <- gsub("http\\w+", "", tweets.df_IELTS$text)                     # links (already stripped of punctuation)
tweets.df_IELTS$text <- gsub("[ \t]{2,}", "", tweets.df_IELTS$text)                    # runs of spaces or tabs
tweets.df_IELTS$text <- gsub("^\\s+|\\s+$", "", tweets.df_IELTS$text)                  # leading and trailing whitespace

# Drop characters that cannot be represented in ASCII (emoji, accents)
tweets.df_IELTS$text <- iconv(tweets.df_IELTS$text, "UTF-8", "ASCII", sub="")

# Score every tweet against the NRC lexicon (eight emotions plus positive and negative)
emotions_IELTS <- get_nrc_sentiment(tweets.df_IELTS$text)
emo_bar_IELTS <- colSums(emotions_IELTS)
emo_sum_IELTS <- data.frame(count=emo_bar_IELTS, emotion=names(emo_bar_IELTS))
# Order the categories by count so that the bars are sorted from highest to lowest
emo_sum_IELTS$emotion <- factor(emo_sum_IELTS$emotion, levels=emo_sum_IELTS$emotion[order(emo_sum_IELTS$count, decreasing = TRUE)])

p <- plot_ly(emo_sum_IELTS, x=~emotion, y=~count, type="bar", color=~emotion) %>%
 layout(xaxis=list(title=""), showlegend=FALSE, title="Sentiment Type for Hashtag #IELTS")

# Plotly credentials: use your own API key and keep it out of published code
Sys.setenv("plotly_username"="RachelYan")
Sys.setenv("plotly_api_key"="YOUR_PLOTLY_API_KEY")
api_create(p, filename = "Sentiment Type for Hashtag #IELTS")

#Sentiment Type for Hashtag #TOEFL

# Collect up to 1,000 English tweets with the hashtag
tweets_TOEFL <- search_tweets("#TOEFL", n = 1000, lang = "en")
tweets.df_TOEFL <- as.data.frame(tweets_TOEFL)

# Clean the tweet text before scoring
tweets.df_TOEFL$text <- gsub("&amp", "", tweets.df_TOEFL$text)                         # HTML-escaped ampersands
tweets.df_TOEFL$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets.df_TOEFL$text)  # "RT @user" / "via @user"
tweets.df_TOEFL$text <- gsub("@\\w+", "", tweets.df_TOEFL$text)                        # remaining @mentions
tweets.df_TOEFL$text <- gsub("[[:punct:]]", "", tweets.df_TOEFL$text)                  # punctuation
tweets.df_TOEFL$text <- gsub("[[:digit:]]", "", tweets.df_TOEFL$text)                  # digits
tweets.df_TOEFL$text <- gsub("http\\w+", "", tweets.df_TOEFL$text)                     # links (already stripped of punctuation)
tweets.df_TOEFL$text <- gsub("[ \t]{2,}", "", tweets.df_TOEFL$text)                    # runs of spaces or tabs
tweets.df_TOEFL$text <- gsub("^\\s+|\\s+$", "", tweets.df_TOEFL$text)                  # leading and trailing whitespace

# Drop characters that cannot be represented in ASCII (emoji, accents)
tweets.df_TOEFL$text <- iconv(tweets.df_TOEFL$text, "UTF-8", "ASCII", sub="")

# Score every tweet against the NRC lexicon (eight emotions plus positive and negative)
emotions_TOEFL <- get_nrc_sentiment(tweets.df_TOEFL$text)
emo_bar_TOEFL <- colSums(emotions_TOEFL)
emo_sum_TOEFL <- data.frame(count=emo_bar_TOEFL, emotion=names(emo_bar_TOEFL))
# Order the categories by count so that the bars are sorted from highest to lowest
emo_sum_TOEFL$emotion <- factor(emo_sum_TOEFL$emotion, levels=emo_sum_TOEFL$emotion[order(emo_sum_TOEFL$count, decreasing = TRUE)])

p <- plot_ly(emo_sum_TOEFL, x=~emotion, y=~count, type="bar", color=~emotion) %>%
 layout(xaxis=list(title=""), showlegend=FALSE, title="Sentiment Type for Hashtag #TOEFL")

# Plotly credentials: use your own API key and keep it out of published code
Sys.setenv("plotly_username"="RachelYan")
Sys.setenv("plotly_api_key"="YOUR_PLOTLY_API_KEY")
api_create(p, filename = "Sentiment Type for Hashtag #TOEFL")


