A study of #MeToo movement responses on Twitter

Oct 14, 2018

This report examines recent tweets about the #MeToo movement in New York City. Tweets are categorized into three groups (positive, negative, and neutral) to monitor and analyze social phenomena and to visualize general public opinion. The key findings about this social movement are discussed below.

Background:

The phrase ‘Me Too’, now the name of a well-known worldwide movement against sexual harassment, dates back to 2006. It was first used by Tarana Burke, a social activist and community organizer in the Bronx, New York City. In 2017, actress Alyssa Milano from Brooklyn, New York City, popularized the hashtag #MeToo to encourage victims of sexual harassment to stand up and tweet their stories (Khomami, 2017).

This hashtag has spread widely, not only in the country but around the world, prompting changes to legislation on sexual harassment and raising awareness of victims’ suffering. However, such radical social movements always have opponents. Some men fear becoming the targets of sexual harassment allegations; at the same time, women worry about being treated unequally in male-dominated fields. For example, in Florida, some female staff in the Capitol expressed concern about collateral damage from the movement: male legislators refused to meet with them in private, even for work (Klas, 2017), resulting in further exclusion in the workplace. It is therefore worthwhile to study the different aspects of public opinion on this movement.

Tweet preprocessing and cleaning:

Because of the rate limits and access restrictions of the Twitter API (180 API calls per 15-minute window, and access only to tweets from the most recent 7 days), I narrow my search to New York City as the research city, to the day I run the code as the research day (to show the most recent responses), and use ‘tweepy’ as the streaming library.
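The limit of 180 calls per 15-minute window works out to one call every five seconds on average. A minimal sketch of a client-side throttle that stays within that budget is shown here; the `Throttle` class and its parameters are hypothetical illustrations and are not part of tweepy or the Twitter API.

```python
import time


class Throttle:
    """Space calls evenly so that at most `max_calls` happen per `window` seconds."""

    def __init__(self, max_calls=180, window=15 * 60,
                 clock=time.monotonic, sleep=time.sleep):
        # 900 seconds / 180 calls = 5 seconds between calls
        self.min_interval = window / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `wait()` before each request keeps the average rate at or below the limit. tweepy's `API` class also accepts `wait_on_rate_limit=True`, which may be the simpler option when the requests go through tweepy anyway.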

After several hours of running the stream code, 16272 tweets were collected for later analysis. Although I used the track parameter to filter the full-text mode Twitter stream with ‘MeToo movement’, some of the tweets do not have ‘MeToo’ in the full text, possibly because they share a website related to the movement or because the name of the movement is too common in daily communication. Since I cannot solve this problem for now, I re-filter the text of each tweet by the keyword ‘MeToo’, which leaves 1943 tweets that meet the requirement. I then clean the tweets by removing ‘@user’ mentions, punctuation, special characters, short words (e.g., I’m, to, the, …) and the keywords ‘metoo’ and ‘movement’.
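The re-filtering step can be sketched as follows. This is a minimal illustration, not the code used for the report: `full_text_of` and `mentions_keyword` are hypothetical helpers, the sample tweets are invented, and the sketch assumes the v1.1 streaming payload layout, in which long tweets carry their untruncated text under `extended_tweet.full_text`.

```python
def full_text_of(tweet):
    """Return the untruncated text when present, else the plain `text` field."""
    extended = tweet.get("extended_tweet") or {}
    return extended.get("full_text") or tweet.get("text", "")


def mentions_keyword(tweet, keyword="metoo"):
    """Case-insensitive check that the keyword appears in the tweet text."""
    return keyword.lower() in full_text_of(tweet).lower()


# Invented sample payloads, shaped like streaming API tweets
sample = [
    {"text": "Proud of everyone speaking up #MeToo"},
    {"text": "Long thread...",
     "extended_tweet": {"full_text": "Long thread about the metoo movement in NYC"}},
    {"text": "Lunch in Brooklyn was great"},
]

kept = [t for t in sample if mentions_keyword(t)]
print(len(kept), "of", len(sample), "tweets mention the keyword")
```

Checking the extended text as well as the truncated `text` field avoids dropping long tweets whose keyword falls after the truncation point.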

Story generation:

In this section, I explore the text of the cleaned tweets to picture public opinion towards #MeToo. Plotting word clouds helps us understand the distribution of words in the tweet dataset. Natural language processing (NLP) is used to categorize the whole dataset into three groups: positive, negative, and neutral. The following questions are answered in this section:

a. What are the most common words in the entire dataset?

In the word cloud, we can see that the most frequent common nouns are ‘women’, ‘victim’, and ‘shame’, which relate to sexual harassment and to the largest group of victims.

The most frequent proper nouns are ‘India’ and ‘Bollywood’, referring to the latest Bollywood ‘MeToo’ event, in which Indian superstar Akshay Kumar chose to cancel the shoot of ‘Housefull 4’ following sexual misconduct allegations against the film’s director (Bhushan, 2018). This was regarded as a late but influential spread of the ‘MeToo’ movement.

The most frequent verb is ‘support’, showing the public’s support for both victims and women.

Interestingly, ‘Melania Trump’ is frequently mentioned in the tweets; this is discussed in the third part.
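A word cloud is a visual form of a plain frequency count: the larger a word is drawn, the more often it occurs. A minimal sketch of that count with `collections.Counter` is shown here; the three sample strings are invented for illustration and are not tweets from the dataset.

```python
from collections import Counter

# Invented stand-ins for cleaned tweets
cleaned_tweets = [
    "support women victim",
    "women support bollywood india",
    "shame india women",
]

# Count every whitespace-separated word across all tweets
word_counts = Counter(word for tweet in cleaned_tweets for word in tweet.split())

# The three most frequent words with their counts
top_three = word_counts.most_common(3)
print(top_three)
```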

b. What’s the polarity distribution of the tweets?

The TextBlob library applies natural language processing (NLP) to score the polarity of each tweet. From the pie chart, we can see that about half of the tweets related to ‘MeToo’ are neutral. Positive and negative tweets share the other 50% of the tweet population, and positive tweets are roughly twice as numerous as negative ones.
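The split can be checked with simple arithmetic. A minimal sketch, assuming the class counts hard-coded in the pie-chart code at the end of this post (666 positive, 294 negative, 983 neutral); `label_polarity` and `shares` are hypothetical helper names, and the labelling rule mirrors the sign test used in that code.

```python
def label_polarity(score):
    """Map a polarity score in [-1, 1] to 1 (positive), -1 (negative) or 0 (neutral)."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


def shares(counts):
    """Convert raw class counts into percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


counts = {"positive": 666, "negative": 294, "neutral": 983}
print(shares(counts))
print(round(counts["positive"] / counts["negative"], 2))
```

Under these counts, neutral tweets make up slightly more than half of the 1943 tweets, and the positive-to-negative ratio is a little above two.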

Sorting the tweets by polarity shows that the most positive tweets (with a polarity score of 1) share articles about the movement and ask people to pay attention to the phenomenon. Take the tweet below as an example:

The most negative tweets (with a polarity score of -1) also share articles about the movement, showing sympathy or raising allegations. Take the tweet below as an example:

c. What are the most common words in the dataset for negative and positive tweets?

There are some common words shared in both negative and positive tweets, such as ‘India’, ‘women’, and ‘people’.

In the upper image, positive words such as ‘support’, ‘love’, ‘great’, and ‘thanks’ are frequently used. ‘MGTOW’ also appears in these tweets; it refers to ‘Men Going Their Own Way’, a movement promoted on social media that warns men against serious romantic relationships with women, especially marriage (MGTOW, n.d.). However, after analyzing the tweets related to ‘MGTOW’, I found only some tweets considering a potential ‘MGTOW’ movement following the ‘MeToo movement’. The user @AbhishekIyenga3, who tweeted the most under this tag, has been suspended for violating the Twitter Rules[1]. From my perspective, this kind of misclassification is probably due to a drawback of natural language processing (NLP) based on keyword analysis, which cannot detect sarcasm.
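The sarcasm problem can be illustrated with a toy word-weight scorer. This is only a sketch of the general idea behind keyword-based polarity: the weights in `TOY_LEXICON` are hypothetical values chosen for illustration, `toy_polarity` is not TextBlob's algorithm, and both example sentences are invented.

```python
# Hypothetical word weights chosen for illustration; not TextBlob's lexicon
TOY_LEXICON = {"great": 0.8, "love": 0.5, "thanks": 0.2, "wrong": -0.5, "hard": -0.3}


def toy_polarity(text):
    """Average the weights of known words; unknown words are ignored."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


sincere = "Great work by everyone who spoke up"
sarcastic = "Oh great, one more hashtag that changes nothing"

# Both sentences contain the same scored keyword, so they get the same score
print(toy_polarity(sincere), toy_polarity(sarcastic))
```

Because the scorer only sees the word ‘great’, the sarcastic sentence is labelled as positive as the sincere one, which is the kind of error described above.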

In the lower image, negative words such as ‘sorry’, ‘hard’, and ‘wrong’ appear. Race is mentioned in the negative category: ‘black’ appears there but not in the positive tweets. Moreover, echoing what was mentioned earlier, ‘Melania Trump’ is mentioned frequently in negative tweets. According to a CNN report on October 11th, Melania Trump said in a taped interview that victims should provide hard evidence when making accusations of sexual harassment, which triggered public outrage (Bennett, 2018).
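The comparison between the two word clouds amounts to set operations on their frequent words. A minimal sketch is shown here; the two sets contain only the words named in this section and are not the complete word lists behind the images.

```python
# Frequent words named in this section for each polarity class
positive_top = {"india", "women", "people", "support", "love", "great", "thanks"}
negative_top = {"india", "women", "people", "sorry", "hard", "wrong", "black"}

# Words that appear in both classes
shared = positive_top & negative_top

# Words that appear only in the negative class
only_negative = negative_top - positive_top

print(sorted(shared))
print(sorted(only_negative))
```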

Note:

Thanks to my friend Xi Chen (not in the UP program) for teaching me how to condense the code and arrange the functions logically.

References:

Khomami, N. (2017, October 20). #MeToo: How a hashtag became a rallying cry against sexual harassment. Retrieved from https://www.theguardian.com/world/2017/oct/20/women-worldwide-use-hashtag-metoo-against-sexual-harassment

Klas, M. E. (2017, December 11). Women in politics fear #MeToo moment will backfire — and they’ll be the ones punished. Retrieved from https://www.miamiherald.com/news/politics-government/state-politics/article189152134.html

Bhushan, N. (2018, October 12). #MeToo in Bollywood: Aamir Khan, Akshay Kumar Drop Projects Over Allegations Against Directors. Retrieved from https://www.hollywoodreporter.com/news/metoo-bollywood-spreads-aamir-khan-akshay-kumar-drop-film-projects-1151811

MGTOW. (n.d.). Retrieved from https://www.mgtow.com/

Bennett, K. (2018, October 11). Melania Trump says women ‘need evidence’ if they say they’re victims. Retrieved from https://www.cnn.com/2018/10/10/politics/melania-trump-metoo-evidence/index.html

[1] He tweets: ‘@TimesNow Let’s start the #MGTOW movement in india to kick the ass of #MeToo movement which only targets rich and f… https://t.co/ncMO3DsJwK’

# written for tweepy 3.x (StreamListener was removed in tweepy 4)
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import time

# placeholder credentials: replace with your own keys
app_key = 'app_key'
app_secret = 'app_secret'
access_token = 'access_token'
access_secret = 'access_secret'


class StdOutListener(StreamListener):
    def on_data(self, data):
        try:
            tweet = json.loads(data)
            text = tweet["text"]
            # skip retweets and append the raw JSON line to the output file
            if 'RT @' not in text:
                with open('tweets_write6.txt', 'a') as f:
                    f.write(data)
        except (ValueError, KeyError):
            # malformed JSON, or a non-tweet message without a "text" field
            print("Something went wrong with streaming")
            time.sleep(15)
        return True

    def on_error(self, status):
        print(status)


if __name__ == '__main__':
    # Twitter authentication and the connection to the Twitter Streaming API
    auth = OAuthHandler(app_key, app_secret)
    auth.set_access_token(access_token, access_secret)
    stream = Stream(auth, StdOutListener(), tweet_mode='extended')

    # filter Twitter Streams to capture data by the keywords 'MeToo movement' in NYC
    # note: the Streaming API combines track and locations with OR, so tweets from
    # inside the bounding box are delivered even when they do not match the track term
    stream.filter(track=['MeToo movement'], locations=[-74.275818, 40.505446, -73.825378, 40.816927], languages=["en"], stall_warnings=True)

import json
import pandas as pd
import matplotlib.pyplot as plt

tweets_data_path = '/Users/Young/Desktop/urban informatics/assignment/assignment 2/tweets_write6.txt'

# finding the tweets with 'metoo' in the full text
tweets_data = []
with open(tweets_data_path, "r") as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            text = tweet["text"]
            if 'MeToo' in text or 'Metoo' in text or 'metoo' in text:
                tweets_data.append(tweet)
        except (ValueError, KeyError):
            # skip blank or malformed lines
            continue

# clean the tweets
import re


def to_keep(w):
    if len(w) <= 3:  # remove short words
        return False
    if w.lower() == 'metoo' or w.lower() == 'movement':  # remove metoo movement
        return False
    return True


def remove_pattern(input_txt, pattern=r"@[\w]*"):  # remove @XXX
    result_txt = re.sub(pattern, '', input_txt)
    # remove urls before stripping punctuation, otherwise url fragments survive as words
    result_txt = re.sub(r"https?://\S+", " ", result_txt)
    result_txt = result_txt.replace("&amp;", " ")  # remove &
    # remove punctuation; str.replace does not interpret regular expressions, so use re.sub
    result_txt = re.sub(r"[^a-zA-Z#]", " ", result_txt)
    # remove the keywords in any capitalization, with or without the hashtag
    result_txt = re.sub(r"#?metoo|movement", " ", result_txt, flags=re.IGNORECASE)
    result_txt = ' '.join([w for w in result_txt.split() if to_keep(w)])
    return result_txt
# create a dataframe with cleaned tweets and their polarity
from textblob import TextBlob

data = {'text': [], 'created_at': [], 'source': [], 'favorite_count': [], 'retweet_count': [], 'entities': [], 'len': [], 'tidy_tweet': [], 'polarity': [], 'subjectivity': [], 'polar_eva': []}

# counters for positive, negative and neutral tweets
ptweet = 0
ntweet = 0
neutral = 0

for t in tweets_data:
    text = t['text']
    data['text'].append(text)
    data['created_at'].append(t['created_at'])
    data['source'].append(t['source'])
    data['favorite_count'].append(t['favorite_count'])
    data['retweet_count'].append(t['retweet_count'])
    data['entities'].append(t['entities'])
    data['len'].append(len(text))
    data['tidy_tweet'].append(remove_pattern(text))

    # polarity and subjectivity are computed on the original (uncleaned) text
    sentiment = TextBlob(text).sentiment
    data['polarity'].append(sentiment.polarity)
    data['subjectivity'].append(sentiment.subjectivity)

    # label each tweet by the sign of its polarity score
    if sentiment.polarity > 0:
        data['polar_eva'].append(1)
        ptweet += 1
    elif sentiment.polarity < 0:
        data['polar_eva'].append(-1)
        ntweet += 1
    else:
        data['polar_eva'].append(0)
        neutral += 1

df = pd.DataFrame(data)
display(df.head(100))  # display() is available in Jupyter/IPython
df.to_csv('out3.csv', encoding='utf-8', index=True)
print(ptweet, ntweet, neutral)

# pie plot of the distribution of polarity
from wordcloud import WordCloud

labels = ['positive tweets', 'negative tweets', 'neutral tweets']
sizes = [ptweet, ntweet, neutral]  # 666, 294, 983 in the run reported here
colors = ['yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0, 0, 0.1)  # explode the 3rd slice (neutral tweets)

plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)

plt.axis('equal')
plt.show()


def show_wordcloud(words):
    # draw one word cloud from a single space-separated string of words
    wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110).generate(words)
    plt.figure(figsize=(10, 7))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis('off')
    plt.show()


# word clouds of all tweets
all_words = ' '.join(df['tidy_tweet'])
with open('all_words.txt', 'w') as fout:
    fout.write(all_words)
show_wordcloud(all_words)

# word clouds of neutral (0), positive (1) and negative (-1) tweets
for polar in (0, 1, -1):
    show_wordcloud(' '.join(df['tidy_tweet'][df['polar_eva'] == polar]))