An Analysis of a Corpus of Trump’s Tweets



In this post, I will quickly look at a corpus of almost 2,000 of Trump’s tweets which were collected during the presidential campaign period.

First, word frequency. Below are the ten most frequently used words and the number of times each appears in the corpus:

#1 I 587

#2 you 504

#3 Trump 455

#4 thank 325

#5 great 284

#6 Hillary 274

#7 me 192

#8 crooked 164

#9 Clinton 149

#10 people 139
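A frequency list like the one above can be approximated in a few lines of Python. This is only a minimal sketch: the tool and stop list actually used for this post are not stated, so the tokenizer, the lowercasing, and the `sample_tweets` list below are illustrative assumptions rather than the method behind the figures.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(tweets, n=10):
    """Return the n most frequent tokens across all tweets as (word, count) pairs."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts.most_common(n)

# Hypothetical stand-in data; the real corpus holds almost 2,000 tweets.
sample_tweets = [
    "Thank you Ohio! I will never let you down.",
    "Crooked Hillary is wrong. I am right.",
    "Thank you, great people!",
]

for rank, (word, count) in enumerate(top_words(sample_tweets, 3), start=1):
    print(f"#{rank} {word} {count}")
```

Note that a raw count like this would normally be dominated by function words such as the and to, so a published list of this kind usually reflects some filtering that the sketch does not attempt.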

I find it interesting that there are so many references to Hillary Clinton, and that the adjective crooked appears so often. The counts point to a strong preoccupation with his presidential opponent, whom he frequently labels as crooked. First, though, I want to have a look at how the word I is used.

Below is a collocation network image of I and me, showing collocates with MI (mutual information) scores of 3.0 or above:



The strongest collocates of I are as follows:


The most frequent collocate of I is am, so it is worth looking at the concordance lines for I am.


It’s interesting that many of the tweets do not discuss policies or an agenda for improving the country, but focus instead on the exposure he is receiving in the mainstream media. He also declares that he is self-funding his election campaign:

As many of the tweets mention Hillary Clinton, it is worth looking at what is being said about her.

Obviously, there is a lot of negativity. This can be seen by looking at the collocates of Hillary:


Hillary collocates with crooked, radical, fraud and wrong.
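For readers who want to see how a collocate list with MI scores might be computed, here is a rough sketch. It uses one common formulation, MI = log2(observed / expected), where the expected co-occurrence count is scaled by the window span. Corpus tools differ in the exact formula, window size, and minimum-frequency cut-offs, so the function `mi_scores`, the default window of 4, and the toy data are all assumptions and not the settings used for this post.

```python
import math
from collections import Counter

def mi_scores(tweets, node, window=4):
    """Score each word found within `window` tokens of `node` by mutual information.

    `tweets` is a list of token lists, so windows never cross tweet boundaries.
    """
    freq = Counter()   # overall frequency of every token
    cooc = Counter()   # how often each token appears near the node word
    total = 0          # corpus size in tokens
    for tokens in tweets:
        freq.update(tokens)
        total += len(tokens)
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    cooc[tokens[j]] += 1
    span = 2 * window  # positions considered around each node occurrence
    return {
        word: math.log2(observed * total / (freq[node] * freq[word] * span))
        for word, observed in cooc.items()
    }

# Hypothetical, already-tokenized toy data.
toy_tweets = [
    ["crooked", "hillary", "is", "wrong"],
    ["crooked", "hillary", "is", "a", "fraud"],
    ["people", "are", "great"],
]

scores = mi_scores(toy_tweets, "hillary", window=1)
# Keep only collocates at or above a chosen threshold (the post uses 3.0).
strong = {word: s for word, s in scores.items() if s >= 3.0}
```

On a corpus this small the scores mean little. MI also tends to favour rare words, which is why it is usually combined with a minimum co-occurrence count.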


Again, by looking at the concordance lines, the level of negativity can be easily seen.
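Concordance lines of this kind (often called KWIC, keyword in context) can also be produced with a short script. The sketch below is an illustration only: the function name `concordance`, the character-based context width, and the example tweets are assumptions of mine, not the software used to produce the lines discussed in this post.

```python
import re

def concordance(tweets, phrase, width=30):
    """Return keyword-in-context lines for every match of `phrase` in `tweets`.

    Matching is case-insensitive and on whole words; `width` is the number of
    characters of context kept on each side of the match.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for tweet in tweets:
        for match in pattern.finditer(tweet):
            left = tweet[max(0, match.start() - width):match.start()]
            right = tweet[match.end():match.end() + width]
            # Right-align the left context so the keyword lines up in a column.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical example tweets, not taken from the corpus.
examples = [
    "Unlike the others, I am self funding my campaign.",
    "Crooked Hillary got it wrong again.",
]

for line in concordance(examples, "I am"):
    print(line)
for line in concordance(examples, "hillary"):
    print(line)
```

Aligning the keyword in a single column makes it easier to scan the words immediately to its left and right, which is where patterns such as crooked before Hillary show up.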

I suppose this is no great surprise, as the election campaign proved to be one with very little constructive conversation or debate, and in his tweets Trump can be seen focusing on constructing others negatively.