Twitter oblivious to Farage’s media mauling as EU polls open

** UPDATE: Extended methods section at the bottom of the post
This article was originally published on The Conversation. Read the original article.

By Orlanda Ward, University College London and Javier Sajuria, University College London

Nigel Farage, leader of the UK Independence Party, appears to have stolen the show in the run-up to the European elections. But while he has been pilloried in the papers, discussion about him on Twitter appears to have been somewhat more favourable.

Since campaigns for the European elections have been largely fronted by party leaders, we’ve investigated the level of mainstream media coverage given to David Cameron, Nick Clegg, Ed Miliband and Nigel Farage over the past six days. We’ve also looked at how much discussion has been going on about the leaders on Twitter over the same period.

We looked at all geo-tagged UK tweets and all national tabloid and broadsheet newspaper coverage of the Conservative, Lib Dem, Labour and UKIP leaders over the period. We analysed the amount of coverage and discussion each party leader received online and offline each day, and the proportion for each that was positive, negative or neutral.

Not a fan of the coalition. 

While there has been plenty of mainstream newspaper coverage of major party leaders in the run-up to EU elections, particularly focusing on Farage, this has not been reflected in online discussions. For a start, political discussion only featured in about 2% of the almost 3m tweets we monitored.

Overall, both online and offline, Cameron and Farage have been the most prominent, trailed by Miliband and Clegg on both platforms.

But Farage in particular has been the subject of very different coverage in the online world and the more traditional press. While the UKIP leader’s media coverage spiked immediately following his now infamous LBC interview, his mentions on Twitter suggest that the online reaction was more of a slow burn, though the tone of the discussion did become slightly more negative. Perhaps the LBC episode failed to inflame passions online because it simply seemed to confirm existing views of Farage – both for and against.

Party leader mentions in newspapers and on Twitter 

What’s more, the intensely negative coverage of Farage in the mainstream media has not been replicated online. If you only read the papers, you’d find that 31% of the comments made about Farage were negative, while between 20% and 21% of those made about Miliband, Clegg and Cameron could be classed as such.

But the proportion of tweets mentioning Farage that were negative was near identical to that for Miliband and Clegg, at between 22% and 23%. Cameron got an easier ride, with just 13% of tweets about him coded as negative.

Tone of mentions of Nigel Farage 

No fewer than four simultaneous campaigns have been bubbling away in the UK as we carried out this analysis. While the European and local elections have fired the starting gun for the general election, the Scottish independence debate is also in full swing, not to mention a possible EU referendum.

This has meant that discussion and coverage this week has been fragmented. Miliband has spent the week laying out his policy for the general election amid claims that he’s been missing in action when it comes to European campaigning.

Cameron, of course, spent two days on a pro-union visit to Scotland, and much of the rest of his exposure concentrated on the Chilcot Inquiry and coalition tensions. Clegg, meanwhile, has gained attention for all the wrong reasons: drunken cactus shame; losing his rag with Michael Gove and Andrew Marr; Commons whispers of a Lib Dem deposition; and some polls suggesting the party may well fall behind the Greens in Europe.

In contrast, although reporting on Farage was dominated by the fallout from his LBC interview and questions about whether he is or isn’t a racist, it at least stayed focused on the strength of his party’s electoral prospects and his stance on immigration. What else is there to talk about?

Farage has become the focal point not just for the media, but for the major party leaders this week – that is, when they weren’t focusing on other elections. This short campaign has seen little coherent debate between parties, and while their antics of course top the printed press agenda, our data suggest that they are not sparking debate among the wider public.

It also suggests that media lambasting of Farage doesn’t look set to change voters’ minds – at least not those on Twitter.

Methods note

In response to some of the questions asked about the methods we used to collect and analyse the data, we offer here a brief explanation.

For the Twitter data, we used the streamR package to collect the data from the Twitter streaming API. We focused only on geotagged tweets in the UK, which accounted for an average of 500,000 tweets a day. Some research shows that the streaming API allows us to get most (if not all) of the geotagged tweets in a given period, but we also know that no more than 2% of tweets contain geographical metadata. This is obviously a bias, and we try to be as explicit as possible about the limitations of our data and methods.
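For readers curious what the geotag filter looks like in practice, here is a minimal sketch in Python rather than R (the field names follow Twitter’s JSON format; the sample tweets are invented for illustration):

```python
import json

def filter_geotagged(tweet_lines):
    """Keep only tweets that carry point coordinates.

    Twitter's JSON puts GPS data under "coordinates" (a GeoJSON point),
    which is null for the vast majority of tweets (~2% in our sample).
    """
    geotagged = []
    for line in tweet_lines:
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        if coords and coords.get("type") == "Point":
            lon, lat = coords["coordinates"]
            geotagged.append({"text": tweet["text"], "lon": lon, "lat": lat})
    return geotagged

# Two invented tweets: only the second carries coordinates.
sample = [
    json.dumps({"text": "hello", "coordinates": None}),
    json.dumps({"text": "from London",
                "coordinates": {"type": "Point",
                                "coordinates": [-0.12, 51.5]}}),
]
print(filter_geotagged(sample))
```

In the real pipeline the equivalent filtering happens in R via streamR, but the logic is the same: discard everything without a coordinates object.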

The newspaper data was obtained by searching Nexis for all articles that contained the name of one of the UK party leaders. As explained above, the number of mentions of the leader of the Green Party was too low (or nonexistent) and we had to remove her from our sample. After we obtained the articles, we extracted the sentences in which a party leader’s name was mentioned; these sentences were our unit of analysis.
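As a rough illustration of that sentence-trimming step (a sketch, not the code we actually used; the naive sentence splitter is a simplification):

```python
import re

LEADERS = ("Cameron", "Clegg", "Miliband", "Farage")

def leader_sentences(article):
    """Split an article into sentences and keep those naming a party leader.

    Each returned sentence is one unit of analysis."""
    sentences = re.split(r"(?<=[.!?])\s+", article)
    return [s for s in sentences if any(name in s for name in LEADERS)]

article = ("Farage gave an interview. The weather was mild. "
           "Cameron visited Scotland.")
print(leader_sentences(article))
# → ['Farage gave an interview.', 'Cameron visited Scotland.']
```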

In terms of analysis, we made some changes to the methods used by Pablo Barberá in his workshop at SMaPP NYU. Most of the R functions created for this purpose were compiled in a (very beta) R package called euElection. This package allowed us to obtain a very rough estimate of the tone of the tweets and the newspaper sentences, which is what we used in the article. In simple terms, each word in our units of analysis was compared against a vocabulary of positive and negative words (the “lexicon”); all other words were considered neutral. We then obtained the proportion of tweets/sentences with a majority of positive, negative or neutral words. We only selected tweets and sentences that mentioned at least one party leader’s name.
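In code, the lexicon approach boils down to something like this Python sketch (the tiny word lists here are invented stand-ins for the real lexicon):

```python
# Hypothetical mini-lexicon; the real analysis used a much larger word list.
POSITIVE = {"good", "great", "win", "strong", "hope"}
NEGATIVE = {"bad", "racist", "weak", "fail", "shame"}

def tone(text):
    """Classify a tweet/sentence by which lexicon matches more of its words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def tone_proportions(texts):
    """Share of texts falling into each tone category."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for t in texts:
        counts[tone(t)] += 1
    n = len(texts)
    return {k: v / n for k, v in counts.items()}
```

This is deliberately crude (no negation handling, no weighting), which is why the article describes the resulting tone estimates as rough.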

We are happy to receive any feedback or comments you may have on this and other methods issues.

The Conversation

Terremoto 1A Chile [Earthquake in Chile, 1 April]

*LAST UPDATE: 03 April 2014 01.11 GMT+0400

At 20.46 local time, a magnitude 8.2 earthquake was registered off the coast of Iquique, in northern Chile, followed by a tsunami alert along the whole Chilean coast (more information about the quake is available from La Tercera). Starting about five minutes after the earthquake, I have been collecting every tweet I can that contains any of the following keywords: “terremoto”, “iquique”, “arica” or “tsunami”. Over the coming days (and I’m not quite sure what that means, given that I’m travelling to Chicago for MPSA tomorrow) I’ll aim to provide further analysis. For now, it is possible to plot the geotagged tweets to see where the conversations about the earthquake are coming from. Of the roughly 360,000 tweets captured so far (up from 200,000 at the first update), no more than 2% carry geographic coordinates, so no kind of representativeness can be inferred.

I have also added a wordcloud with the most used words in the tweets. As expected, “tsunami” and “Chile” are among the most frequent terms. I also created a figure showing the proportion of tweets per language (Spanish accounts for more than 80%).

4,270 geotagged tweets captured in the 4 hours after the earthquake.

Wordcloud with the 1,000 most used words.

Proportion of the different languages of the tweets.

Predicting elections and mobilising voters: the “debts” of social media to electoral campaigns [originally in Spanish]

This post was originally published in “Redes y Elecciones”, a collective Tumblr about the Chilean elections edited by the newspaper La Tercera.


All of the presidential candidates have official accounts on Facebook and YouTube, and only one is not on Twitter. While we can all imagine the benefits of using social media in a campaign, what do the studies say?

One key reason to use these technologies is that they provide detailed, specific information about users. Message segmentation seems to be the new political watchword, much like what the retail industry has been doing for years. Some examples are Sasha Issenberg’s “The Victory Lab” or, closer to home, Daniel Matamala’s “Tu Cariño Se Me Va”.

However, there are two areas in which these technologies still owe a debt: predicting electoral results and mobilising voters.

Predicting results

Although a series of studies has tried to show that electoral results can be predicted through social media, plenty of voices have criticised this kind of research (for example, this post by Joshua Tucker and this one by Dani Gayo).

The reasons are several. The first is obvious: the users of these tools are not a representative sample of the population, so we cannot infer electoral behaviour from the information we obtain about them. While this argument is convincing, we are getting close to drawing a clearer connection between what we read on Twitter and what the polls say.

Other arguments for why elections cannot be predicted with these data point to unclear methodologies, something that has gradually improved as more academics have joined the debate.

In short, elections cannot be predicted yet, but it seems to be only a matter of time.

Mobilising people to vote

Finding out whether social media can bring voters to the polls is highly relevant now that voluntary voting has been introduced. Last year, the results of an experiment conducted on Facebook during the 2008 US presidential campaign were published. It showed that “social messages” on Facebook do help bring people to the polls. However, of the 61 million participants in the experiment, only a few (around 0.4%) were mobilised to vote.

Unfortunately, this kind of research is impossible in Chile because of the restrictions that Servel imposes on access to voting data.

Social media holds many potential benefits for politics: it lets candidates interact more directly with their voters, builds a bridge to campaigns, and lets them obtain relevant information about their audiences. However, the ability to predict results and to mobilise voters remains to be demonstrated.

APSA 2013 Twitter report


Following last year’s humble attempt to provide some insight from the Twitter conversations around #APSA2012 (especially considering the last-minute cancellation of the conference) – and given that other duties kept me from attending APSA this year 🙁 – I will be collecting and displaying some data from this year’s conversations. There will be more updates throughout the conference. If you want to follow the chronological reports, you need to read from bottom to top.

Short methods note: Edges are created by mentions, replies or re-tweets. Nodes are coloured according to their components, and their size is scaled according to eigenvector centrality. Isolates (i.e. people not talking to anyone but using the hashtag #APSA2013) are not included.
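In case the edge-building step is unclear: it amounts to scanning each tweet for @-handles and drawing a directed tie from the author to each account mentioned. A minimal Python sketch (the accounts and tweets below are invented; the real pipeline parsed Twitter’s structured metadata rather than raw text):

```python
import re
from collections import defaultdict  # handy if you later want adjacency lists

MENTION = re.compile(r"@(\w+)")

def build_edges(tweets):
    """Directed edges: author -> each account mentioned, replied to or RT'd.

    Isolates (people using the hashtag but mentioning nobody)
    simply never produce an edge, so they drop out of the network."""
    edges = []
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                edges.append((author, target))
    return edges

tweets = [
    ("alice", "Great panel by @bob and @carol #APSA2013"),
    ("bob", "RT @carol: slides are up #APSA2013"),
    ("dave", "Enjoying #APSA2013"),  # isolate: contributes no edges
]
print(build_edges(tweets))
# → [('alice', 'bob'), ('alice', 'carol'), ('bob', 'carol')]
```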


1. DATA: Someone asked me for the data I used to produce this post, and I strongly believe in the importance of replication. Here is a list of all the tweet IDs I used. Sorry, but that’s the only way I can share it without violating Twitter’s TOS –> DATA

2. I plotted all the geotagged tweets against the map of Chicago. This gives a better sense of where the tweets were concentrated around the city.


UPDATE 10 (AND FINAL): A few comments before I introduce the data. This exercise had two purposes. First, I wanted to freshen up my skills in Twitter data collection and analysis. After spending part of the summer learning a lot about Python, R and SNA (mainly thanks to the International Summer School 2013 “Social Network Analysis: Internet Research”), I decided that an extension of last year’s analysis of the APSA tweets would be a good opportunity. In all honesty, I hope you enjoyed it too. Second, my research agenda makes extensive use of this type of social media data to draw inferences about political behaviour. Although this particular exercise was extremely self-centred, since I focused on the interactions at a political science conference, it provides some insight into what social media data can tell us and how we can use it to make sense of bigger issues. That’s why I decided to write this other post on Obama’s speech this week, to show some “real life” examples. I also realise that I’m hardly the first in this field: amazing people have been working on these issues for a long time (most of them with much more sophisticated analyses than mine). I believe in building community, so I have tried to attribute their work where appropriate and link to their websites and Twitter accounts. I strongly recommend that you follow them and their research. Finally, this post will eventually become a paper-like longer post, with more descriptive data and some interesting questions to test. I can’t promise when, but it will come.

Ok, now let’s go to the data analysis. Joshua Tucker (NYU) tweeted today his excitement at being in the “top ten vertices” list from a Twitter SNA made by Marc Smith using NodeXL. I’ve used NodeXL in the past (and I believe it is an amazing off-the-shelf tool for Windows users), but its reliance on the Search API made me realise that I could get better results by downloading the data via the streaming API for the full duration of the conference. It requires more time and resources, but the results are much more informative. So I decided to create my own top ten, but using eigenvector centrality instead of betweenness centrality (as in the NodeXL list). The reason is simple: the former relies on the relative importance of a node’s connections. That is, if the people I interact with are more “important” (or central) in the network, I become more important too. Betweenness centrality, on the other hand, focuses on which nodes act as bridges, best able to connect the rest of the network. Although that is usually an important question in network analysis (indeed, I co-authored a paper with Jorge Fábrega where we use it extensively), in substantive terms eigenvector centrality seems more appropriate for the type of network we have here. With that in mind, here are the winners:

Table: Top 10 accounts according to their eigenvector centrality (one column per day of the conference, plus the cumulative network).

Rank  Day 1             Day 2             Day 3             Day 4             Cumulative
1     @APSAtweets       @andrew_chadwick  @dandrezner       @APSAtweets       @APSAtweets
2     @texasinafrica    @davekarpf        @25lettori        @dandrezner       @dandrezner
3     @abuaardvark      @rasmus_kleis     @rasmus_kleis     @mqsawyer         @andrew_chadwick
4     @APSAMeetings     @kreissdaniel     @andrew_chadwick  @LarrySabato      @ezraklein
5     @dfreelon         @OUPAcademic      @APSAtweets       @TerriGivens      @texasinafrica
6     @CambridgeJnls    @insidehighered   @mikejjensen      @insidehighered   @APSAMeetings
7     @andrew_chadwick  @FUNGLODE         @ezraklein        @j_a_tucker       @abuaardvark
8     @ProfCaraJones    @Worse_Reviewer   @StephanieCarvin  @monkeycageblog   @davekarpf
9     @dandrezner       @j_a_tucker       @TerriGivens      @washingtonpost   @raulpacheco
10    @zizip            @abuaardvark      @raulpacheco      @APSAMeetings     @ProfCaraJones
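If you want to compute something similar yourself, here is a rough power-iteration sketch of eigenvector centrality in pure Python (this is not the code behind the table, which used proper SNA tooling; ties are treated as undirected for simplicity):

```python
def eigenvector_centrality(edges, iterations=100):
    """Power-iteration estimate of eigenvector centrality.

    A node scores highly when its neighbours score highly: the
    'importance rubs off' intuition described in the post."""
    nodes = sorted({n for e in edges for n in e})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:                 # undirected projection of the ties
        neighbours[a].add(b)
        neighbours[b].add(a)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Keeping each node's old score in the update (i.e. iterating A + I)
        # preserves the eigenvector while avoiding oscillation on
        # bipartite-like graphs such as stars.
        new = {n: score[n] + sum(score[m] for m in neighbours[n])
               for n in nodes}
        norm = max(new.values())
        score = {n: v / norm for n, v in new.items()}
    return score

# A star network: the hub talks to everyone, so it should rank highest.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(eigenvector_centrality(edges))
```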

In terms of volume, day 4 was the smallest. With only 114 nodes and 141 edges, conversations were less frequent. A possible explanation is that most of the delegates had already gone by then, and only those who had panels that day were staying around the conference venues. The clusters are a bit more institutional, with high prominence from APSA’s official accounts, along with some blogs and websites (such as @monkeycageblog and @insidehighered). A new addition is Larry Sabato, from U. Virginia.



The cumulative network does not show many differences from yesterday. This is not surprising: most of the activity took place earlier, and most of the communications were between people who had already tweeted at each other. The new interactions might have added some weight to the existing edges, but not much more. In any case, here is the final network of the APSA 2013 Annual Meeting:

APSA 2013 final graph – 868 nodes and 1794 edges.

UPDATE 9: Day 3 was clearly quieter than the previous two. Part of that might be the classic effect of people leaving after they present, or simply wandering around Chicago. It might also be that the panels are becoming ever more interesting, and people prefer to pay attention to the presentations instead of tweeting ;). In any case, with all the fuss around President Obama’s speech on Syria (hint: I recently published a quick report on that), I was expecting the IR crowd attending the conference to be very active. Judging by simple observation of their accounts, they were, but they did not necessarily use the #apsa2013 hashtag to express their views. That said, @dandrezner and @ezraklein are among the “stars” of today’s network, with high eigenvector centrality. The political communications cluster remains active, with @andrew_chadwick, @rasmus_kleis, and @25lettori leading the way (clearly a clique around Royal Holloway’s New Political Communication Unit).

Day 3 of #APSA2013. 253 nodes and 325 edges.

Moving on to the cumulative graph, the network is not getting much bigger (832 nodes in total). This reflects the lower number of conversations on day 3, but also that some ties are already established and some people keep talking to each other. The APSA team is doing really well in driving the conversation, with @APSAtweets and @APSAMeetings as really central nodes in the network. As expected, those who were central yesterday remain so today, so no news in that regard. All in all, the network seems to be reaching a point of “convergence” or “stability”, with conversations taking place among the same members and no significant cliques outside the big group. The question of inter-field dialogue remains open, as some relevant nodes in the network belong to different components (such as @ezraklein, in comparison with the rest of the bigger component).

Cumulative network at day 3. 832 nodes and 1667 edges.


(QUICK) UPDATE 8: Using Pablo Barberá’s streamR package (along with ggplot2), I mapped the tweets that had location data in them (only 19 out of 1,321). Not surprisingly, most of them are highly concentrated in Chicago, but a couple appear to be somewhere else in the US. This speaks to the question of whether people not attending the conference get any benefit from tweeting about it. There were no geolocated tweets outside the US, in case you were wondering.

Geolocated tweets in red.

UPDATE 7: This is the final summary of day 2. For the next two days I aim to produce just one report per day, so you’ll have to bear with me. Again, I present two graphs: the first is the full network for all days of the conference (including pre-conference events); the second contains all tweets captured on day 2 until 7.30pm.

The cumulative network again shows a big component in pink, but the network is becoming much more diverse than in previous iterations. More clusters appear, while others that were disconnected (such as the one led by @FUNGLODE) are now connected to the bigger network. The usual suspects remain key actors in the network and, depending on the volume of tweets over the weekend, will probably remain in that position. Some well-known IR scholars do not belong to the bigger component, which is an interesting phenomenon. If we look at @ezraklein or @SlaughterAM, they are connected to the big network but form clusters around themselves (perhaps the cross-field conversations are not as clear as I thought). The political communications group is highly active, especially @andrew_chadwick, @zizip and @davekarpf (who also shared a widely tweeted panel today, which might account for their relevance in the network).

An important caveat is that this exercise is, in some ways, a performative process. As I publish these networks, some people become aware of their own position and the people they interact with. That is always something to take into consideration when doing the analysis, and it brings some epistemological discussions to the table (this is like Schrödinger’s cat reporting on its own experiment).

Cumulative network at August 30, 7.30pm. 704 nodes and 1384 edges.
Network for day 2 (30 August). 375 nodes and 635 edges.


UPDATE 6: This time I’m bringing two graphs. The first corresponds to the cumulative network, that is, the Twitter conversations from the pre-conference events until this update. The second corresponds only to the conversations taking place during day 2, up to 1pm CT. As you can see, there are similarities between the networks, such as the existence of a big component in the middle (the cumulative network uses strongly connected components to colour the nodes). However, the central actors vary a bit. Some accounts remain relevant and central to the network, such as @APSAtweets, @dandrezner, @texasinafrica, @raulpacheco, @ezraklein and @j_a_tucker. However, we can observe some new actors coming onto the scene, such as @heathbrown and the institutional account @insidehighered. There is also an interesting cluster around @FUNGLODE and @anniavaldez, composed mainly of Spanish-speaking users.

The field boundaries seem more diffuse now, which raises questions about whether conferences actually create opportunities for cross-field dialogue. There are several panels trying to analyse the overall role of political science and how we can communicate better with our audiences; maybe that is driving a lot of the conversations. That’s an interesting hypothesis to test. Another interesting fact is that some central nodes are people who are not attending APSA this year (such as myself :)). This raises the question of who benefits from the conference, and whether it is necessary to attend in order to obtain some basic returns from it. Obviously, we would need data from sources beyond Twitter to find that out. In the meantime, this has become more than a simple exercise in mapping APSA.

Cumulative network from the pre-conference events until August 30 at 1pm. 603 nodes and 1113 edges.
Network for day 2 (August 30) at 1pm. 223 nodes and 323 edges.

UPDATE 5: This graph is a lot bigger than the previous one, as it brings together the data from the pre-conference events plus day 1 (August 29). Thanks again to @jorgefabrega for the help using the Search API to retrieve that data. (I know, the Search API might not be the best option for getting an accurate picture, but it’s the only one I had available. If you want a thorough discussion of the representativeness of the different Twitter APIs – mainly the streaming API – I would definitely encourage you to look at Morstatter et al. 2013.)

Back to business. I made some small changes to the visualisation this time. I used strongly connected components instead of weakly connected components. First, this makes more sense since the network is directed. Second, with weakly connected components we got one big group in the middle where almost everyone appeared connected, which is misleading. One of my goals is also to analyse the networks and try to make a comparison by section/field affiliation (if anyone is interested in helping with that, please let me know in the comments section!). This time we have 479 nodes and 823 edges.
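To illustrate the difference: in a directed graph with ties A→B, B→A and B→C, all three accounts are weakly connected, but only A and B are strongly connected, because they mention each other back. A compact Kosaraju sketch in Python, assuming a simple edge-list input (the analysis itself used standard SNA software, not this code):

```python
from collections import defaultdict

def strongly_connected_components(edges):
    """Kosaraju's algorithm: DFS finish order on the graph,
    then DFS on the reversed graph in reverse finish order."""
    graph, rev, nodes = defaultdict(list), defaultdict(list), set()
    for a, b in edges:
        graph[a].append(b)
        rev[b].append(a)
        nodes.update((a, b))

    seen = set()
    def dfs(n, g, out):
        # Iterative DFS; appends nodes to `out` in finish order.
        stack = [(n, iter(g[n]))]
        seen.add(n)
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(g[nxt])))
                    break
            else:
                stack.pop()
                out.append(node)

    order = []
    for n in sorted(nodes):
        if n not in seen:
            dfs(n, graph, order)
    seen.clear()
    components = []
    for n in reversed(order):
        if n not in seen:
            comp = []
            dfs(n, rev, comp)
            components.append(set(comp))
    return components

print(strongly_connected_components([("a", "b"), ("b", "a"), ("b", "c")]))
```

On the example above this yields two components, {a, b} and {c}, whereas a weakly connected decomposition would lump all three together.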

I’m currently collecting data from today’s sessions, and will provide a daily graph and an accumulated one. Let’s see how that works. As usual, feedback is more than welcome.


UPDATE 4: Last graph of the day (it’s pretty late here in London). This corresponds to the accumulated network of the entire first day of #APSA2013 up to 7pm, Chicago time. The network is now much bigger than the previous one (it seems that conversations take some time to build up), with 327 nodes and 489 edges. The clusters we saw in the previous graph are much more diffuse now. We can observe a big central component (in green) that connects most members of the network. However, it is possible to observe some patterns in the conversation that can be attributed to different fields or types of Twitter accounts (oddly enough, publishers’ accounts tend to mention and re-tweet each other).

Tomorrow morning I will aim to produce a larger accumulated network with info from the pre-conference events (thanks to Jorge Fábrega for his help on getting that data). Also, I aim to produce the accumulated version and a daily one. Let’s see if we can get something from these dynamic networks. I hope you are enjoying the conference, stay tuned!

Full network of day 1 of APSA (August 29)

UPDATE 3: At 3pm Chicago time, things got much more complex and ‘networked’ (pun intended). At this point there are 167 nodes (i.e. Twitter accounts) and 261 edges (defined by mentions, replies or re-tweets). We can observe a big cluster in the middle (in dark orange) around the APSA official account (@APSAtweets), alongside some well-recognised political scientists/tweeters, such as @dandrezner, @raulpacheco, and @monkeycageblog, recently acquired by the WaPo. Another recognisable cluster (in pink) is the one formed by political communications scholars, such as @zizip, @andrew_chadwick, @davekarpf, and @abuaardvark.

Network at 29 August, 3pm CT

UPDATE 2: This is the network at 12pm. As you can see, the groups are getting bigger and tighter as the conference evolves.

Network at 29 August, 12pm CT

UPDATE: At 9am in Chicago, this is what the network looks like.

Network at 29 August, 9am CT

Note: Thanks to Alex Hanna for his small – yet crucial – advice on how to build the networks.

Quick report: Obama’s speech on Syria

President Obama made a speech today explaining the US position on attacking Syria (more details here, here, and here). Luckily, I was collecting data on the APSA 2013 conference, so I managed to run a small script and collect some tweets during the speech. It’s a bit early to get a good idea of the tone and the substantive information we can extract, but for now, let me show you how the tweets are geographically located. Out of the ~5,000 tweets I managed to collect, only 1% had location coordinates, which is pretty much the usual rate. I plotted all of them against a world map, and here is the result.

If we perform some basic network analysis on the data, most of the nodes with higher centrality were news outlets and the official account of Barack Obama (@BarackObama). Most of the edges correspond to people re-tweeting mainstream media accounts, while others were simply making their own comments. The network shows how all these people interacted during the first 8 minutes of the speech.

Network of tweets mentioning ‘Obama’ or ‘Syria’. Ties represent mentions, replies or RTs; colours correspond to weakly connected components; and the size of the nodes reflects the eigenvector centrality score of each account.

Finally, I performed some (really) basic sentiment analysis on the tweets from the first 8 minutes. The method was designed by Alex Hanna, from U. Wisconsin – Madison, and I used the list of words developed by Neal Caren, from UNC – Chapel Hill. This also means that words in languages other than English were not coded. The scores are calculated by minute, and they all stay very close to zero. However, the sentiment was more negative at the beginning of the speech and ended up being positive. Uncertainty measures are not provided by this (very crude) way of calculating sentiment, so it’s not possible to know whether the scores are significantly different from zero (a boring caveat, but an important one).

This is all for now. As you can see, there is nothing about IR or geopolitics in this post. It is mainly a way to show how Twitter data can give us a fast (and sometimes overwhelming) way to analyse current events.
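For anyone who wants to replicate the shape of the per-minute scoring, a toy Python version might look like this (the scored word list here is a tiny invented stand-in for Neal Caren’s list, and the timestamps are simplified to minute offsets):

```python
from collections import defaultdict

# Hypothetical scored words; the real list assigns scores to thousands of terms.
WORD_SCORES = {"peace": 1, "hope": 1, "attack": -1, "war": -1}

def sentiment(text):
    """Average word score of a tweet; unknown words count as zero."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(WORD_SCORES.get(w, 0) for w in words) / len(words)

def score_by_minute(tweets):
    """Average sentiment of tweets, grouped by the minute they were sent."""
    bins = defaultdict(list)
    for minute, text in tweets:
        bins[minute].append(sentiment(text))
    return {m: sum(v) / len(v) for m, v in sorted(bins.items())}

demo = [(0, "war attack"), (0, "hope"), (1, "peace hope")]
print(score_by_minute(demo))
# → {0: 0.0, 1: 1.0}
```

As the caveat above notes, this kind of scoring carries no uncertainty estimate; it is a fast descriptive summary, nothing more.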