A Clearer Way to Analyze Protest Images
This article presents a new system for better understanding images of political protests.
Using images as data has become a common practice in political science. Current methods for analyzing images can achieve high accuracy, but they often lack clarity on how they arrive at their results. This article introduces a new two-part system designed to make image classification clearer.
In the first step, the system divides an image into parts, identifying the objects present. It then creates a summary that describes these objects. In the second step, this summary is fed into machine learning models to classify the images. We applied this system to a dataset containing over 140,000 images to identify which images show political protests.
This approach has three key benefits. First, by labeling the objects found in each image, it makes the analysis more understandable. Second, knowing what objects are present helps researchers see what sets protest images apart from non-protest images. Third, comparing the importance of objects across different countries highlights how protest activities can differ. These insights are not usually possible with traditional image analysis methods and open up new directions for research.
The Role of Images in Political Science
Recent advances in artificial intelligence and image analysis have led to more widespread use of images in social science research. Images have unique advantages over written text. They can be understood without language, allowing researchers to develop a single model instead of needing to create one for each language. Additionally, images can reveal details about concepts that may not be mentioned in text, such as violent actions, crowd composition, or the use of symbols. This has inspired innovative research in political science, exploring topics like the emotional impact of images, fake vote counts, or media portrayal of politicians.
Most current methods of analyzing images rely on deep learning networks, which can achieve impressive accuracy. However, understanding why these networks label an image in a certain way can be quite difficult. This challenge is even greater when analyzing complex images that contain many different objects. As these methods gain importance in various research areas, the need for clearer interpretation becomes crucial. This article presents a solution to make these image analysis models easier to explore and apply in social science contexts.
Introducing a Two-Part Image Classification System
The two-part classification system improves the interpretability of image analysis. In the first part, it creates a summary of the objects present in the image, and in the second part, it uses that summary to decide whether the image shows a topic of interest, such as a protest. The rise of social media has produced a growing number of images documenting protests, and researchers have begun to compile datasets from these visuals. We tested the system on a new dataset of 141,538 images from ten different countries, some showing protests and some not.
Protest images are often complex, displaying a variety of objects like people, flags, and signs. By using our two-part system, we found three advantages. First, identifying the objects within an image offers immediate clarity compared to previous methods. Instead of focusing on pixels, our system highlights human-understandable items such as "person" or "flag." Second, we can run simple validation tests to see which objects effectively differentiate protest images from others. Third, by examining object importance across countries, we can see how protest activities vary between regions. These insights are not usually available from traditional image classification methods.
The Limits of Conventional Image Classification
Before deep learning became prevalent, image classification relied on simpler statistical models, which were less flexible. The introduction of deep learning networks and larger datasets has greatly improved the ability to classify images accurately.
For example, a convolutional neural network (CNN) is structured in layers, starting with feature extraction layers and ending with output layers that predict content. Such networks typically have millions of parameters, which makes it hard to work out why an image receives a certain label, especially when the image is complex. Various methods have emerged to help interpret these models, but they often still require researchers to decipher clusters of pixels rather than identifying the objects those pixels represent.
Moving to a Two-Part Classification System
Our two-part classification system simplifies the process. In the first step, the system recognizes objects within an image and creates a summary, which helps provide an easy-to-understand representation. The second step involves using this summary in standard machine learning classifiers to determine whether the image represents a particular event, such as a protest.
Breaking Down the Process
Object Detection: The first step involves detecting and classifying the objects present in each image. This can include drawing boxes around objects and labeling them appropriately. A more advanced method, called instance segmentation, doesn’t just provide bounding boxes but also creates detailed masks for the shapes of the detected objects. This allows the system to identify more specific features within the image.
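To make the difference between a bounding box and a mask concrete, here is a minimal sketch in plain Python. It assumes a mask is given as rows of 0/1 values; the names `bounding_box`, `mask_area_fraction`, `box_area_fraction`, and `flag_mask` are hypothetical and chosen for illustration only. Real segmentation tools return richer structures, and this is not the pipeline used in the original work.

```python
from typing import List, Optional, Tuple

# Assumption: a mask is a grid of 0/1 values, where 1 marks a pixel of the object.
Mask = List[List[int]]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the smallest box enclosing the mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None  # empty mask: no object pixels
    return (min(rows), min(cols), max(rows), max(cols))


def mask_area_fraction(mask: Mask) -> float:
    """Share of the image covered by the object's own pixels."""
    total = sum(len(row) for row in mask)
    covered = sum(sum(row) for row in mask)
    return covered / total if total else 0.0


def box_area_fraction(mask: Mask) -> float:
    """Share of the image covered by the enclosing bounding box."""
    box = bounding_box(mask)
    if box is None:
        return 0.0
    top, left, bottom, right = box
    total = sum(len(row) for row in mask)
    return (bottom - top + 1) * (right - left + 1) / total


# Toy 4x4 image with an L-shaped object (illustrative only).
flag_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(bounding_box(flag_mask))        # (1, 1, 2, 2)
print(mask_area_fraction(flag_mask))  # 0.1875
print(box_area_fraction(flag_mask))   # 0.25
```

The box overstates the object's size (4 pixels instead of 3) because it includes background pixels, which is one reason a mask can give a more faithful area measurement.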
Creating Summary Vectors: After identifying the objects, we summarize this information into a feature vector. There are different ways to create these vectors:
- Binary Features: These vectors indicate whether certain object categories are present or not.
- Count-Based Features: These count how many instances of each object category are present in an image.
- Area-Based Features: These measure the size of the objects within the image, either by identifying the largest instance of each category or summing the total area occupied by all instances.
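The feature types listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes each detection is a pair of a category label and the fraction of the image that instance covers, and the function name `summarize`, the three-item vocabulary, and the toy detections are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a detection is (category label, fraction of image area covered).
Detection = Tuple[str, float]


def summarize(detections: List[Detection], vocabulary: List[str]) -> Dict[str, List[float]]:
    """Build binary, count, area-max and area-sum vectors over a fixed vocabulary."""
    counts = {c: 0 for c in vocabulary}
    area_max = {c: 0.0 for c in vocabulary}
    area_sum = {c: 0.0 for c in vocabulary}
    for label, area in detections:
        if label not in counts:
            continue  # ignore categories outside the chosen vocabulary
        counts[label] += 1
        area_max[label] = max(area_max[label], area)
        area_sum[label] += area
    return {
        "binary": [1 if counts[c] > 0 else 0 for c in vocabulary],
        "count": [counts[c] for c in vocabulary],
        "area_max": [area_max[c] for c in vocabulary],
        "area_sum": [area_sum[c] for c in vocabulary],
    }


# Toy example: two people and one flag detected in an image.
vocabulary = ["person", "flag", "car"]
detections = [("person", 0.25), ("person", 0.125), ("flag", 0.5)]
features = summarize(detections, vocabulary)

print(features["binary"])    # [1, 1, 0]
print(features["count"])     # [2, 1, 0]
print(features["area_max"])  # [0.25, 0.5, 0.0]
print(features["area_sum"])  # [0.375, 0.5, 0.0]
```

Every vector has one position per vocabulary category, so each position keeps a human-readable meaning ("person", "flag", "car") when it is passed on to a classifier.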
In the second step, we train a standard machine learning classifier to predict whether an image contains a protest or not based on the summary vectors derived from the first step. We used several classifiers to see how well they perform.
Applying the Classification System to Protest Images
To test our two-part system, we collected a new dataset of protest images from social media. This dataset focuses on significant protest events across various countries from 2014 to 2021. Using Twitter, we gathered millions of images, which were sorted and categorized by human coders. Each image was labeled as either containing a protest or not, with varying levels of certainty.
The final dataset consisted of 141,538 images. We split it into two parts: a training set used to teach the model and a testing set used to evaluate its performance on images it had not seen.
Training the Two-Part Classifier
Using the protest image dataset, we constructed our two-part classifier while testing different combinations of object detection vocabularies, summary vectors, and classifiers.
Object Vocabulary: We used common vocabularies like COCO and LVIS to identify objects within images. These vocabularies have a varying number of object categories, which influences performance.
Feature Generation: We applied the four summary vectors described earlier: binary, count-based, and the two area-based variants (area maximum and area sum). This lets us see how different features affect performance.
Classification Method: We tested different classifiers on the generated summary vectors, including logistic regression, decision trees, and gradient-boosted trees, and tuned each of them to improve accuracy.
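As a rough illustration of the second stage, the sketch below trains a logistic regression (one of the classifier families named above) on binary summary vectors. It is hand-written in plain Python only to stay self-contained; in practice one would use an established machine learning library. The function names, the learning rate, the number of epochs, and the six toy images are assumptions made for this example and are not taken from the original study.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return 1 (protest) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy binary summary vectors over the hypothetical vocabulary [person, sign, car].
X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
y = [1, 1, 0, 0, 0, 0]  # 1 = protest, 0 = no protest (made-up labels)

weights, bias = train_logistic(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
print(dict(zip(["person", "sign", "car"], weights)))
```

Because each weight belongs to a named object category, the fitted model can be read directly: in this toy data the "sign" weight ends up largest, since signs appear only in the images labeled as protests.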
Evaluating Performance and Insights Gained
The results of our two-part classification system were encouraging, showing that it performs well in identifying protest images. We compared it against traditional methods and observed its strengths and weaknesses.
Understanding What Works and What Doesn’t
- Comparing Vocabularies: Models trained with the LVIS vocabulary, which has a greater number of object categories, consistently performed better than those using the COCO vocabulary.
- Feature Generation: The type of summary vector made a difference, but its impact was smaller than that of the vocabulary.
- Classification Method: The gradient-boosted trees generally yielded the best performance across the board.
Additional analysis indicated areas where the classifier could improve, particularly in characterizing specific types of protests or events. For example, images depicting protests with large crowds or unique symbols were classified more accurately than those showing police presence or chaotic scenes.
Insights from Object Importance
One major advantage of our system is its ability to reveal which objects are most related to protests. For instance, the model highlighted the presence of people, signs, and flags as key indicators of protest images.
By assessing the importance of specific features, we could see which objects significantly differentiated protest images from non-protest images. For example, the presence of a person or a flag was more indicative of a protest compared to clothing items like shirts or jackets, which provided little extra information.
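One very simple check in this spirit is to compare how often each object appears in protest images versus non-protest images. The sketch below does that in plain Python. It is a descriptive stand-in and not necessarily the importance measure used in the original work; the function name `presence_gap`, the vocabulary, and the eight toy images are invented for illustration.

```python
from typing import Dict, List


def presence_gap(X: List[List[int]], y: List[int], vocabulary: List[str]) -> Dict[str, float]:
    """For each object: share of protest images containing it minus the share of
    non-protest images containing it. Assumes both classes are present in y."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    gaps = {}
    for j, name in enumerate(vocabulary):
        rate_pos = sum(1 for x in pos if x[j] > 0) / len(pos)
        rate_neg = sum(1 for x in neg if x[j] > 0) / len(neg)
        gaps[name] = rate_pos - rate_neg
    return gaps


# Toy binary summary vectors over a hypothetical vocabulary (made-up data).
vocabulary = ["person", "flag", "shirt"]
X = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],  # labeled as protest
    [1, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 0],  # labeled as non-protest
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

gaps = presence_gap(X, y, vocabulary)
ranking = sorted(gaps, key=gaps.get, reverse=True)
print(gaps)     # {'person': 0.5, 'flag': 0.75, 'shirt': 0.0}
print(ranking)  # ['flag', 'person', 'shirt']
```

In this toy data, shirts appear equally often in both groups and so carry no information, which mirrors the point that clothing items add little beyond the presence of people.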
Comparing Importance Across Countries
Since protests can look and feel different across various countries, we explored the importance of different objects in relation to specific events. We calculated how much a certain object type was emphasized in one country compared to others, revealing distinct trends.
For instance, in Russia, the use of signs and posters was notable, while in Argentina, cars became a prominent feature during the pandemic protests. In Lebanon, candles played a strong symbolic role during protests, especially in honor of a specific tragic event.
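A comparison of this kind can be sketched as follows, assuming one already has an importance score per object and country. The sketch subtracts the average score in all other countries from a country's own score; this particular formula, the function name `relative_importance`, the placeholder countries A, B, and C, and all numbers are assumptions for illustration and do not reproduce the paper's calculation or results.

```python
from typing import Dict


def relative_importance(importance: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """For each country and object: own importance minus the mean importance
    of that object in all other countries."""
    result: Dict[str, Dict[str, float]] = {}
    for country, scores in importance.items():
        others = [c for c in importance if c != country]
        result[country] = {}
        for obj, score in scores.items():
            mean_others = sum(importance[o].get(obj, 0.0) for o in others) / len(others)
            result[country][obj] = score - mean_others
    return result


# Made-up importance scores for three placeholder countries.
importance = {
    "A": {"sign": 0.5, "car": 0.25, "candle": 0.0},
    "B": {"sign": 0.25, "car": 0.75, "candle": 0.0},
    "C": {"sign": 0.25, "car": 0.25, "candle": 0.5},
}

relative = relative_importance(importance)
for country, scores in relative.items():
    top = max(scores, key=scores.get)
    print(country, top, scores[top])
# A sign 0.25
# B car 0.5
# C candle 0.5
```

A positive value marks an object that matters more in that country than elsewhere, which is the kind of signal that would point a researcher toward country-specific protest practices.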
Conclusion
Images have become an increasingly valuable resource for political researchers. While traditional methods can classify images accurately, they often lack clarity. Our two-part classification system addresses this issue by making the analysis more transparent and accessible.
By identifying specific objects in an image and creating clear summaries, our method allows researchers to see what leads to a certain classification. While predictive performance might not match that of advanced deep learning methods, the benefits of interpretability, clarity, and comparative analysis are significant.
Moreover, our system is not limited to just analyzing protests; it can be applied to various areas in social science research, opening new avenues for understanding visual data.
Future research can improve upon this method by incorporating specific object relationships, using different segmentation tools, and developing targeted vocabularies. As the landscape for social media evolves, researchers may face new challenges, but the potential for insights from visual data remains vast and promising.
Title: Improving Computer Vision Interpretability: Transparent Two-level Classification for Complex Scenes
Abstract: Treating images as data has become increasingly popular in political science. While existing classifiers for images reach high levels of accuracy, it is difficult to systematically assess the visual features on which they base their classification. This paper presents a two-level classification method that addresses this transparency problem. At the first stage, an image segmenter detects the objects present in the image and a feature vector is created from those objects. In the second stage, this feature vector is used as input for standard machine learning classifiers to discriminate between images. We apply this method to a new dataset of more than 140,000 images to detect which ones display political protest. This analysis demonstrates three advantages to this paper's approach. First, identifying objects in images improves transparency by providing human-understandable labels for the objects shown on an image. Second, knowing these objects enables analysis of which distinguish protest images from non-protest ones. Third, comparing the importance of objects across countries reveals how protest behavior varies. These insights are not available using conventional computer vision classifiers and provide new opportunities for comparative research.
Authors: Stefan Scholz, Nils B. Weidmann, Zachary C. Steinert-Threlkeld, Eda Keremoğlu, Bastian Goldlücke
Last Update: 2024-07-04 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2407.03786
Source PDF: https://arxiv.org/pdf/2407.03786
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.