Shitstorm Detection via Artificial Intelligence: Machine Learning vs. Social Media

Shitstorm Detection, Machine Learning, Artificial Intelligence: Machine learning and its advances are exciting to discuss, especially with people whose views on A.I. couldn't be more different. If you talk to someone working in data science, you might hear about winning algorithms or the advantages of clustering over classification. If you talk to a marketer, a buyer, an engineer or a business developer, however, you will find that they care about completely different things: automation, perhaps, or testing, personalization, finance, document recognition and sentiment analysis. Opinions on "A.I." and "Machine Learning" therefore vary widely, and that's exactly what today's interview with Sven Bodemer from funk, the content network of ARD and ZDF, is about. We wanted to know how funk's A.I.-based shitstorm detection works, what's behind it, and where its development parallels the automotive industry. This much can be revealed: the results are different, but the methods are definitely the same.

Enjoy this exciting interview!

What to expect in this article:

Shitstorm Detection per A.I. - Data Science made for Social Media (and Automotive)
So many questions about shitstorm detection - and the answers
Shitstorm Detection - From Training to (Success) Model
Labels and validation: how do you feed a shitstorm detection A.I.?
How else can a Shitstorm Detection be used sensibly?

Marc

Marketing Professional

15.11.22

Ca. 15 min

Shitstorm Detection per A.I. – Data Science made for Social Media (and Automotive)

Mobility Rockstars: Hi Sven, glad you found time for us! Would you like to briefly introduce yourself to our readers and explain what exactly you do at funk?

Sven: Since August 2022, I have been in charge of Software Development, Distribution and Analytics (SEDA) at funk. I’m covering for Rebecca Glinka while she is on parental leave, and I’d like to send her warm regards at this point! My tasks include personnel, budget, project coordination and administrative matters as well as the strategic direction of our products, e.g. Shitstorm Detection or our web app funk.net. I have been working at funk as a Data Engineer since 2018, focusing on business intelligence, machine learning, NLP, social media interfaces and DevOps tasks. I studied computer science with a focus on “Interactive Applications and Machine Learning”.

Sven Bodemer from the funk content network of ARD and ZDF bravely faced the Mobility Rockstars’ questions and answered them with flying colors. (Copyright @funk/Jana Kay)

Mobility Rockstars: Why do you need “shitstorm detection” at all? Can’t you do it manually or with existing social listening tools like Sprout Social etc.? What do you do differently?

Sven: Unfortunately, this is not possible manually, and existing tools on the market only cover it to a limited extent, because our use case is special. As funk, and as part of the public broadcasters, we have to deal with disinformation and hate on the net almost daily. That is why we launched the internal, interdisciplinary project “Disinformation and Waves of Outrage” in 2021. For my team, the technical task was to develop a solution that detects shitstorms as early as possible and thus supports our communications and content teams in the best possible way.
What are we doing differently? We don’t only look at the social media platforms themselves; we have also developed monitoring for Telegram. In our analyses of past shitstorms, we found that waves of outrage are often announced in Telegram groups beforehand and are sometimes organized there. We have also trained our own sentiment analysis model, which determines the tonality (positive, neutral, negative) of a comment.

So many questions about shitstorm detection – and the answers

Mobility Rockstars: Keyword sentiment analysis: What exactly is the procedure here? How are the results clustered, and by what process? k-means? Or is classification used? Furthermore, how do you separate legitimate, negative comments (“I find the war in Ukraine appalling!”) from disinformation campaigns, shitstorms or bots?

Sven: Okay wow, that was a lot of questions at once!
Our sentiment model can distinguish positive, neutral and negative, i.e. it performs a classification. For the training, we applied supervised learning, a method from the field of machine learning. A supervisor specifies what is right and what is wrong; in our case, this refers to the comments. In very practical terms, this means that we manually annotated a dataset of 120,000 comments as positive, neutral or negative. With a clustering method like k-means (unsupervised learning) this would not have been necessary, but the disadvantage is that the resulting clusters carry no label saying what they represent. Interpreting them in turn means a lot of manual effort. The goal was for our system to evaluate the tonality of a comment automatically, hence classification and not clustering.
One example is the comment “Who pays you?”, which was posted under the maiLab video “Mandatory vaccination is OK”. The user wanted to accuse presenter Mai Thi Nguyen of corruption. Comments of this type are correctly recognized by our system as NEGATIVE, even though they contain no typically negative wording.
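To make the idea of supervised classification concrete, here is a minimal sketch. It is not funk's model: instead of BERT it uses a deliberately simple bag-of-words Naive Bayes classifier, and the six training comments and all names are made up for illustration. The point is only that every training example carries a human-assigned label, so the prediction comes back as a named class rather than an anonymous cluster ID.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper NLP tokenizer.
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # comments per class
        self.word_counts = defaultdict(Counter)      # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, hand-labelled toy data standing in for the annotated comments.
texts = [
    "great video thank you", "love this channel",
    "when is the next episode", "which camera do you use",
    "this is total garbage", "who pays you for this garbage",
]
labels = ["positive", "positive", "neutral", "neutral", "negative", "negative"]

model = TinyNaiveBayes().fit(texts, labels)
```

Calling `model.predict("total garbage")` returns a class name from the annotation scheme. A word-counting model like this one has no notion of context, which is one plausible reason to prefer a contextual language model for comments such as “Who pays you?”.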

Mobility Rockstars: And another keyword: “forecasting”. When it comes to forecasting or predictive analytics, predictive maintenance plays a major role for us, for example. What do you predict in the context of shitstorm detection?

Sven: Our system predicts the probability that a new, unknown comment is positive, neutral or negative. Another use case is, for example, forecasting the future growth in subscribers to our channels.
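A classifier of this kind typically produces one raw score (logit) per class, which is turned into probabilities with a softmax. The following sketch shows that step in isolation. The label order and the example scores are assumptions for illustration, not values from funk's system.

```python
import math

# Assumed label order; the real model's order is not stated in the interview.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the most likely label and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], dict(zip(LABELS, probs))


# Made-up scores for a single comment.
label, probabilities = classify([2.0, 0.5, -1.0])
```

Keeping the full distribution rather than only the top label makes it possible to treat low-confidence predictions differently, for example by routing them to a human.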

Shitstorm Detection – From Training to (Success) Model

Mobility Rockstars: An artificial neural network that is already trained: how was the shitstorm detection trained (BERT, deep learning)? BERT, for example, is a transformer model trained on large amounts of text. How was your fine-tuning done? Since the out-of-the-box model “only” provides word embeddings and language understanding, did a final round of training have to be done for the specific task of shitstorm detection? Transfer learning?

Sven: Transfer learning sums it up pretty well. As a basis, we use an already trained BERT model in its German-language version. It was trained on ~12 GB of data, including the complete German Wikipedia (https://huggingface.co/bert-base-german-cased). On top of this, we added another “layer” and “fed in” the already mentioned dataset of 120k comments. Fine-tuning, or more precisely finding the optimal hyperparameters of our model, took several weeks. The AI platform “Weights & Biases” (https://wandb.ai/site) helped us enormously here.
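As a rough sketch of what this kind of transfer learning can look like with the Hugging Face `transformers` library: the pretrained German BERT checkpoint is loaded and a fresh three-class classification layer is placed on top. The label order and function names are assumptions. The training loop and funk's actual hyperparameters are not described in the interview and are left out.

```python
MODEL_NAME = "bert-base-german-cased"   # checkpoint linked in the interview
LABELS = ["negative", "neutral", "positive"]          # assumed label order
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}


def load_model_and_tokenizer():
    """Load pretrained BERT and attach a new, untrained 3-class head."""
    # Imported lazily so the constants above can be used without the heavy dependency.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return model, tokenizer


def predict_proba(model, tokenizer, comments):
    """Return one list of class probabilities per comment."""
    import torch

    batch = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1).tolist()
```

Directly after loading, the new classification layer is randomly initialised, so `predict_proba` only becomes meaningful once the model has been fine-tuned on labelled comments.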

Mobility Rockstars: And why BERT at all? Did you try other tools as well, or were there specific reasons? Was a simpler benchmark built first (ELMo)? Or a convolutional neural network on text embeddings? Or was it rather “never change a running system”?

Sven: BERT is simply state-of-the-art in this area and offers many open source models in German. Yes, there are now newer methods and models, but mostly only in English, which is not very useful for us.

Never change a running system? Currently a clear YES! We have been “live” with it since September 2021 and have not had to make any changes to our model yet – but are planning to.

Mobility Rockstars: You mentioned that you classified 120,000 comments yourselves – did you really label all those posts yourselves? Why did you decide to take that step, and what lift did it provide? Does it explain the difference between your 93 percent correct detection and the 75 percent of the off-the-shelf Perspective API model?

Sven: Yes, the 120k comments were manually annotated by our community management team. That was a lot of work, but the good results of our model confirm that it was worth it.
I have to put the comparison with Google’s Perspective API into context a bit. It is true that this free Google service scored 75% accuracy in our test. However, it should definitely be mentioned that the Google model does not know social media data and was not trained on it. Our model is different in this respect, which explains the gap well. At the very beginning of the project, the test was simply about finding out how good the out-of-the-box solutions on the market are.
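Accuracy here simply means the share of comments for which a model's label matches the human label. The sketch below compares two hypothetical models on the same small test set. The labels and predictions are invented for illustration and are unrelated to the figures quoted above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the human-assigned labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion(y_true, y_pred):
    """Count (true label, predicted label) pairs to see where a model errs."""
    return Counter(zip(y_true, y_pred))


# Invented gold labels and predictions from two hypothetical models.
gold    = ["negative", "negative", "neutral", "positive", "positive"]
model_a = ["negative", "negative", "neutral", "positive", "positive"]
model_b = ["negative", "neutral",  "neutral", "positive", "positive"]

acc_a = accuracy(gold, model_a)
acc_b = accuracy(gold, model_b)
errors_b = confusion(gold, model_b)
```

A comparison like this is only fair when both models are scored on the same held-out comments, and the confusion counts show which classes a model mixes up.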

Labels and validation: how do you feed a shitstorm detection A.I.?

Mobility Rockstars: Labeling is always an exciting topic, but in the case of your shitstorm detection, it’s particularly exciting: How do you determine the clear classification of positive, negative or neutral? Was a labeling manual created, for example, given to different people to see if they were consistent in classifying comments as positive or negative, and was this then assessed and statistically evaluated?

Sven: “Manual” is a bit of an overstatement: we wrote down on one page the most important criteria for when a comment counts as positive, neutral or negative for us. Our 15-member community management team then integrated this into its daily work and continuously rated comments. The resulting dataset was split into training and test data. Keyword “cross-validation”: this is a procedure for checking how good the predictive power of a model is. The training data is the foundation; the test data is used to validate the model and is NOT used for training.
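A single train/test split is the simplest form of this idea. In k-fold cross-validation the split is repeated k times, so that every comment ends up in the test set exactly once. The interview does not say which variant or how many folds funk used, so the function below is a generic sketch with hypothetical names.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k roughly equal parts
    for i, test_idx in enumerate(folds):
        # Everything that is not in the current test fold is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
```

In each round the model is trained on `train_idx` and scored on `test_idx`. Averaging the k scores gives a more stable estimate than a single split.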

Mobility Rockstars: Since we are also always busy in the cloud sector, we are also particularly interested in this aspect of your work: How did you implement the production instance? Does this exist in a cloud, for example on ARD servers or comparable? How do you deal with “data drift” and “concept drift” – does the model need to be re-trained here? Are machine learning ops and monitoring in place for this to ensure what training data is available and working?

Sven: Our production instance runs in the cloud and uses a GPU so that new comments can be evaluated within seconds. Our data warehouse is hosted within ARD. For monitoring, we use the built-in tools of Google Cloud.

Mobility Rockstars: Can we picture it like this: there is an alert, and when it occurs, an active verification loop starts? Say the alert is registered, the percentage of false alerts is recorded, and this data is used for re-training to keep a model with such a high maturity level from going stale?

Sven: Currently our model is static, so there is no re-training. We are currently checking how much the language of our community has changed since the training and will initiate a re-training if necessary.
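One simple way to quantify how much language has changed is to compare the word distribution of the training comments with that of recent comments. The sketch below uses the Jensen-Shannon divergence for this. It is an assumption about how such a check could be done, not funk's method, and the threshold of 0.2 is an arbitrary placeholder.

```python
import math
from collections import Counter


def word_distribution(comments):
    """Relative word frequencies over a list of comments."""
    counts = Counter(w for c in comments for w in c.lower().split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = no shared words."""
    vocab = set(p) | set(q)
    mix = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture distribution.
        return sum(a[w] * math.log2(a[w] / mix[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def needs_retraining(train_comments, recent_comments, threshold=0.2):
    """Flag drift when the vocabulary shift exceeds a (hypothetical) threshold."""
    score = js_divergence(
        word_distribution(train_comments), word_distribution(recent_comments)
    )
    return score > threshold
```

A word-level comparison only captures data drift in vocabulary. Concept drift, where the same words take on a different sentiment, would need a fresh sample of labelled comments to detect.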

Mobility Rockstars: Let’s go back to the sentiment analysis: We would like to talk about the topic modeling. Can the type of shitstorm be identified in detail? So can the negative trend still be divided into (for example) comments against NATO, against Putin, against Melnyk, etc.? Is there a post-solution for this?

Sven: Unfortunately no! Our model is “only” able to recognize three classes. A differentiated analysis according to specific topics is not possible and was not intended for this project. Of course, the question is totally obvious and justified. What we do in the business intelligence team at funk, however, is to run an ad hoc analysis on such specific issues as comments against ARD/ZDF, NATO, against Putin, and so on. For this we use the data from our data warehouses, SQL-Magic and the wonderful Python library spaCy.
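An ad hoc analysis of this kind can be pictured as an SQL query for the negative comments followed by keyword matching per topic. In the sketch below an in-memory SQLite database stands in for the data warehouse and a regular expression stands in for spaCy's tokenizer. The table layout, topic keywords and example rows are all invented.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical topic keywords; a real analysis would maintain these per issue.
TOPICS = {"NATO": {"nato"}, "Putin": {"putin"}, "ARD/ZDF": {"ard", "zdf"}}


def topic_counts(conn, sentiment="negative"):
    """Count comments of one sentiment class that mention each topic."""
    rows = conn.execute(
        "SELECT text FROM comments WHERE sentiment = ?", (sentiment,)
    )
    counts = Counter()
    for (text,) in rows:
        tokens = set(re.findall(r"\w+", text.lower()))
        for topic, keywords in TOPICS.items():
            if tokens & keywords:          # at least one keyword occurs
                counts[topic] += 1
    return counts


# Made-up example data in an assumed (text, sentiment) schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (text TEXT, sentiment TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [
        ("NATO is to blame for everything", "negative"),
        ("ARD and ZDF are lying again", "negative"),
        ("Thanks for the report on NATO", "positive"),
        ("Putin and NATO are both wrong", "negative"),
    ],
)
negative_topics = topic_counts(conn)
```

Because the sentiment label is already stored next to each comment, the topic breakdown needs no additional model, only a filter and a lookup.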

How else can a Shitstorm Detection be used sensibly?

Mobility Rockstars: In what other technology areas could you imagine using A.I. to detect questions and comments? How could your concept be applied to the automotive industry, for example?

Sven: Actually in all areas that process text data, be it the TikTok account of an automobile manufacturer, the Twitter account of a manufacturer of head-up displays, or a chatbot on the company’s own website.

Mobility Rockstars: And vice versa – what technologies, solutions or insights have you drawn from automotive data science for your work?

Sven: The automotive data science area is new to me, so I can’t say anything about it. As a team of data scientists, we use the typical scientific tech stack that most people have encountered at college or university. It includes NumPy, pandas, spaCy, scikit-learn, PyTorch, Keras, TensorFlow and Plotly/Dash.

Mobility Rockstars: Sven, thank you very much for your time and effort to answer our questions. It was really interesting, and who knows – funk and Cognizant Mobility will certainly cross paths again. The basic fields of data science clearly show the common roots in machine learning, classification and the use of quite similar approaches – even if you are definitely ahead in terms of shitstorm detection! We were very happy about the deep and detailed insight into your exciting work, thank you very much!

Sven: I also thank you for your exciting questions and would say we stay in touch! Happy coding 🙂