Figure 1: Themes in response to “What are you doing or could you do for the environment?”
Key principle: moving from words to concepts
Language is a complex subject for computers to handle. Humans speak as easily as we breathe, to the point of forgetting that before knowing how to communicate, we first spent a few years learning to speak and then to write, and studied at length before learning to summarize a text (which remains a difficult exercise even for humans). So let's dispel a myth right away: the machine doesn't understand, it simulates. It does not really do sentiment analysis: it sorts, arranges, and classifies information according to symbols such as letters, words, and sentences. To find interesting insights, you will have to guide the machine; on its own, it can only produce "statistics" from the language used in a corpus. But that help is already extremely valuable when we have the right tools.
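To make that concrete, here is a minimal sketch (plain Python, no NLP library) of the kind of raw "statistics" a machine can extract unaided: it only counts and ranks symbols, with no grasp of what they mean. The sample responses are invented for the illustration.

```python
from collections import Counter
import re

# Invented sample responses, standing in for a real corpus of contributions.
corpus = [
    "Je trie mes déchets et je composte.",
    "Je prends le vélo au lieu de la voiture.",
    "Je trie mes déchets et je réduis le plastique.",
]

# Tokenize: lowercase, then split on non-letter characters.
tokens = []
for response in corpus:
    tokens.extend(re.findall(r"\w+", response.lower()))

# The "statistics" the machine produces on its own: frequency counts.
print(Counter(tokens).most_common(5))
# e.g. [('je', 5), ('trie', 2), ('mes', 2), ('déchets', 2), ('et', 2)]
```

Frequencies alone say nothing about meaning; turning them into insight is where human guidance and better tooling come in.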
The Great French National Debate: a useful case study for all of us
The Great French National Debate, launched on January 15, 2019 by the French Government at the initiative of Emmanuel Macron, the French President, is based on a digital platform (granddebat.fr) allowing every citizen to express themselves on four themes: ecological transition; taxation and public spending; democracy and citizenship; and the organization of the state and public services. Around these themes, a number of open-ended questions were asked, such as: "What are you doing today to protect the environment and/or what could you do?". Hundreds of thousands of contributions have been produced: on the theme of ecological transition alone, this represents more than 700,000 responses (at the time of this publication) to the 12 open questions.
Analyzing this material is a superhuman task. The amount of information is more than a human can reasonably read and synthesize: this volume of text represents some 20 million words, or roughly 40 times the size of the book War and Peace. But assisted by an artificial intelligence system, the task becomes quite accessible, without necessarily requiring advanced technical skills.
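A quick back-of-envelope check of that order of magnitude, with assumed figures (the average response length and the word count of War and Peace are not given in the source):

```python
# Rough sanity check of the scale claim (the two assumed figures below are
# illustrative, not taken from the Grand Débat data itself).
responses = 700_000            # responses on the "ecological transition" theme
avg_words_per_response = 30    # assumed average length of a contribution
war_and_peace_words = 500_000  # approximate word count of the novel

total_words = responses * avg_words_per_response
print(total_words)                              # ~21 million words
print(total_words / war_and_peace_words)        # ~42, i.e. roughly 40x
```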
What can the machine do?
In language processing, there are roughly two schools. The symbolic approach seeks to “encode” the rules of language (grammar, syntax, lexicography) and produces expert systems based on linguistic rules. The statistical approach has recently seen spectacular breakthroughs with the results of artificial neural networks, better known as machine learning and deep learning.
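To make the contrast concrete, here is a small hypothetical sketch: the symbolic approach encodes explicit lexical rules, while the statistical approach learns a classifier from annotated examples. The categories, keyword rules, and training data are invented for the illustration, and scikit-learn stands in for whichever statistical toolkit is actually used.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

text = "Je prends le vélo pour aller au travail"

# Symbolic approach: hand-written lexical rules (invented for the example).
rules = {"transport": r"\b(vélo|voiture|train|covoiturage)\b",
         "déchets":   r"\b(tri|recycl\w*|compost\w*)\b"}
symbolic = [theme for theme, pattern in rules.items()
            if re.search(pattern, text, flags=re.IGNORECASE)]

# Statistical approach: a classifier learned from annotated examples.
train_texts  = ["je trie mes déchets", "je composte",
                "je prends le train", "je fais du covoiturage"]
train_labels = ["déchets", "déchets", "transport", "transport"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
statistical = model.predict([text])[0]

print(symbolic, statistical)   # e.g. ['transport'] transport
```

The rules are transparent but costly to write and maintain; the learned model generalizes from data but needs examples and remains opaque, which is precisely why the two are often combined.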
While these two methods are often opposed, in practice a hybrid approach should be favored in order to take advantage of the incredible capabilities that Artificial Intelligence brings when complemented by human intelligence. By combining these two approaches, we offer humans an "augmented intelligence" thanks to what the machine can produce: the machine proposes, the human optimizes (validating, correcting, and directing the machine).
When you load your body of text, Proxem Studio automatically sorts, organizes, and classifies the entire vocabulary and proposes thousands of concepts that emerge naturally from the data. This method has a double advantage over old-school approaches. On the one hand, it avoids starting from a priori assumptions: the user no longer needs to spend days building a dictionary of words for the expected themes; they can let themselves be guided by what is actually present in the data. On the other hand, it is no longer necessary to manually annotate thousands of example documents to teach the machine the relevant concepts: it is now able to identify them on its own.
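A rough sketch of the underlying idea, using scikit-learn rather than Proxem Studio itself: build a term-document matrix and cluster the terms, so that groups of related words emerge from the data without any predefined dictionary or manual annotation. The corpus and the number of clusters are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Tiny invented corpus standing in for the real contributions.
corpus = [
    "je trie mes déchets et je recycle le verre",
    "je composte mes déchets de cuisine",
    "je prends le vélo au lieu de la voiture",
    "je privilégie le train et le covoiturage",
]

# Term-document matrix: one row per document, one column per term.
vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(corpus)

# Cluster the *terms* (columns) by the documents they appear in,
# so that words used in the same contributions fall into the same group.
terms = vectorizer.get_feature_names_out()
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    doc_term.T.toarray())

for cluster in range(2):
    print(cluster, [t for t, l in zip(terms, labels) if l == cluster])
```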
The example below shows the result obtained on the data of the "ecological transition" theme after analyzing 700,000 contributions. The machine has already automatically grouped terms (words and expressions) that "go together". To do this, it operates a bit like we do when reading the Smurfs comics: with our old human brains, we easily understand each occurrence of "smurf" from the context of the words around it. This is exactly what the machine does when it automatically groups expressions into themes and sub-themes.
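The "Smurf" intuition is essentially the distributional hypothesis: a term is characterized by the words that surround it. A hedged sketch with gensim's Word2Vec (a standard embedding library, not necessarily what Proxem Studio uses internally) shows how terms sharing similar contexts end up with similar representations; the tokenized contributions are invented for the example.

```python
from gensim.models import Word2Vec

# Tokenized contributions (invented); in practice the model would be trained
# on the full corpus of several hundred thousand responses.
sentences = [
    ["je", "prends", "le", "vélo", "pour", "aller", "au", "travail"],
    ["je", "prends", "le", "train", "pour", "aller", "au", "travail"],
    ["je", "trie", "mes", "déchets", "à", "la", "maison"],
    ["je", "composte", "mes", "déchets", "à", "la", "maison"],
] * 50  # repeat the toy data so the model has something to learn from

model = Word2Vec(sentences, vector_size=50, window=3,
                 min_count=1, epochs=20, seed=0)

# Terms used in similar contexts get similar vectors, just as we infer the
# meaning of "smurf" from its surroundings.
print(model.wv.most_similar("vélo", topn=3))
```

Grouping such neighboring terms is what yields the themes and sub-themes shown in Figure 1.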