With the rise of Artificial Intelligence tools, such as those from OpenAI, that can suggest a menu for dinner or write a term paper, much of what makes humans unique is being left to predictive algorithms. Like using a stock greeting card instead of a handwritten note, the digital age increasingly relies on packaged goods instead of from-scratch creativity. This is not necessarily bad, but it becomes troublesome when the technique is applied to areas that still need to be fully understood. Much like searching for the perfect card only to find that none in stock suits, there are parts of human communication missing from existing efforts to automate dialog. One of these areas is supportive language, or help speech.

To explain this, let me first explain the opposite. In their study of “core disgust,” Haidt et al. (1997) show that the response is a universal cultural product. To define the polarity of text, one of the most popular sentiment analysis packages in R, Syuzhet (Kim, 2022), uses terms that demonstrate disgust. This type of lexicon-based analysis is also used to identify hate speech online. However, there is a gap in the study of collaborative speech.

To examine this type of language further, this research investigates the idea that there is a correlation between words used in the writings of the Dalai Lama and their sentiment. Specifically, the researcher verifies that the emotional lexicon of trust terms is most frequently found. Based on this research, the terms peace, truth, found, true, and wisdom may imply a supportive text, and these keywords may also point to a common language of peace. The methods also demonstrate possible exercises for students using Computer Science and the Humanities to engage both disciplines. The hypothesis is that trust terms occur more frequently in the Dalai Lama’s writing than other terms that demonstrate emotion.

A holy figure in Tibetan Buddhist culture, the Dalai Lama is known as the Bodhisattva of Compassion. Internationally he gives speeches and meets with other world leaders, such as the current King of England, King Charles III. According to the website DalaiLama.com, he has 133 authored works, with the first being My Land and My People, an autobiography from 1962. The subject of this library of writing is tied to the Principal Commitments, one of which is to promote “human values such as compassion, forgiveness, tolerance, contentment and self-discipline” (The Office of His Holiness The Dalai Lama, n.d.).

Dictionary-based sentiment analysis is a natural language processing technique that calculates an overall positive or negative score by averaging the values assigned to terms in a dictionary, or lexicon. However, analyzing words that convey complex emotions is still an active research area. While the valence scale has expanded, it must genuinely encompass all aspects of human behavior to apply across domains. Mohammad and Turney used crowdsourcing in 2013 to define the NRC emotional lexicon, which moves in this direction by describing words associated with an emotion, such as trust. Their lexicon is used in this analysis through the Syuzhet package in R. R is a programming language for statistical computing, and Posit Cloud is a hosted service for running it.
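The averaging step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code used in the study, and the lexicon `TOY_VALENCE` is a hypothetical stand-in with made-up values, not the Syuzhet or NRC data.

```python
import re

# Hypothetical toy lexicon: words and values are illustrative assumptions,
# not entries copied from Syuzhet or the NRC lexicon.
TOY_VALENCE = {
    "peace": 1.0,
    "wisdom": 0.8,
    "truth": 0.6,
    "anger": -0.8,
    "fear": -0.6,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_valence(text, lexicon=TOY_VALENCE):
    """Average the valence of every token found in the lexicon.

    Tokens that are not in the lexicon are ignored. A text with no
    lexicon terms scores 0.0 (treated as neutral).
    """
    scores = [lexicon[token] for token in tokenize(text) if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0
```

A score above zero would be read as positive and a score below zero as negative. Words missing from the lexicon contribute nothing, which is one reason the coverage of the lexicon matters.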

Studying moral sentiment in a real-world application is challenging in natural language processing (Miller-Klugesherz, 2022). In 1995, Mayer, Davis, and Schoorman specifically studied trust formation, and Dai et al. even created a tool to measure trust in 2013. However, creating a trust lexicon that can be used to measure collaborative speech has yet to be attempted. Many more words likely convey trust than currently exist in emotional lexicons. However, the emotion must be studied more to confidently use it as a cross-domain categorization that may signal collaboration.

In contrast, offensive words are actively studied to help administrators reduce conflict online. Work by Waseem and Hovy in 2016 highlights the ability to detect hateful speech on Twitter and to construct a list of characteristics that describe it. Considering the complexity of human emotion and the lack of existing efforts to define the positive side of emotional polarity, it is an exciting time to pursue research like that presented here. Students seem to want to explore these ideas and understand the sentiment of a Tweet, for example. Also exciting in the research space are tools such as Posit Cloud (Velasquez, 2023), which allow for sharing code with students, letting them explore the foundations of the algorithms being constructed.

Syuzhet (Jockers, 2015) is one package in R that allows for investigating the trusting language in text using the NRC Emotional Lexicon. This baseline allows for investigation into possible sources of trusting language by identifying those bodies of work, including the existing definition of the affective state. The advantage of computational methods is that they allow for massive inputs of data. However, existing tools are biased in that words conveying supportive emotions have not been specifically studied. Further, many of these tools lack explainability as described in NISTIR 8312 (Phillips et al., 2021). The definitions of the terms that signal emotion are not publicly available in many proprietary tools. The reliability of textual analysis using sentiment analysis is limited by this lack of explainability (Kim, 2022). By generating a lexicon of cooperative speech, another cross-validation will be possible to verify the results of the computational methods, especially in this missing affective space.

Method

The data set contains texts from two sources. Ten books are from the Library, and the other 75 works are from HathiTrust Research Center. More artifacts were available by the Dalai Lama than by the other authors considered. Using the Digital Humanities Lab, the researcher converted the physical copies into a format that computational methods could analyze.

The researcher gathered all text files onto RStudio Cloud (now Posit Cloud), then used R to find terms with a non-zero valence. These terms were then compared to the list of terms indicating trust in the NRC Emotional Lexicon. Aggregating the terms by emotion was necessary to compare trust to the other emotions, ranking them from the most frequent to the least frequent.
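The aggregation step can be illustrated with a short sketch. It is written in Python rather than R, and `TOY_EMOTIONS` is a hypothetical lexicon whose word-to-emotion assignments are assumptions made for the example, not the NRC entries.

```python
import re
from collections import Counter

# Hypothetical word-to-emotion lexicon; the assignments are illustrative
# assumptions and are not copied from the NRC Emotional Lexicon.
TOY_EMOTIONS = {
    "peace": {"trust", "joy"},
    "truth": {"trust"},
    "wisdom": {"trust"},
    "fear": {"fear"},
    "anger": {"anger"},
}


def emotion_counts(text, lexicon=TOY_EMOTIONS):
    """Count how many tokens in the text are associated with each emotion."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


def rank_emotions(counts):
    """Sort emotions from most to least frequent, breaking ties by name."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "Truth and wisdom bring peace; fear and anger fade."
ranking = rank_emotions(emotion_counts(sample))
```

In this toy sample, trust ranks first because three of the tokens carry that label. The study applies the same kind of comparison to the full set of texts.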

The method comes from the tutorial for the Syuzhet package in R (Jockers, 2015). Code was constructed to measure the occurrence of trust, with tokens defined by the NRC emotion lexicon and the Syuzhet package in R. The tokenized word list and full text of writings by the Dalai Lama were fed into this code. The results are shown in the following section, but what is remarkable about this method is that the code can be shared with students in a learning environment for them to build on. Just as an art teacher might set out the canvas and paint, code can encourage inquiry and test intuitions once the initial guidelines are set. Different levels of detail can be focused on and analyzed thematically, with the statistics of the Syuzhet package supporting the qualitative conclusions behind the scenes.

To illustrate what this looks like in practice, Figure 1 is an example NLP task that categorizes a body of text as either collaborative or competitive. The algorithm takes a block of text as input, and if the text is positive and includes collaborative words (based on a domain-specific definition of the features), it labels the text as collaborative. This research is an exploratory effort to define these domain-specific tokens of help speech. In this case, the algorithm labels Example 1 as collaborative, Example 2 as competitive, and Example 3 as neutral.

Figure 1. Example categorization of collaborative speech
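A rough sketch of such a categorization task follows. Only the collaborative rule (positive polarity plus collaborative words) comes from the description above. The rule for the competitive label, the word lists, and the valence values are all assumptions made for illustration, and the example sentences are not the ones in Figure 1.

```python
import re

# All word lists and values below are hypothetical placeholders for the
# domain-specific features that this research is trying to identify.
COLLABORATIVE_WORDS = {"help", "share", "support", "together"}
COMPETITIVE_WORDS = {"beat", "defeat", "rival"}
TOY_VALENCE = {
    "help": 0.6, "share": 0.5, "support": 0.6, "together": 0.5,
    "beat": -0.4, "defeat": -0.6, "rival": -0.3,
}


def classify(text):
    """Label a text as 'collaborative', 'competitive', or 'neutral'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    polarity = sum(TOY_VALENCE.get(token, 0.0) for token in tokens)
    has_collaborative = any(token in COLLABORATIVE_WORDS for token in tokens)
    has_competitive = any(token in COMPETITIVE_WORDS for token in tokens)

    # Rule from the text: positive polarity and collaborative words.
    if polarity > 0 and has_collaborative:
        return "collaborative"
    # Assumed rule: any competitive word marks the text as competitive.
    if has_competitive:
        return "competitive"
    return "neutral"


labels = [
    classify("Let us help each other and share what we learn."),
    classify("We will defeat every rival."),
    classify("The meeting is on Tuesday."),
]
```

With these toy lists, the three sentences receive the labels collaborative, competitive, and neutral, in that order. A usable version would depend on the validated trust and help-speech terms that this study begins to explore.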

Results

The results represent an initial step towards understanding the language of peace. The average sentiment in the Dalai Lama’s writing is positive, although negative terms were also measured. This contrast may suggest that a simple polarity measurement is not a strong signal of collaboration. It suggests instead that trust is evoked through a combination of emotional language, with trust still being the most frequently occurring emotion. Besides being positive, words in the NRC trust category may be good candidates for helpful speech based on their high occurrence.

The summary chart in Figure 2 shows the frequency in one title, Essence of Refined Gold. Modifying the code in Syuzhet to show more granular and time-based results is straightforward. For example, it is possible to look at the works of the Dalai Lama and compare how the emotional language changes between the beginning, middle, and end.

Figure 2. Output from Syuzhet Package – Single Text
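The comparison of the beginning, middle, and end of a work mentioned above could be sketched as follows. This is a Python illustration rather than the Syuzhet code. The three trust terms are among those reported in this paper, while the function names and the equal-thirds split are assumptions.

```python
import re

# Three of the trust terms reported in this paper, used as a small sample.
SAMPLE_TRUST_TERMS = {"peace", "truth", "wisdom"}


def split_into_parts(tokens, parts=3):
    """Split a token list into consecutive chunks of near-equal size."""
    size, extra = divmod(len(tokens), parts)
    chunks, start = [], 0
    for index in range(parts):
        # The first `extra` chunks take one additional token each.
        end = start + size + (1 if index < extra else 0)
        chunks.append(tokens[start:end])
        start = end
    return chunks


def trust_share_by_part(text, trust_terms=SAMPLE_TRUST_TERMS, parts=3):
    """Return the share of trust terms in each part of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    shares = []
    for chunk in split_into_parts(tokens, parts):
        hits = sum(token in trust_terms for token in chunk)
        shares.append(hits / len(chunk) if chunk else 0.0)
    return shares


shares = trust_share_by_part(
    "peace truth fear anger doubt worry wisdom peace truth"
)
```

Each value in `shares` is the fraction of tokens in that third of the text that are trust terms, so a change in emotional language across a work shows up as a change between the three values.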

Figure 3 drills down on these results and shows the overall occurrence of frequently used trust terms. Some terms are specific to the domain of the texts studied. For example, the terms religion, guru, holiness, and enlightenment likely occur more frequently than expected in a typical text due to the subject. These have been removed from Figure 2.

Figure 3. Output Syuzhet – All Texts Trust Terms
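Removing domain-specific terms before ranking the trust terms can be sketched as below. The excluded terms are the four named above. The trust set and the token list are hypothetical examples, and the code is a Python illustration rather than the R code used in the study.

```python
from collections import Counter

# Domain-specific terms named in the text as over-represented in this corpus.
DOMAIN_TERMS = {"religion", "guru", "holiness", "enlightenment"}

# Hypothetical trust set for the example, not the NRC trust list itself.
TOY_TRUST_TERMS = {"guru", "holiness", "peace", "truth", "wisdom"}


def top_trust_terms(tokens, trust_terms=TOY_TRUST_TERMS,
                    exclude=DOMAIN_TERMS, n=5):
    """Count trust terms, skip excluded ones, and return the n most common."""
    counts = Counter(
        token for token in tokens
        if token in trust_terms and token not in exclude
    )
    return counts.most_common(n)


tokens = ["guru", "peace", "peace", "truth", "holiness",
          "peace", "truth", "wisdom"]
top_two = top_trust_terms(tokens, n=2)
```

Keeping the exclusion list as a separate, visible set makes it easy for students to debate which terms belong to the domain and to see how the ranking changes when the list is edited.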

Discussion, Limitations, and Conclusion

This work explored the sentiment of frequently used terms in peace artifacts by the Dalai Lama. These keywords may point to a common language of peace. More work must confirm this finding, such as applying the terms as a predictive tool to labeled texts. If generalizations about the terms being used consistently in supportive language can be made, it would strengthen the argument.

One possible source of error in this work is the assumption that all of the text written by the Dalai Lama supports the Principal Commitments mentioned in the introduction. While there certainly must be some text in the works that has no emotional value, there may also be quotes from other authors, or dialogs, that do not represent the cultivation of compassion or that are used as examples of what not to do. The methods discussed in this paper are not sufficient to detect things like negation, the use of a negative before a statement that reverses its meaning. There was also no extensive data sampling to ensure that the author of the texts analyzed was the Dalai Lama. However, even with these caveats, the result is fascinating.

This exploratory analysis suggests that a more controlled effort that expands the same methodology may be able to identify terms that are commonly used in collaboration. This knowledge would then be a first step in understanding the nature of help and potentially a framework to inform its improvement. Finding that greeting card, or having Artificial Intelligence create it, would then become more likely. This work may also help inform conversational support from AI, similar to efforts like Woebot (Fitzpatrick et al., 2017).

Students may be interested in the computational operations described in the study and enjoy the opportunity to try running the scripts themselves. The students do not have to understand the complexity of correlation analysis or term frequency and topic modeling to get value from this type of work. It also serves as a starting point for those wanting to take further steps.

Works by other authors are accessible through Project Gutenberg, which provides digitized texts for free online. It is also possible to work with HathiTrust Research Center to download collections of works, such as those shown here, for analysis. The possible projects involving these types of efforts are countless, and some have looked at concepts like those presented here. Similar services are offered by Google, including books.google.com.

It can be intimidating to roll out new software in the classroom, and there are many possible failures, including but not limited to errors with the hosting site, code syntax, and end-user errors. What is essential, however, is that these tools are becoming more entwined with daily life all the time. The user interfaces are becoming easier to use and the results more accurate. Students are also more comfortable with technology, including its limitations.

While demonstrating a technique such as the one shown here for analyzing the sentiment of works by the Dalai Lama, be sure to invite criticism. At each step, many opportunities exist to point out potential bias or misclassification. Students (and instructors) find it enjoyable to poke holes in the latest innovation. As new advances are made, there will surely be many more chances to criticize. However, if AI is explainable, it can inspire others with what is possible and reinforce the good. It may even help explain abstract concepts or provide a road map for how to get to impossible places.