
Making Max Weber's U-turn visible with R

I was really convinced I had to apply my new knowledge of text analysis with R and Python to the Master's text, i.e. Hegel. But once you have started studying Max Weber, it is hard to stop. I wrote my Master's thesis about his philosophy of history, and in my doctoral thesis I treated his language ... and produced a book of 500 pages nobody will ever read in its entirety. If you are interested, and can read German, you could have a look at the chapters about his youth letters, his epistemological writings, his "Protestant Ethics", or his famous speeches "Science as Vocation" and "Politics as Vocation".

A summary in English is here, something about Weber on holiday here

My question was: How did Max Weber manage to become a classic, a founding father of sociology? This still seems an interesting topic. If we look at the "Inaugural Speech" that the 31-year-old Weber gave in 1895 as a professor of National Economics, we might wonder how this aggressive racist could ever have reached the summits of theoretical thought. He asserts that Polish peasants were "ready to eat the grass off the ground" and talks about the "Slavic race" and its qualities. Politically, his speech is directed against the Prussian nobility, which was increasing its profits by worsening the working conditions of farmworkers, but the liberal intention is masked by his invectives against Poles.

Reading Weber's second famous speech, about politics, from 1919, we might wonder at the difference in tone. But this first impression needs to be confirmed by analysis. I use the R packages quanteda (Benoit et al.) and textnets (code below) to get some graphical representations. First, word clouds may be helpful instruments of comparison (frequencies > 20/45).
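Here is a minimal sketch of the word-cloud step. It assumes the quanteda.textplots function `textplot_wordcloud()` rather than the `wordcloud` package used in the script at the end of this post. The string `toy` is an invented placeholder, not Weber's text. `min_count = 1` is set only because the placeholder is so short; for the speeches, the threshold would be 20 or 45.

```r
library(quanteda)
library(quanteda.textplots)

# Toy placeholder text, NOT taken from Weber; for a real run read a speech
# with readtext::readtext() (file name left to the reader)
toy <- "Die Nation, die Nation und wieder die Nation. Politik und Wirtschaft der Nation."

# Drop punctuation, numbers, German stopwords and words shorter than 4 characters
toks <- tokens(toy, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_remove(toks, stopwords("de"))
toks <- tokens_select(toks, min_nchar = 4)

dfmat <- dfm(toks)  # dfm() lowercases by default

# min_count is the frequency threshold; 1 only because the toy text is tiny
textplot_wordcloud(dfmat, min_count = 1, color = "red")
```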
For 1895, we get:



Everything is focused on the German nation, using words like "Deutschtum" ("Germanness", four times). In 1919, the cloud looks different:
He seems to be talking about politics in general, not about Germany alone. Well, his fatherland remains important. What really changes is the entire surrounding geography. If we consider co-occurrences of country names, the turn becomes visible. In the Inaugural Speech from 1895, we get:

In 1895, Weber is looking east. In 1919, his world has changed:

With this turn, he positions himself far away from extreme right-wing philosophers like Oswald Spengler as well as from Marxist thinkers like Ernst Bloch. On both sides, the slogan "Ex oriente lux" was proclaimed. Weber has gone West instead. Co-occurrences between some keywords show what that means.
1895:
1919:
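These networks rest on co-occurrence counts. The script at the end of this post counts co-occurrence within whole documents. A window-based variant counts only names that appear close to each other. It is sketched here as an alternative and is not necessarily how the figures above were produced. The placeholder text, the keyword vector `laender` and `window = 5` are illustrative assumptions; in the post, the names come from `laendernamen.txt`.

```r
library(quanteda)
library(quanteda.textplots)

# Toy placeholder text and keyword list (NOT Weber's text)
toy <- "Deutschland blickt nach Osten auf Polen und Russland. Später blickt Deutschland nach England und Amerika."
laender <- c("Deutschland", "Polen", "Russland", "England", "Amerika")

toks <- tokens(toy, remove_punct = TRUE)

# Keep only the keywords; padding = TRUE leaves empty slots for removed words,
# so the distances between the remaining names are preserved
toks_laender <- tokens_keep(toks, pattern = laender, padding = TRUE)

# Count keyword pairs within 5 tokens of each other,
# then keep only the keyword rows/columns (drops an empty padding feature if present)
fcmat <- fcm(toks_laender, context = "window", window = 5)
fcmat <- fcm_select(fcmat, pattern = laender)

textplot_network(fcmat)
```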

But when, between 1895 and 1919, did Weber change his direction of view? 1904 could be a good guess.

In 1904, Weber travelled quite extensively in the US. But this cannot be the turning point. First, among the observations in his letters from the States there are several critical comments about racial discrimination. The Weber of 1895 would presumably have been more tolerant of similar practices. He had witnessed the German treatment of Polish immigrants and inhabitants of Prussia without voicing any moral concerns.


Secondly, looking at the co-occurrences in the first part of “The Protestant Ethic and the Spirit of Capitalism”, the part Weber published before travelling to the States, we already notice something new:

The East is far less important, although America does not seem to dominate yet. Or does it? In this first part, Weber extensively discusses a text by Benjamin Franklin. Once the name “Franklin” is added to the keyword list, the overall impression indeed changes, as the most frequent co-occurrences show:



In this first part of the “Protestant Ethics” the name of Benjamin Franklin appears ten times.


For example, the most frequent bigrams are:

> textstat_frequency(ngram_dfm, n=20)
                feature frequency rank docfreq
1    geist_kapitalismus        14    1       1
2 modernen_kapitalismus         8    2       1
3           ganz_ebenso         7    3       1
4    benjamin_franklins         7    3       1


Furthermore, “Amerika” is named only once, but its derived forms (KWIC search “amerika*”: “amerikanisch” …) appear nine times. Adding these occurrences to those of “Franklin”, we see that Weber is already in America without having travelled there yet.
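The glob pattern in `kwic()` counts every form that starts with a given stem. This small sketch uses an invented placeholder sentence. Each row of a `kwic()` result is one match, so `nrow()` gives the count. The pattern `"Franklin*"`, rather than `"Franklin"`, is an assumption meant to catch the genitive "Franklins" as well.

```r
library(quanteda)

# Toy placeholder sentence (NOT Weber's text)
toy <- "Benjamin Franklin schreibt. Der amerikanische Geist, die amerikanischen Puritaner und Amerika."
toks <- tokens(toy)

# kwic() uses glob matching and ignores case by default; one row per hit
hits_amerika  <- kwic(toks, pattern = "amerika*", window = 10)
hits_franklin <- kwic(toks, pattern = "Franklin*", window = 10)  # also matches "Franklins"

n_amerika  <- nrow(hits_amerika)
n_franklin <- nrow(hits_franklin)
n_amerika + n_franklin  # combined count of references to America
```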


In the text, we also find a new geographic indication: “Okzident*” appears 43 times. With “Okzident” added to the list of countries, the panorama looks like this:
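A quanteda dictionary applied with `tokens_lookup()` is one way to fold several word forms into a single node, so that "Okzident*" can sit in the list next to the country names. The keys and patterns in `dict` are hypothetical, and the text is an invented placeholder. The keyed tokens could then be passed to `fcm()` like the country names.

```r
library(quanteda)

# Toy placeholder text (NOT Weber's text)
toy <- "Der Okzident und die okzidentale Kultur. Amerika, amerikanisch, Franklin. Polen."

# Hypothetical keyword dictionary: each key gathers several word forms
dict <- dictionary(list(
  amerika  = c("amerika*", "franklin*"),
  okzident = "okzident*",
  polen    = c("polen", "polnisch*")
))

toks <- tokens(toy, remove_punct = TRUE)

# Replace each matching token by its key (case-insensitive glob matching);
# tokens that match no key are dropped
toks_keys <- tokens_lookup(toks, dictionary = dict)

dfmat_keys <- dfm(toks_keys)
dfmat_keys
```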



In 1904, before travelling to the States, Weber had already made his U-turn from East to West. When and where? Maybe in the dark years of his crisis, when he was reportedly incapable of working and sat in the garden staring into the void. Or perhaps around 1903, while he was studying epistemology, long and dry treatises (Rickert!) about human knowledge. During this period, he indeed revisited the logic of concept formation in the historical sciences. Not classes, but types! This turn eliminated any possible bridge to racism for Weber. In both cases, the turn took place in the solitary spaces of reflection, not while travelling somewhere.



Other statistics

The differences in style between the speech from 1895 and the one from 1919 are not really overwhelming. With 31 tokens per sentence in 1919 against 41 in 1895, Weber's sentences are shorter in 1919, but still rather long. The punctuation differs slightly. In 1895, we encounter a colon every 11 full stops, in 1919 every 5.6: the latter text may be rhetorically better suited to being spoken. As for lexical richness, both texts, written by one and the same highly educated author, are quite similar, with a type-token ratio (TTR) of .26 in the shorter speech from 1895 and .23 in the other.
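The three measures in this paragraph can be computed roughly as follows. This is a sketch on an invented placeholder text, not the script behind the numbers above. It assumes sentence splitting with `corpus_reshape()` and counts full stops and colons as punctuation tokens. Note that the TTR tends to fall as texts get longer, so the higher value of the shorter 1895 speech may partly be a length effect. `textstat_lexdiv()` also offers length-corrected measures such as `"MATTR"` for tokens input.

```r
library(quanteda)
library(quanteda.textstats)

# Toy placeholder text (NOT Weber's text)
toy <- "Das ist ein Satz. Hier folgt ein zweiter Satz: mit Doppelpunkt. Und ein dritter."
corp <- corpus(toy)

# Mean sentence length: word tokens (punctuation excluded) per sentence
n_sent <- ndoc(corpus_reshape(corp, to = "sentences"))
n_tok  <- sum(ntoken(tokens(corp, remove_punct = TRUE)))
tokens_per_sentence <- n_tok / n_sent

# Full stops per colon, counted on tokens that keep punctuation
all_toks <- unlist(as.list(tokens(corp)))
n_stops  <- sum(all_toks == ".")
n_colons <- sum(all_toks == ":")
stops_per_colon <- n_stops / n_colons

# Type-token ratio: distinct lowercased word forms / all word tokens
ttr <- textstat_lexdiv(dfm(tokens(corp, remove_punct = TRUE)), measure = "TTR")$TTR
```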


What really changes are the words most frequently used.

“Politics as vocation” (Ok, I could have stemmed them …):

> textstat_frequency(komplett_dfm, n=20)
       feature frequency rank docfreq group
 1     politik        70    1       1   all
 2 politischen        58    2       1   all
 3       macht        57    3       1   all
 4        ganz        45    4       1   all
 5    parteien        38    5       1   all
 6      mittel        33    6       1   all
 7      partei        31    7       1   all
 8   politiker        29    8       1   all
 9       ethik        28    9       1   all
10  politische        26   10       1   all
11       immer        25   11       1   all
12       sache        24   12       1   all
13  herrschaft        24   12       1   all
14        sinn        24   12       1   all
15       leben        24   12       1   all
16        rein        22   16       1   all
17        mehr        22   16       1   all
18       sagen        21   18       1   all
19      gerade        21   18       1   all
20        welt        21   18       1   all
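Stemming would merge inflected forms in this list, such as "politischen" and "politische". Here is a minimal sketch with quanteda's `tokens_wordstem()`, which uses the Snowball stemmer for German. The word list is a placeholder drawn from the table. Stems are truncated strings rather than dictionary words, and the stemmer may also merge words one would prefer to keep apart.

```r
library(quanteda)

# Placeholder word forms taken from the frequency list
toy <- "Politik politischen politische Politiker Partei Parteien"

# Lowercase first: the Snowball stemmer works on lowercase input
toks <- tokens_tolower(tokens(toy))

# Snowball stemmer for German (quanteda calls SnowballC::wordStem)
toks_stem <- tokens_wordstem(toks, language = "german")

dfm(toks)       # the six original word forms
dfm(toks_stem)  # the stemmed features
```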


Looking for bigrams:

> ngram_dfm <- dfm(toks_ngram)
> textstat_frequency(ngram_dfm, n=20)
                  feature frequency rank docfreq
 1          politik_beruf         7    1       1
 2        politik_treiben         5    2       1
 3 örtlichen_honoratioren         5    2       1
 4      politischen_macht         4    4       1
 5          politik_leben         4    4       1
 6        betrieb_politik         4    4       1
 7    vereinigten_staaten         4    4       1
 8          beruf_politik         4    4       1
 9  bürgerlichen_parteien         4    4       1
10           spoil_system         4    4       1
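The stopwords are removed before `tokens_ngrams()` is applied, so the bigrams in these lists span the deleted function words: "Geist des Kapitalismus" becomes geist_kapitalismus. Because punctuation is removed too, bigrams can also span sentence boundaries. The sketch uses an invented placeholder text to show this. It also shows a variant that splits the text into sentences first; that variant is a suggestion, not what produced the tables in this post.

```r
library(quanteda)
library(quanteda.textstats)

# Toy placeholder text (NOT Weber's text)
toy <- "Der Geist des Kapitalismus. Der Geist des Kapitalismus."

# As in the post: stopwords removed before forming bigrams; note the
# spurious pair kapitalismus_geist across the full stop
toks <- tokens(toy, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("de"))
bigrams_doc <- dfm(tokens_ngrams(toks, n = 2))

# Variant: split into sentences first, so no bigram crosses a sentence boundary
toks_sent <- tokens(corpus_reshape(corpus(toy), to = "sentences"), remove_punct = TRUE)
toks_sent <- tokens_remove(toks_sent, stopwords("de"))
bigrams_sent <- dfm(tokens_ngrams(toks_sent, n = 2))

textstat_frequency(bigrams_doc)
textstat_frequency(bigrams_sent)
```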


In the „Antrittsrede“ (inaugural lecture):

> textstat_frequency(komplett_dfm, n=20)
        feature frequency rank docfreq group
 1       nation        33    1       1   all
 2    deutschen        24    2       1   all
 3      unserer        23    3       1   all
 4 ökonomischen        22    4       1   all
 5  politischen        18    5       1   all
 6  ökonomische        18    5       1   all
 7        heute        16    7       1   all
 8       gerade        15    8       1   all
 9      eigenen        15    8       1   all
10      zukunft        14   10       1   all
11   politische        14   10       1   all
12       allein        13   12       1   all
13       arbeit        13   12       1   all
14 wissenschaft        13   12       1   all
15     sozialen        12   15       1   all

The presence of economics is quite evident here. Remember, Weber began his career as a professor of economics. In bigrams:

> textstat_frequency(ngram_dfm, n=20)
                   feature frequency rank docfreq group
 1 ökonomische_entwicklung         4    1       1   all
 2  physischen_psychischen         3    2       1   all
 3           letzter_linie         3    2       1   all
 4   ökonomischen_sozialen         3    2       1   all
 5    unserer_wissenschaft         3    2       1   all
 6   frieden_menschenglück         3    2       1   all
 7  machtinteressen_nation         3    2       1   all
 8          leitung_nation         3    2       1   all
 9           politi-_schen         3    2       1   all
10    deutschen_bürgertums         3    2       1   all
11   politischer_erziehung         3    2       1   all
12         1871-1885_zeigt         2   12       1   all
13            kampf_dasein         2   12       1   all
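A keyness statistic compares the two speeches' vocabularies in one step. This sketch assumes both speeches sit in one corpus; here they are two invented placeholder strings named `rede1895` and `rede1919`. It uses `textstat_keyness()` from quanteda.textstats with its default chi-squared measure. Positive scores mark words that are over-represented in the target document.

```r
library(quanteda)
library(quanteda.textstats)

# Two toy placeholder "speeches" (NOT Weber's text); in practice both files
# would be read with readtext() and combined into one corpus
speeches <- c(
  rede1895 = "Die Nation und die ökonomische Entwicklung der Nation.",
  rede1919 = "Politik als Beruf: Politik, Macht und Parteien."
)

toks <- tokens(corpus(speeches), remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("de"))
dfmat <- dfm(toks)

# Chi-squared keyness: target = 1919 speech, reference = 1895 speech
keys <- textstat_keyness(dfmat, target = "rede1919")
head(keys, 10)
```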






Technically

library(quanteda)
library(quanteda.textplots)
library(tidyverse)
library(devtools)
library(textnets)
library(readtext)
theme_set(theme_bw())

#setwd("desktop/WeberR")

komplett_PS <- readtext("Pberuf.txt")

komplett_PS_c <- corpus(komplett_PS, text_field = "text")
names(komplett_PS_c)
tail(komplett_PS_c)

#following http://inhaltsanalyse-mit-r.de/netzwerke.html

# Recent quanteda versions expect tokens() first; passing remove_* or ngrams
# options to dfm() is deprecated or no longer supported. Unigrams are kept so
# that single country names can be matched.
meine.toks <- tokens(komplett_PS_c, remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE)
meine.dfm <- dfm(meine.toks, tolower = FALSE)
laender <- scan("laendernamen.txt", what = "char", sep = "\n", quiet = TRUE)

# Keep only the country names (case-sensitive) and build a co-occurrence matrix;
# fcm() on a dfm counts co-occurrence within documents
meine.dfm.laender <- dfm_select(meine.dfm, pattern = laender, selection = "keep", case_insensitive = FALSE)
meine.fcm <- fcm(meine.dfm.laender)
textplot_network(meine.fcm)



library(seededlda)
library(quanteda)
library(quanteda.textstats)
library(SnowballC)
library(readtext)
library("wordcloud")
library(koRpus)
library(stopwords)
library(scales)
library(tidyverse)

#Gesammelte Politische Schriften - Universität Potsdam https://verlagsarchivweb.ub.unipotsdam.de
#following https://data.library.virginia.edu/reading-pdf-files-into-r-for-text-mining/
#data_char_HegelPhWallace <- texts(readtext("https://www.gutenberg.org/files/39064/39064-0.txt"))
#names(data_char_HegelPhWallace) <- "Philosophy of Mind"

getwd()
#setwd("desktop/WeberR")

komplett_PS <- readtext("WeberPE1.txt")
names(komplett_PS)
length(komplett_PS)
summary(komplett_PS)
head(komplett_PS)

komplett_PS_c <- corpus(komplett_PS, text_field = "text")
names(komplett_PS_c)
tail(komplett_PS_c)
summary(komplett_PS_c)
head(komplett_PS_c)

# Total numbers of tokens and types in the corpus
ntoken(komplett_PS_c) %>% sum()
ntype(komplett_PS_c) %>% sum()

# Keywords in context: glob patterns, case-insensitive, 10 tokens on each side
komplett_toks_raw <- quanteda::tokens(komplett_PS_c)
kontext1 <- kwic(komplett_toks_raw, pattern = "Franklin", valuetype = "glob", case_insensitive = TRUE, window = 10)
kontext2 <- kwic(komplett_toks_raw, pattern = "Okzident*", case_insensitive = TRUE, window = 10)
kontext3 <- kwic(komplett_toks_raw, pattern = "amerika*", case_insensitive = TRUE, window = 10)
kontext4 <- kwic(komplett_toks_raw, pattern = "Rasse*", case_insensitive = TRUE, window = 10)

# Word tokens without punctuation and numbers, at least 4 characters long,
# German stopwords removed
komplett_tokens <- quanteda::tokens(komplett_PS_c,
                                    remove_punct = TRUE,
                                    remove_symbols = FALSE,
                                    remove_numbers = TRUE) %>%
  tokens_select(min_nchar = 4) %>%
  tokens_remove(stopwords("de"))
head(komplett_tokens)

komplett_dfm <- dfm(komplett_tokens)
ntoken(komplett_tokens)  # number of tokens
ntype(komplett_tokens)   # number of distinct types

# Red word cloud of words with a minimum frequency (the post uses 20 and 45);
# wordcloud() needs explicit words and frequencies, not a tokens object
word_freqs <- colSums(komplett_dfm)
wordcloud(words = names(word_freqs), freq = word_freqs, min.freq = 45, colors = "red",
          scale = c(2, 2.11), random.order = TRUE)

textstat_frequency(komplett_dfm, n = 20)

# 2- to 4-grams, built after stopword removal
toks_ngram <- tokens_ngrams(komplett_tokens, n = 2:4)
ngram_dfm <- dfm(toks_ngram)
textstat_frequency(ngram_dfm, n = 20)

# Topic models with seededlda, currently disabled
#for (k in 3:17) {
#  fund_lda <- textmodel_lda(komplett_dfm, k)
#  print(terms(fund_lda, 10))
#}


