We know how: machines scrape every existing website, collecting words, word embeddings and word positions. But we also know that sixty, perhaps ninety-nine, percent of the material collected in these giant databases is rubbish: badly chosen words in the wrong places, expressing things that are not true, stupid ideas. Could we not start with reasonable material? No journalistic stuff, no restaurant reviews, but texts from philosophers? Literary writings?