Why to mine (but leave the pickax at home)


Everyone is into data mining these days. Retailers find patterns in what you buy so that they can better market to you, governments search for patterns that identify terrorist threats, and data mining is at the core of privacy debates about the quarry of information collected by Facebook, Google and other websites. But the news isn’t all potentially nefarious or the subject of impending litigation. Data mining is simply a technique for finding patterns in sets of data. As we are all in the pattern-extraction business, we should also be miners.

the project: use emerging text-mining technologies on Greek drama and epic to
1. test whether such techniques can help us discover anything new about ancient speech and speaker status in each corpus;
2. see how such methods fare against existing studies of Homeric and tragic speech and language, and against my own investigation of tragic vocal virtuosity and characterization through the micro-level analysis of turns of speech. In other words, what’s the difference between the atom smasher (mining) and the microscope (let’s call it philology)?
3. test whether such techniques can be used cross-generically (for tragic inheritances, echoes and evocations of Homeric epic) and, if possible, identify problems and possible solutions for wide-scale use of text mining across multiple genres of Greek literature.

Text mining is still just gearing up for texts in the humanities. I’ll post results here as I get them, but for background, check out the relatively user-friendly presentation from the ARTFL folks, who have been developing the mining tools that will be used in this project. For Greek, check out work already underway by Helma Dik at UChicago. (See especially Dik and Whaling 2008 on some initial experiments with gender marking in tragedy.)

Among the more interesting outcomes of their initial tests was that Athena was frequently misclassified; that is, she didn’t quite fit the pattern for female characters. Dik and Whaling: “such a finding would suggest (if not for the first time) that authority and gender intersect in important ways.” More interesting to me is the finding (not detailed there) that Peleus in Andromache was classified as female in these sorts of gender tests. Indeed, he spends the second half of the play lamenting like a female lead; but the result should also prompt us to reconsider Peleus’ claims to his own manliness, as at 757 (“No more women’s pathetic speech!”) and 762-3 (“Even an old man, if he’s brave, is stronger than many young men.”). If the lexical and syntactic gender markers are indeed something that native speakers would pick up, then perhaps the lady doth protest too much.
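For anyone wondering what “misclassified” means mechanically, here is a minimal sketch. It is not the ARTFL tooling and not Dik and Whaling’s setup: I am assuming a bare-bones naive Bayes word-count classifier purely for illustration, and the speakers, labels and English “tokens” are invented toy data, not results from any Greek text. The idea is just this: train on speeches whose speaker gender is known, classify held-out speakers, and list those whose predicted label disagrees with their actual one.

```python
import math
from collections import Counter

# Invented toy data (hypothetical speakers and tokens, NOT real Greek or real results).
TRAIN = [
    ("speaker_a", "female", ["alas", "child", "weep", "alas"]),
    ("speaker_b", "female", ["weep", "house", "child"]),
    ("speaker_c", "male", ["spear", "city", "command"]),
    ("speaker_d", "male", ["command", "spear", "honor"]),
]

def train(examples):
    """Count tokens per label, documents per label, and the overall vocabulary."""
    counts, docs, vocab = {}, Counter(), set()
    for _, label, tokens in examples:
        counts.setdefault(label, Counter()).update(tokens)
        docs[label] += 1
        vocab.update(tokens)
    return counts, docs, vocab

def classify(tokens, model):
    """Return the label with the highest naive Bayes log-score (add-one smoothing)."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(counts):
        score = math.log(docs[label] / total_docs)  # prior from document counts
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)

# Held-out speakers: an invented "old_man" who uses lament vocabulary, and a "queen".
held_out = [
    ("old_man", "male", ["alas", "weep", "child", "honor"]),
    ("queen", "female", ["weep", "house", "alas"]),
]

# Each entry is (speaker, actual label, predicted label) where the two disagree.
misclassified = [
    (name, gold, classify(tokens, model))
    for name, gold, tokens in held_out
    if classify(tokens, model) != gold
]
print(misclassified)
```

In this toy run the old man is flagged because his vocabulary overlaps with the speeches labeled female. A flag like that is where the microscope takes over from the atom smasher: it tells us which speaker to reread, not why he sounds the way he does.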

coming up next: not everyone gets to be a hero