Theory

Text mining, Hermione Granger, and fan fiction: What's in a name?

Rebecca Rowe

University of Texas Rio Grande Valley, Brownsville, Texas, United States

Tolonda Henderson and Tianyu Wang

University of Connecticut, Storrs, Connecticut, United States

[0.1] Abstract—When fans rewrite characters, how do they engage that character's identity and the social constructions around it? Fan fiction writers resist, replicate, and create oppressive social systems by changing characters between published and fan texts. As such, fan studies scholars have long been interested in how fans construct characters, an interest that has often been paired with readings of race, gender, and sexuality. Digital humanities can help confirm and nuance extant fan studies scholarship around specific characters popular in fan fiction. We used Word2Vec software to mine the text of 450 pieces of fan fiction based on J. K. Rowling's Harry Potter series. By focusing on the depiction of Hermione Granger in both Rowling's novels and Harry Potter fan fiction, we tested how text mining character names can reveal properties closely tied to a specific character through the relationships between the target name and other characters. Analysis via Word2Vec found that "Hermione" is used grammatically and contextually differently in the books (in which she is most like Harry and Ron) than in our fan fiction corpus (in which she is most like other girls/women). This difference suggests that these fans have a specific reading of Hermione that is communally understood even if Rowling's diction offers a different reading.

[0.2] Keywords—Digital humanities; Fan fiction characters; Harry Potter; Word2Vec

Rowe, Rebecca, Tolonda Henderson, and Tianyu Wang. 2021. "Text Mining, Hermione Granger, and Fan Fiction: What's in a Name?" Transformative Works and Cultures, no. 36. https://doi.org/10.3983/twc.2021.1997.

1. Introduction

[1.1] What's in a name? Names help people craft their individual identities. There is a reason why we introduce ourselves by our names, why we have nicknames that seem to better capture our essence, why transgender people often choose new names for their new lives: we tie names to the idea of who we are. Names also reveal how we are connected to others: family names link us to biological and created families, and names often reference larger identity markers such as gender, race, and nationality. Character names can have just as much power; everything we understand about a specific character, we link to the name. Moreover, names often structure fan production, especially in shipping. Fans will tag their productions with "Harry/Draco" or "Remus/Sirius," and shippers stake their position through claims such as "I ship Harry and Hermione" or "I'm a Ron/Hermione shipper." Characters are central to much fan production, and this production is organized through names.

[1.2] Scholars have used close reading of individual fan fiction stories to examine how fans write characters. Close reading is useful and exposes many ways that fans engage with both fictional and real worlds. At the same time, distant reading through digital humanities (DH) techniques such as text mining—the transformation of text into analyzable data that visualizes verbal patterns underlying said text—can ask and answer very different questions (note 1). DH "asks what it means to be a human being in the networked information age and to participate in fluid communities of practice, asking and answering research questions that cannot be reduced to a single genre, medium, discipline, or institution" (Burdick et al. 2012, vii). DH allows scholars to use digital tools to address humanistic questions about our lives in a digital age and to record that knowledge digitally and publicly, thereby engaging in our increasingly digital culture. Abigail De Kosnik (2016) argues that we can use DH methods to try to answer the biggest questions in fan studies because DH methods allow us to navigate significantly larger pieces of fan life and to look at fan culture as a whole rather than in bits and pieces, as we so often must. DH has tools that scholars can use to study enormous volumes of fan fiction, far beyond the capability of any given human reader.

[1.3] Here we demonstrate how DH can help confirm and nuance extant fan studies scholarship about how fan fiction writers write characters by using Word2Vec to text mine 450 pieces of fan fiction based on J. K. Rowling's Harry Potter series. We begin by grounding our use of character in scholarship on literary characters and how fan fiction writers write them, explaining how and why we use Word2Vec to study names. We then outline our methodology in creating this project, including how we approached the ethical concerns of working with fan texts in such large quantities. Next, we offer an analysis of Hermione Granger, comparing our findings to close readings of the character in both the Harry Potter novels and their fan fiction. This comparison highlights how text mining can reveal how fans use existing social ideologies. Using Word2Vec, we found that fan fiction writers highlight Hermione's femininity more than Rowling does, suggesting that these fans have a specific reading of Hermione that is communally understood even if Rowling's diction offers a different reading. Finally, in our conclusion, we demonstrate how this kind of work can respond to recent concerns around Rowling's transphobic beliefs and fandom's racism.

2. Character, fan studies, and text mining

[2.1] Before we venture much further into this project, we think it would be useful to explain how we understand characters, especially characters created by fans, and how text mining can assess characters. Character is "a text- or media-based figure in a storyworld, usually human or human-like" (Jannidis 2013, § 1). Fan fiction writers most consistently focus their stories on recreating characters or story worlds, so we decided to pursue how fans rescript specific characters. Deborah Kaplan was perhaps the first scholar to explore how fan fiction characters differ from characters that appear in more traditional fiction. She argues that "although a work of fan fiction might otherwise follow the conventions of original fiction character development, it must also be in constant dialogue with the source text's characters, already fully realized and well known to the story's reader" (2006, 136). Kristina Busse elaborates that "this process of altering a character enough to fit the story, but not too far to become unrecognizable, is at the heart of an interpretive community's acceptance of a characterization" (2017, 113). Fan fiction, in this light, is an interpretation of a character, one that is influenced not just by the original author but by other fandom-accepted conceptions of the character. These characters are negotiated constructions that exist between the author, the fan fiction writer, and the reader who experiences both the published text and the fan fiction. We can then use text mining to study such constructions because, as some scholars argue, characters are "mere words or…paradigm[s] of traits described by words" (Jannidis 2013, § 3.1). Text mining can look for patterns in those words to determine how characters are portrayed at the level of diction, syntax, and context.

[2.2] For this project, we used an algorithm called Word2Vec to discover which words are closest to a target word. Word2Vec, originally developed in 2013 by a team of Google researchers led by Tomas Mikolov, does this by examining the words that appear around every word in a corpus and then establishing vectors for each word. These vectors then exist within a three-hundred-dimensional vector space so that we can analyze which words' vectors appear most similar to each other. We recognize that literary characters go beyond the literal meanings of the words around them as readers add their own understandings and interpretations to the text on the page, which is how so much fan fiction is born. However, character names do act as anchors for the character and their construction within the text itself. Roland Barthes argues that in fiction, the "proper name acts as a magnetic field for the semes" (the character's underlying themes and qualities) and "allows the substitution of a nominal unit for a collection of characteristics by establishing an equivalent relationship between sign and sum: it is a book-keeping method in which, the price being equal, condensed merchandise is preferable to voluminous merchandise" (1974, 67, 95). In other words, a proper name represents everything that we know about the character, acting as a stand-in (the condensed merchandise) for all the modifiers that we would use to describe such a character (the voluminous merchandise). Word2Vec allows us to analyze the semantic field around names to understand a character's characterization, highlighting how authors use basic linguistic choices to build their characters while excluding the outside connections that readers make. For example, readers may bring their own associations with smart young women to their readings of Hermione, adding layers of meaning that do not exist in the text. Word2Vec analyzes the text without readers' outside connotations.

[2.3] Importantly, Word2Vec does not simply spit out a list of the traits associated with any given word. Rather, it reveals words that are most similar, both grammatically and contextually, to a target word. As Florian Heimerl and Michael Gleicher explain, "Word vector embeddings place words as points in a vector space. The positions are designed to encode meaning, based on the concept of distributional semantics—the assumption that words that appear in similar contexts are close in meaning" (2018, 254). Andrew Piper argues that this "concept of 'distributional semantics'…lies at the center of computational linguistics today," and he suggests that the assumptions that underlie this work are: "a) a word's meaning is tied to how often it occurs; b) a word's meaning is tied to how often it occurs with other words in a given context; c) these relationships are entirely contingent upon the scale of analysis; d) and these relationships can be rendered spatially to capture semantic associations between them" (2018, 13). This spatial rendering comes in the guise of the aforementioned vectors. Because a vector has a direction, the relationship between two word vectors can be calculated on the basis of the angle the vectors make to each other. As Piper explains, the "more the frequencies differ…the more spatially dissimilar the semantic representations will appear to be from one another" (17). In simpler terms, vector similarity is represented by a decimal number between 0 and 1 that we call the similarity index; the larger the number, the more similar the word is to the target word.
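The angle-based comparison described above can be sketched in a few lines. This is a minimal illustration that assumes the similarity index is cosine similarity, the measure usually paired with Word2Vec; the article does not name the formula. The function name and the toy vectors are invented for the example. Cosine similarity can in principle fall anywhere between -1 and 1, so the 0-to-1 range reported above describes the values observed rather than a mathematical bound.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy three-dimensional vectors, invented for illustration; the models in
# the article use three hundred dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
right_angle = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors that point the same way score close to 1 regardless of their length, and vectors at a right angle score 0, which is why the measure captures direction rather than magnitude.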

[2.4] For the most part, scholars have used Word2Vec to measure how word meanings change over time in relation to other words (Hamilton, Leskovec, and Jurafsky 2016). For example, researchers could visualize how the word "gay" has related to different words over time, from words like "daft" in the 1900s and "bright" in the 1950s to "homosexual" in the 1990s (Hamilton, Leskovec, and Jurafsky 2016). The same approach is possible with names; we can see how the contexts of specific characters' names differ between books and fan fiction. Word2Vec focuses on both the grammatical and contextual uses of a word, so it examines whether characters' names are used grammatically more like a noun or a verb, as well as what other words most often appear around a specific name. If "Harry" and "Hagrid" are both used as nouns and appear around words like "wizard," "he," "boy," "Dumbledore," and "Hogwarts," then Word2Vec registers these words as similar. One could argue that names do not have synonyms; after all, you cannot replace "Harry" with "Hagrid" and keep roughly the same meaning. However, Uri Margolin argues that characters exist on "shared semantic axes" and that we can compare characters depending on where they fall on such axes (1995, 385). Essentially, if we are interested in gender, Harry is more similar to Dumbledore than he is to Hermione, but if we are interested in age, Harry is more similar to Hermione than he is to Dumbledore. Word2Vec effectively measures three hundred different grammar and context axes to try to understand what words are most similar to "Harry," determining which axes are most important to the way the text represents each character's name.
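To make the idea of "most similar words" concrete, the sketch below ranks every word in a tiny vocabulary by its similarity to a target name. It assumes cosine similarity as the measure, and the three-dimensional vectors are invented for illustration; they are not values from the models described in this article, and `most_similar` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, topn=3):
    """Rank every other word by its similarity to the target word."""
    target_vec = vectors[target]
    scored = [
        (word, cosine(target_vec, vec))
        for word, vec in vectors.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Invented toy vectors: "harry" and "hagrid" point in nearly the same
# direction, while "quickly" points elsewhere.
toy_vectors = {
    "harry": [0.9, 0.8, 0.1],
    "hagrid": [0.8, 0.9, 0.2],
    "quickly": [0.1, 0.2, 0.9],
    "hogwarts": [0.5, 0.1, 0.4],
}
neighbours = most_similar("harry", toy_vectors, topn=2)
```

With these toy values, "hagrid" ranks first and "hogwarts" second. A trained model produces such a list in the same way, only over the full vocabulary and all three hundred dimensions.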

[2.5] So far, little work has been done using Word2Vec to study names. One study, "Novel2Vec: Characterising 19th Century Fiction via Word Embeddings," tested if Word2Vec could be used on names. By studying works by Jane Austen, Charles Dickens, and Arthur Conan Doyle, Siobhán Grayson et al. found that "syntactically, character vectors are very distinguishable from other grammatical categories of words within each novel2vec dataset" (2016, 78). In other words, for these three authors, character names simply work differently than other words, suggesting that we can use Word2Vec to study the ways that characters relate to each other. However, they also found that Austen's characters worked drastically differently than Dickens's and Doyle's (78), suggesting that character names are not consistent across authors. In other words, unsurprisingly, authors use names contextually and grammatically differently, meaning that comparisons across works can reveal differences in how authors understand their characters. It is these differences we trace between the Harry Potter novels and Harry Potter fan fiction, examining how Rowling and fan fiction writers emphasize different elements of their characters' semantic axes.

3. Methodology

[3.1] Because text mining needs to engage with corpora created under certain parameters, this project focuses specifically on the fandom around Rowling's Harry Potter series. We use this fandom for two primary reasons. First, as scholars have noted, "the most representative Internet fandom of the 2000s is surely the Harry Potter fan community" (De Kosnik 2016, 325). This fandom, and its fans, blossomed with the advent of the personal internet such that many current fandom practices stem from this one fandom (note 2). The sheer size of its production is one of the main reasons that the Harry Potter fandom is representative and that it works so well for text mining: as of April 2020, the Harry Potter fandom has the most fan fiction for any one book text on the two largest fan fiction archives, Archive of Our Own (AO3; https://archiveofourown.org/) and FanFiction.net (https://fanfiction.net/).

[3.2] The second reason this fandom is productive for this study is that names were important to the way Rowling designed her characters. She claimed in an interview, "I love names, as anyone who has read the books is going to see only too clearly" (quoted in Dresang 2002, 212). She carefully constructed her characters' names, sometimes using them to give hints to character identities—such as Remus Lupin ("Remus" from a Roman story about a boy raised by wolves and "Lupin" derived from "lupine," meaning wolflike), who is a werewolf—and sometimes using them to cue a change in identity—such as Tom Riddle's shift to Lord Voldemort. Harry Potter is a good test case in how character names act as semantic magnetic fields and how those fields shift between original text and fan fiction because names are so central to how these characters are constructed and portrayed in the published text.

[3.3] We built two sets of corpora, one by Rowling and one by fans. Building the Rowling-penned corpus seems like it would be simple, but delineating what counts as part of the Harry Potter canon is complicated because it continues to change over time. Fans and scholars alike debate whether the canon should include only the original seven novels or whether it should also include some variation of the following elements: the original film adaptations, Rowling's continued supply of information about the wizarding world on Pottermore, the textbooks and other tangential books Rowling has published, the new Fantastic Beasts films, and the stage play Harry Potter and the Cursed Child (2016). Rowling has had some hand in each of these endeavors, but they often contradict one another, creating a complex canon that is difficult to navigate, and many had coauthors and entire teams further affecting the text. For this project, we decided to focus on only the seven original books because (1) they constitute the most frequently agreed-on portion of the canon and are the texts fan fiction writers are most likely to respond to; (2) they consist of written story prose that has been digitized, much like fan fiction (whereas the movies and the play were intended to be performed, and Rowling's outside information is most often not written in story form), and so gives the clearest comparison. From these seven novels, we built one corpus that is 1,084,170 words long.

[3.4] To create our fan fiction corpus, we decided to focus on one fan fiction archive. We chose AO3 because out of all the most popular fan fiction archives, it is the one that is most clearly organized to ease the navigation of the millions of individual fan fiction stories, although we recognize that Harry Potter fan fiction practices were established long before AO3 came into existence in 2009. Most archives, like FanFiction.net, have some ways that allow readers to search the archive and narrow that search (FanFiction.net has seven filters), but AO3 has significantly more (eighteen filters, ten sort options). AO3 grants the researcher the most control in sorting and filtering fan fiction metadata, which in turn facilitates corpus construction. This means that it can most easily be used for both fan studies and DH.

[3.5] For this project, we selected all Harry Potter fan fiction on AO3, then narrowed it down calendar year by calendar year, starting in 2009 and ending in 2017 (the last year to have most of its fan fiction register a completed status at the time of data collection). Within each year, we then sorted by hits (which AO3 defines as "the number of times your work has been accessed" [Archive of Our Own n.d.]), choosing the fifty fan fiction stories in each calendar year with the most hits. We chose hits as a sorting measure because we wanted to find the pieces that had been viewed by the most people and thus may have had the largest impact on other fans. These fan fiction stories' popularity may suggest that they have some shared features because they were the ones most popular within the community and therefore the ones most likely to influence later fan fiction. Their likely similarity allows us to study trends in what is a relatively small corpus for a DH study. Future studies with a larger corpus may be able to find patterns among more diverse stories. To counteract the similarity suggested by the stories' popularity, we gathered stories that were diverse in their content rating, sexual orientation, and word length, although we did limit our study to stories published in English. Moreover, because we are interested in characters, we included fan fiction that features mostly Harry Potter characters, no matter what fictional universe they were located in, but did not include fan fiction where other fandoms' characters were featured in the Harry Potter universe without any of Rowling's original characters. After choosing fifty fan fiction stories with the most hits from each of nine separate years, we included 450 fan fiction stories in our corpus, containing 28,972,363 words.
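The selection rule described above (the fifty most-viewed stories in each of nine calendar years) can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the selection for this project relied on AO3's own sort and filter interface, the further inclusion criteria (language, character focus) are omitted, and the record fields `year` and `hits`, the function name, and the synthetic metadata are all hypothetical.

```python
from collections import defaultdict


def top_works_per_year(works, years, per_year=50):
    """Keep the per_year most-viewed works for each requested year."""
    by_year = defaultdict(list)
    for work in works:
        if work["year"] in years:
            by_year[work["year"]].append(work)
    selected = []
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda w: w["hits"], reverse=True)
        selected.extend(ranked[:per_year])
    return selected


# Synthetic metadata: sixty invented works per year, 2009 through 2017.
works = [
    {"id": f"{year}-{n}", "year": year, "hits": n * 10}
    for year in range(2009, 2018)
    for n in range(60)
]
corpus = top_works_per_year(works, years=range(2009, 2018), per_year=50)
```

Nine years at fifty stories each yields the 450 stories reported above.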

[3.6] Once we created our corpora, we developed the text mining tools. Using existing Word2Vec templates, author Tianyu Wang built two models, one for each of our corpora. Wang first eliminated the stop words (high-frequency function words such as "a," "of," and "the" that do not pertain to the formation of character we are studying) from each corpus. The models were then trained using a three-word window as a hyperparameter: vectors were calculated on the basis of an examination of which words tended to appear in the three words before or after the target word. We used a Skip-Gram model, which "predicts a target context for a given word," rather than a Continuous Bag of Words model, which "predicts words on the basis of the context in which they occur," because Grayson et al. found that Skip-Gram "captur[es] the nuances of" names "better than CBOW [Continuous Bag of Words] models" (2016, 71, 75). Word2Vec looks for how a word is used in a sentence and what words tend to appear around it. Word2Vec then provides lists of words that are used most grammatically similarly and have the most similar context to a target word.
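The two preprocessing ideas in this paragraph, stop-word removal and a three-word context window, can be illustrated with a short sketch. It is not the code used for this project. The stop-word set holds only the three examples named above (a real list is much longer), the sentence is invented, and the function names are hypothetical. The sketch produces the (target, context) pairs that a Skip-Gram model learns from; it does not train a model.

```python
# Only the three stop words named in the text; a real list is far longer.
STOP_WORDS = {"a", "of", "the"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop high-frequency function words before building context windows."""
    return [t for t in tokens if t.lower() not in stop_words]


def skipgram_pairs(tokens, window=3):
    """Pair each word with every word up to `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# An invented sentence, tokenized on whitespace for simplicity.
sentence = "Hermione opened the book of spells in a corner of the library".split()
filtered = remove_stop_words(sentence)
pairs = skipgram_pairs(filtered, window=3)
hermione_context = [context for target, context in pairs if target == "Hermione"]
```

Because the stop words are removed first, the window around "Hermione" reaches content words ("opened," "book," "spells") that would otherwise fall outside it. Full implementations add further steps such as negative sampling. In the gensim library, for instance, comparable settings are exposed as the `window`, `sg`, and `vector_size` arguments of `Word2Vec`; the article does not say which implementation was used.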

[3.7] While designing this project, we were especially aware of the ethical issues pertaining to working with such a large body of fan-produced texts. Brittany Kelley has argued that analyzing fan fiction has different privacy concerns from traditionally published fiction because "fan writers are often not protected from censure" (2016, ¶ 1.5). Also, as Busse and Hellekson argue, "Fan publications…are perceived as existing in a closed, private space even though they may be publicly available" (2012, 39). Fans have not written their fiction specifically for our analysis, and they may be uncomfortable with their work being exposed in a different context than that in which they published it. Kelley suggests that, when feasible, a scholar may request permission from fan writers to use their work to ensure that they are protecting writers' privacy and comfort. However, she also acknowledges that "either requiring each author's consent to discuss a story or speaking of stories only in the aggregate is not only largely untenable but potentially disruptive, and, what's more, not necessarily in the best interest of either scholarship or fandom" (2016, ¶ 3.2). Even though we are working with this fan fiction "in the aggregate," we vacillated between protecting the privacy of fans and acknowledging whose work we used.

[3.8] To address these methodological issues, we built a website (https://researchinghpfanfi.wixsite.com/analyzefanfic) for two different audiences. First, we addressed the scholarly community: one of the major methodological issues in DH today is that studies like ours, which draw on large numbers of texts, do not always provide information about what texts they use, so other scholars cannot check the work or develop further work from a particular data collection (Bode 2017, 85). We cannot continue to push our knowledge of fans further if we constantly have to create and recreate data sets because we are unwilling to share or incapable of sharing our data. To ameliorate this issue, our website includes the citations of all 450 pieces of fan fiction, the Harry Potter novels, and the data that we collected so that scholars can critique and/or continue this work.

[3.9] Our second audience is the fan community. While we do not wish to expose fans' work in unauthorized spaces, we also believe that it is unethical to mask the work that these fans have produced. The citations on the website thus also serve to acknowledge and recognize all the fans whose creative work influenced our project. We also believe, along with Kelley (2016), that people outside the scholarly community, especially the fans who inspired and were the basis of this project, should be able to access this work. Fans should be able to see how their writing and their communities are analyzed and how we talk about them. Thus, the website also allows users to create and manipulate visualizations that are based on the data we collected on seven Harry Potter character names (Harry, Ron, Hermione, Draco, Sirius, Voldemort, and Dumbledore) so that the users can find their own patterns. DH projects like this one are public facing and are thus accessible to everyone, both in that such projects exist in public forums and in that they are navigable by general audiences. Fan communities are largely online (note 3), and this kind of DH work offers us the opportunity to reach out to them and show them what we see and how we see it. In turn, it allows them to respond and further shape our research. Any and all analysis derived from such work then belongs to the user and can be used and/or published as they may desire. We hope this will encourage more DH work in fan studies, but also that it will allow for more work with the data already collected.

4. Results

[4.1] As with many DH projects, we ran Word2Vec with little idea of what we would actually find. We were not sure which names would reveal the most fruitful patterns, so we ran seven names: Harry, Ron, Hermione, Draco, Sirius, Voldemort, and Dumbledore. Each name revealed something interesting: for example, Dumbledore in the books was most like other professors' names, while Voldemort was the character name most like nonname nouns in both books and fan fiction, suggesting he is more like things than other characters. However, to demonstrate the usefulness of text mining in the study of fan fiction characters, we decided to focus our analysis on one character: Hermione. We chose Hermione because (1) she is one of the central characters in the books; (2) she is "a favourite with writers of Harry Potter fan fiction" (Altintaş 2013, 157); and (3) our analysis of her, more than the other character names we examined, allows us to demonstrate how Word2Vec can reveal how different vectors of identity (in this case gender) are emphasized differently in the books and fan fiction. Eliza T. Dresang has already noted the importance of Hermione's name, arguing that "granting this character a distinguished literary tie through her uncommon name whose source Rowling cites as Shakespeare's A Winter's Tale gives her the legitimacy and strength among her peers that the main male characters gain either out of heredity (Ron) or endowment (Harry)" (2002, 212). While studies like Dresang's rely on the outside knowledge that readers bring to such a name, Word2Vec examines how the name is used within the text. The name Hermione in the books operates similarly to both Harry and Ron, demonstrating how Harry's perspective, as the primary focalizer of the series, impacts our understanding of one of his closest friends. On the other hand, in fan fiction, Hermione is used most similarly to other girls' names. This focus on gender may be due to the romantic nature of much fan fiction. In this section, we pair our reading of the Word2Vec analysis of Hermione with close readings that scholars have already done to demonstrate how close reading and text mining can work together.

[4.2] Scholarly close readings of Hermione often focus on a few key characteristics and return to the same passages and themes to prove their points. According to such close readings, Hermione is intelligent (Armstrong 2015; Berents 2012; Dresang 2002; Foster 2012; Klingbiel 2012; Taylor 2014), a trait emphasized by her first meeting with Harry, when she has already read all the schoolbooks; by her decision in her third year to take so many classes that she has to time travel to keep up with them all; and, of course, by Remus Lupin's claim that Hermione is "the brightest witch of your age" (a quote from the third film that scholars often prefer to use rather than the clunkier "You're the cleverest witch of your age I've ever met," from the novel) (Cuarón 2004; Rowling 1999, 346). Scholars also point out that Hermione is the moral center of the Golden Trio (Armstrong 2015; Berndt 2011; Gercama 2012; Taylor 2014; Thompson 2012), specifically pointing to Hermione's work with S.P.E.W. and to the way she guides the boys' moral growth (even while breaking almost every school rule). Some scholars even focus on Hermione's homely looks and how they save her from being a sexual object (Armstrong 2015; Berndt 2011; Cordova 2015), a point highlighted by the rare moments, such as the Yule Ball in Goblet of Fire, in which Hermione is seen as pretty. Such a moment acts as the exception that proves the rule.

[4.3] Most importantly, while critics disagree whether the representation of Hermione is feminist (Armstrong 2015; Berents 2012; Berndt 2011; Gercama 2012; Taylor 2014; Thompson 2012) or sexist (Cordova 2015; Dresang 2002), they almost invariably focus their analysis on her gender and how well or poorly it is portrayed. Perhaps because she is "the sole girl" (Berents 2012, 144) among the main characters, scholars read each of her traits through her gender. She is a brainy girl; she is ethical/moral because she is a girl and the mother figure of the group; her homely looks matter because they are not girly enough. This analysis of her girlness often comes down to Rowling's diction. For example, Dresang found it problematic that "Rowling has Hermione 'shriek,' 'squeak,' 'wail,' 'squeal,' and 'whimper,' verbs never applied to the male characters in the book," depicting Hermione as a stereotype of a tween girl (2002, 223), while Christine Klingbiel argues that such words simply reveal that Hermione, as a girl, has a "greater emotional range" (2012, 177). Nonetheless, scholars seem to agree that Rowling's diction around Hermione always comes back to her gender.

[4.4] While scholars (along with many fans) constantly point to these qualities as central to Hermione's character, Word2Vec revealed that Hermione was not portrayed much differently than Harry and Ron. Harry, Ron, and Hermione share many of their most similar words in the book lists, as can be seen in table 1, which details the ten most similar words for each character in books and fan fiction. For example, according to Word2Vec's analysis of the three hundred vectors, "Hermione" is the word that is most similar to "Harry" in the books, while "Garvan" is most similar to "Harry" in the fan fiction corpus. As can be seen in the "Book" columns of table 1, in the books, Harry, Ron, and Hermione are most similar to each other's names and to "Hagrid," "Neville," "Ginny," "quickly," "uh oh," and "whoa," all of which appear within each of the three characters' top ten most similar words, suggesting that these character names are used similarly. Further down in Hermione's list, we see words like those that Dresang (2002) argues signal Rowling's stereotypical portrayal of girlhood, such as "anxiously" (29), "shrilly" (38), or "sheepishly" (41). However, nowhere on Hermione's list do any words appear that might hint at her intelligence, her morality, or her homeliness. The words on this list are the words most similar to Hermione in use and context, so we did not necessarily expect to see "smart" or "moral" on her list; rather, we expected to see other characters who were also seen to have these traits, like Dumbledore's intelligence or Molly's caring nature. Instead, Hermione's list offers no words or names that might hint at the characteristics most commonly linked to her.

Table 1. Ten most similar words for each character in books and fan fiction
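| Rank | Harry (Book) | Harry (Fan Fiction) | Ron (Book) | Ron (Fan Fiction) | Hermione (Book) | Hermione (Fan Fiction) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Hermione | Garvan | Hermione | Ginny | Ron | Luna |
| 2 | Ron | Anaita | Ginny | Neville | Ginny | Ginny |
| 3 | Malfoy | Rosenthal | Neville | Bill | Neville | Gabrielle |
| 4 | Hagrid | Michel | quickly | Charlie | Harry | Neville |
| 5 | whoa | Tamsin | Harry | Dean | quickly | Fleur |
| 6 | Ginny | Sanguini | whoa | Granger | whoa | Pansy |
| 7 | Neville | Grapplefang | Mrs. | Seamus | Hagrid | Daphne |
| 8 | Cho | Tanith | uh oh | Percy | uh oh | Dora |
| 9 | quickly | Joshua | Hagrid | Theo | Luna | Lavender |
| 10 | uh oh | Camilla | Percy | Draco | crestfallen | Pavarti |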
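The overlap among the three book lists can be checked directly from table 1. The short sketch below copies the "Book" columns into Python lists and intersects them; the variable names are illustrative, and the data are exactly the thirty entries in the table.

```python
# The "Book" columns of table 1, in rank order.
book_top_ten = {
    "Harry": ["Hermione", "Ron", "Malfoy", "Hagrid", "whoa",
              "Ginny", "Neville", "Cho", "quickly", "uh oh"],
    "Ron": ["Hermione", "Ginny", "Neville", "quickly", "Harry",
            "whoa", "Mrs.", "uh oh", "Hagrid", "Percy"],
    "Hermione": ["Ron", "Ginny", "Neville", "Harry", "quickly",
                 "whoa", "Hagrid", "uh oh", "Luna", "crestfallen"],
}

# Words that appear in all three characters' top-ten lists.
shared = set.intersection(*(set(words) for words in book_top_ten.values()))

# Does each member of the trio appear in the other two members' lists?
trio_cross_listed = all(
    other in book_top_ten[name]
    for name in book_top_ten
    for other in book_top_ten
    if other != name
)
```

The intersection contains the six words named in paragraph 4.4 ("Hagrid," "Neville," "Ginny," "quickly," "uh oh," and "whoa"), and each of the three names appears in the other two characters' lists.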

[4.5] Hermione's similarity to Ron and especially Harry is due to how the Golden Trio functions as a unit, with Harry at its center. We originally ran Word2Vec on just the Golden Trio, but when the results came back showing so much similarity, we decided to run more names to see if all names worked similarly for Rowling. They do not; Harry, Ron, and Hermione alone work as a single unit. Katrin Berndt argues that "Harry's restricted view and his personal attitude towards Ron and Hermione…determine the presentation of Hermione in the series" because "like the rest of the characters, she is only ever depicted from Harry's point of view and assessment" (2011, 161). As Melanie J. Cordova argues, "Viewing this as a constellation, Harry is the Sun and the women are planets at varying distances from him," with Hermione as the closest planet to Harry and thus the one who most clearly takes on his characteristics (2015, 22). In other words, because the events of the series are filtered through Harry's perspective, his perspective affects every character, like a star's gravity pulling planets to it. Because Hermione and Ron are so close to him, their semantic magnetic field is intertwined with his.

[4.6] Fan fiction, which is not consistently filtered through Harry's perspective, separates the identities of the three characters more clearly. Harry is most similar to original characters that fans create, Ron is most similar to his family members, and, most importantly for our argument here, Hermione is most similar to other female characters. Word2Vec reveals how words are used grammatically and contextually, which means that the most similar words to a name in fan fiction are often names that are used similarly in writers' grammatical structures. They appear around similar nouns, verbs, and adjectives (i.e., they share similar semantic magnetic fields). We can almost understand these top names as networks into which fan fiction writers, as a whole, seem to put these characters; the characters' names are used in similar ways around similar words, suggesting that there is a similarity between the characters, a similarity that fan fiction writers highlight. Essentially, characters who appear most similar to our target names fulfill similar roles within the text as our chosen character. Beyond Harry's perspective, Hermione in fan fiction is similar to a whole new network of female characters. As the rightmost column in table 1 shows, the top ten words most similar to Hermione in fan fiction are predominantly female names, mostly consisting of characters the same age as Hermione (although fan fiction writers may write those characters as tweens, teens, or adults of any age, depending on when they set their fiction).
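
To make "most similar" concrete: Word2Vec tools commonly rank a word's neighbors by the cosine similarity between word vectors. The sketch below illustrates that ranking step only, under loud assumptions. The three-dimensional vectors and the token labels (`target`, `name_a`, `name_b`, `adverb`) are invented for illustration and are not drawn from our corpora or from any trained model, and the helper names `cosine` and `most_similar` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, topn=10):
    """Rank every other token by cosine similarity to the target token."""
    scores = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy vectors, invented for this example only.
TOY_VECTORS = {
    "target": [0.90, 0.10, 0.20],
    "name_a": [0.85, 0.05, 0.25],
    "name_b": [0.80, 0.20, 0.30],
    "adverb": [0.10, 0.90, 0.10],
}

ranking = most_similar("target", TOY_VECTORS)
for word, score in ranking:
    print(f"{word}\t{score:.3f}")
```

Tokens whose vectors point in nearly the same direction as the target rise to the top of the list, which is the sense in which the names in table 1 are "used similarly" to Harry, Ron, or Hermione.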

[4.7] That other female names are the most similar to Hermione's seems to emphasize Hermione's femininity over anything else, perhaps because fan fiction often engages in the romance genre. Much fan fiction is formed around romantic and/or sexual relationships, or ships, so much so that "Relationships" is one of the sortable categories on AO3. Many scholars have studied the male/male slash ships so prevalent in fandom, but fans are also deeply invested in heterosexual relationships featuring female characters. Indeed, Hermione is part of four of the most common ships (Hermione/Ron, Hermione/Draco, Hermione/Snape, Hermione/Harry) for Harry Potter on AO3. In her own study of Hermione in fan fiction, Anne Kustritz argues that depictions of Hermione in one online mailing list dedicated to Hermione/Snape often reflect "tropes for the representation of heterosexual exchange whose omnipresence is apparent in their commonality not only among paperback romances like the Harlequin line, but also in 'high brow' depictions immortalized by Jane Austen and the Brontë sisters" (2015, 446). In other words, in her specific study, Kustritz found that many fan fiction stories focusing on Hermione were based in the romance genre and that they therefore portrayed Hermione in highly specific gender roles. Hermione's predominant similarity to other female characters might suggest that many prominent female characters in our fan fiction corpus fulfill a similar role within the romance plot. They are used similarly and in similar contexts in these romantic ship fics, so that their role as romanceable characters is semantically more consistent with their characterization than any other aspect of their identity. For example, Pavarti's Indian ethnicity does not prevent her from being similar to Hermione, because how their gender is expressed is the strongest similarity in how these characters are portrayed across the fan fiction, although we acknowledge that fans often racebend various characters, which may affect how race and ethnicity appear in distanced studies like this one.

[4.8] Our Word2Vec analysis suggests there is a strong similarity between these girls, but what elements of their gender are being highlighted? This is where we reach the limits of DH work. It is difficult to determine what this focus on the similarity of gender means without a more holistic understanding of the representation of the other female characters, and this is where close reading can be helpful. For example, Kustritz's (2015) analysis of Hermione/Snape fan fiction includes stories that greatly demean Hermione and/or condone rape as well as stories that empower Hermione and engage in feminist rhetoric. Our corpus shows that Hermione's name is most similar to other female names, but without close reading, we cannot determine whether the individual or overall representation of femininity is sexist, feminist, or (most likely) some combination thereof.

[4.9] However, what this study, and other such DH work, can help elucidate is which characteristics fans emphasize in their fiction. Our Word2Vec analyses on both corpora did not find semantic connections between Hermione and such traits as intelligence or morality that scholars and fans alike note. Instead, Word2Vec finds patterns at the most elementary levels of language, which people are often unaware of. While many scholars and fans have interpreted Hermione as a smart female activist, they might not have realized how much of Hermione's character is embedded in Harry in the books. Likewise, in crafting romantic narratives, fan fiction writers may not realize how much their female characters fulfill the same roles over and over again. Word2Vec does not necessarily reveal the most obvious connections but rather linguistic patterns that may well be unconscious or socially ingrained. Text mining large numbers of fan fiction stories can thus help scholars discover the underlying assumptions about characters that have been accepted by fandoms.

5. Social implications

[5.1] Our use of Word2Vec allows us to measure and nuance what we understand about specific characters. The analysis demonstrates that Hermione is used grammatically and contextually differently in the books (in which she is most like Harry and Ron) than in our fan fiction corpus (in which she is most like other girls/women). This article is not the first fan studies project to use DH (De Kosnik 2016; Duggan 2020); what we want to offer here is a theorization of how text mining, and specifically Word2Vec, allows us to study the way fan fiction writers reimagine characters. This kind of DH work can have real social implications, as we can study which identities are emphasized for different characters. For example, gender is more consistently important for Hermione than things like race or intelligence.

[5.2] This work can help us nuance conversations around how fans interact with authors. Barthes ends his groundbreaking essay "Death of the Author" by proclaiming that "the birth of the reader must be at the cost of the death of the Author" ([1967] 2002, 224). His essay, which essentially argues that meaning rests with the text and its readers, sets up an all-powerful author whom the reader must overthrow, a literary battle that has permeated fan spaces and fan studies, often leading fan scholars to claim that fans are resistant readers. For example, in one of the foundational works of fan studies, Textual Poachers: Television Fans and Participatory Culture, Henry Jenkins argues that "fandom…provides a space within which fans may articulate their specific concerns about sexuality, gender, racism, colonialism, militarism, and forced conformity" ([1992] 2013, 283). Although Jenkins later tempered this argument, Lesley Goodman argues that this early work in fan studies has meant that the "rule-breaking aspects of fandom have…often been at the center of academic fan studies" (2015, 662). Specifically, there is already a contentious scholarly and fandom-based conversation surrounding how fans interact with Rowling as author. While many venerate her as the goddess of the Harry Potter universe, fans have become more and more agitated with her as she continues to release information that seeks to control fans' narratives around certain characters and publishes racist and transphobic works and commentaries (note 4). Rowling's continued lack of representation and problematic views (e.g., her depiction of Indigenous peoples in her description of the American school Ilvermorny on her website Pottermore) have led many fans to denounce her, which makes the fandom seem both resistant and progressive.

[5.3] However, at the same time, many scholars have noted that Harry Potter fans replicate Rowling's problematic representations. When scholars, critics, commentators, and fans declare the death of the author, they often do so by celebrating the fans, but fans can be just as racist, sexist, homophobic, transphobic, classist, and/or fatphobic as original content creators. We saw this play out when Rowling released a patently transphobic essay in June 2020. While Rowling's comments are awful, author Rebecca Rowe was uncomfortable with how many people responded by arguing that fans had killed the author and were a universally resistant group. For example, a New York Times article declared that for fans resistant to Rowling's transphobia, "the discussion is on how to distance or separate themselves from the author who created a fantasy world that animates their lives on a daily basis" (Jacobs 2020). Yet at the same time, fans and scholars of color were fighting the racism evident on AO3. According to an open letter drafted by fans and scholars, in the middle of racial protests after the police murder of George Floyd in 2020, AO3 released an empty statement about the problems of racism without suggesting a course of action to correct many of the problems its own site perpetuated (Close et al. 2020). Scholars have been pointing to the racism rampant in fan spaces for years. Fans are not the perfect rebels people often tout them to be when they try to call for the death of the author. Many fans may emphasize queer representations, including trans characters, but that does not mean that fans are perfectly inclusionary, or that all fans are inclusionary or exclusionary in the same ways.

[5.4] Text mining allows scholars to look for larger patterns in how fans write their characters' identities so we can better understand fandoms as a whole, particularly when placed alongside close readings of individual fan fiction stories. Because DH can explore such large quantities of work, we can, we hope, understand fandoms and fan communities in more holistic ways, including the ways they rebel against, replicate, and renew current oppressive systems. There are many significant questions in fan studies that could be approached through DH practices as we develop the tools, skills, and relationships to do so. We can then use DH methods and tools to bring this work to the fans to help them understand their own practices better, and perhaps even to help them realize the ways in which they are being exclusionary. Of course, there are plenty of things that DH cannot help us study (yet). We do not want to encourage DH to overtake other types of analysis; rather, we want DH to be recognized as an additional tool in the toolbox, facilitating new methodologies to help us better understand the millions of people worldwide who engage in fan practices.

6. Notes

1. For more on the use of text mining in literary studies, see Tahmasebi and Hengchen (2019).

2. For more on the rise of the Harry Potter fandom and who participates in that fandom, see Black (2008), Busse (2017), and Busse and Hellekson (2006). These scholars all note that fan fiction writers tend to be young and female, and Jennifer Duggan (2020) argues that DH can further develop our understanding of fan identities, including age, gender, sexuality, and race, as well as how that does or does not shift over time or by fandom.

3. For more on how the internet has impacted fan culture, see Black (2008), Busse and Hellekson (2006), and De Kosnik (2016).

4. For more about fan response to Rowling, see Bhattacharya (2015), Busse (2017), and Goodman (2015).

7. References

Altintaş, Ayşegül Kuglin. 2013. "A New Hermione: Re-creations of the Female Harry Potter Protagonist in Fan Fiction." Zeitschrift für Anglistik und Amerikanistik 61 (2): 155–73. https://doi.org/10.1515/zaa.2013.61.2.155.

Archive of Our Own. n.d. "Glossary." Archive of Our Own. Accessed May 5, 2020. https://archiveofourown.org/faq/glossary?language_id=en#hitdef.

Armstrong, Rachel. 2015. "Sexual Geometry of the Golden Trio: Hermione's Subversion of Traditional Female Subject Positions." In A Wizard of Their Age: Critical Essays from the Harry Potter Generation, edited by Cecilia Konchar Farr et al., 235–50. Albany: State University of New York Press.

Barthes, Roland. (1967) 2002. "Death of the Author." In The Book History Reader, edited by David Finkelstein and Alistair McCleery, 221–24. London: Routledge.

Barthes, Roland. 1974. S/Z: An Essay. Translated by Richard Miller. New York: Hill & Wang.

Bell, Christopher E., ed. 2012. Hermione Granger Saves the World: Essays on the Feminist Heroine of Hogwarts. Jefferson, NC: McFarland.

Berents, Helen. 2012. "Hermione Granger Goes to War: A Feminist Reflection on Girls in Conflict." In Bell 2012, 142–62.

Berndt, Katrin. 2011. "Hermione Granger, or A Vindication of the Rights of Girl." In Heroism in the Harry Potter Series, edited by Katrin Berndt and Lena Steveker, 159–76. London: Ashgate.

Bhattacharya, Saradindu. 2015. "J. K. Rowling: Author(ing) Celebrity." In Critical Insights: The Harry Potter Series, edited by Lana A. Whited and Katherine M. Grimes, 224–41. Ipswich, MA: Salem Press.

Black, Rebecca W. 2008. Adolescents and Online Fan Fiction. New York: Peter Lang.

Bode, Katherine. 2017. "The Equivalence of 'Close' and 'Distant' Reading; or, Toward a New Object for Data-Rich Literary History." Modern Language Quarterly 78 (1): 77–106. https://doi.org/10.1215/00267929-3699787.

Burdick, Anne, Johanna Drucker, Peter Lunenfeld, Todd Presner, and Jeffrey Schnapp. 2012. Digital_Humanities. Cambridge, MA: MIT Press.

Busse, Kristina. 2017. Framing Fan Fiction: Literary and Social Practices in Fan Fiction Communities. Iowa City: University of Iowa Press.

Busse, Kristina, and Karen Hellekson. 2006. "Introduction: Work in Progress." In Fan Fiction and Fan Communities in the Age of the Internet: New Essays, edited by Karen Hellekson and Kristina Busse, 5–32. Jefferson, NC: McFarland.

Busse, Kristina, and Karen Hellekson. 2012. "Identity, Ethics, and Fan Privacy." In Fan Culture: Theory/Practice, edited by Katherine Larsen and Lynn Zubernis, 38–56. Newcastle upon Tyne: Cambridge Scholars.

Close, Samantha, et al. 2020. "Open Letter to the OTW on Racism in Fandom." Google Docs. https://docs.google.com/document/d/e/2PACX-1vSNDs1dZ_8zDZOwvR7hdH0o-N3OjUnY-AEAE4IV7fbyvcomkTFd3jkh1oBCrDGNSRV1BrX9WlHYkCjk/pub.

Cordova, Melanie J. 2015. "'Because I'm a Girl, I Suppose!' Gender Lines and Narrative Perspective in Harry Potter." Mythlore 33 (2): 21–35.

Cuarón, Alfonso, dir. 2004. Harry Potter and the Prisoner of Azkaban. Los Angeles: Warner Bros. Pictures. DVD.

De Kosnik, Abigail. 2016. Rogue Archives: Digital Cultural Memory and Media Fandom. Cambridge, MA: MIT Press.

Dresang, Eliza T. 2002. "Hermione Granger and the Heritage of Gender." In The Ivory Tower and Harry Potter: Perspectives on a Literary Phenomenon, edited by Lana A. Whited, 211–42. Columbia: University of Missouri Press.

Duggan, Jennifer. 2020. "Who Writes Harry Potter Fan Fiction? Passionate Detachment, 'Zooming Out,' and Fan Fiction Paratexts on AO3." Transformative Works and Cultures, no. 34. https://doi.org/10.3983/twc.2020.1863.

Foster, Tara. 2012. "'Books! And Cleverness!' Hermione's Wits." In Bell 2012, 105–24.

Gercama, Atje. 2012. "'I'm Hoping to Do Some Good in the World': Hermione Granger and Feminist Ethics." In Bell 2012, 34–51.

Goodman, Lesley. 2015. "Disappointing Fans: Fandom, Fictional Theory, and the Death of the Author." Journal of Popular Culture 48 (4): 662–76. https://doi.org/10.1111/jpcu.12223.

Grayson, Siobhán, Maria Mulvany, Karen Wade, Gerardine Meaney, and Derek Greene. 2016. "Novel2Vec: Characterising 19th Century Fiction via Word Embeddings." CEUR Workshop Proceedings 1751:68–79.

Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change." Cornell University arXiv. Last modified October 25, 2018. https://arxiv.org/abs/1605.09096.

Heimerl, Florian, and Michael Gleicher. 2018. "Interactive Analysis of Word Vector Embeddings." Eurographics Conference on Visualization 37 (3): 253–65. https://doi.org/10.1111/cgf.13417.

Jacobs, Julia. 2020. "Harry Potter Fans Reimagine Their World without Its Creator." New York Times, June 12, 2020. https://www.nytimes.com/2020/06/12/style/jk-rowling-transgender-fans.html.

Jannidis, Fotis. 2013. "Character." The Living Handbook of Narratology. Last modified September 14, 2013. https://www.lhn.uni-hamburg.de/node/41.html.

Jenkins, Henry. (1992) 2013. Textual Poachers: Television Fans and Participatory Culture. 20th anniversary ed. London: Routledge.

Kaplan, Deborah. 2006. "Construction of Fan Fiction Character through Narrative." In Fan Fiction and Fan Communities in the Age of the Internet: New Essays, edited by Karen Hellekson and Kristina Busse, 134–52. Jefferson, NC: McFarland.

Kelley, Brittany. 2016. "Toward a Goodwill Ethics of Online Research Methods." Transformative Works and Cultures, no. 22. http://dx.doi.org/10.3983/twc.2016.0891.

Klingbiel, Christine. 2012. "Hermione Granger: Insufferable Know-It-All or Superhero?" In Bell 2012, 163–79.

Kustritz, Anne. 2015. "Domesticating Hermione: The Emergence of Genre and Community from WIKTT's Feminist Romance Debates." Feminist Media Studies 15 (3): 444–59. https://doi.org/10.1080/14680777.2014.945605.

Margolin, Uri. 1995. "Characters in Literary Narrative: Representation and Signification." Semiotica 106 (3–4): 373–92.

Piper, Andrew. 2018. Enumerations: Data and Literary Study. Chicago: University of Chicago Press.

Rowling, J. K. 1999. Harry Potter and the Prisoner of Azkaban. London: Bloomsbury.

Tahmasebi, Nina, and Simon Hengchen. 2019. "The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies." Samlaren 140:198–227.

Taylor, Ashley. 2014. "Hermione Granger: 21st Century Feminist Hero." In The Ravenclaw Chronicles: Reflections from Edinboro, edited by Corbin Fowler, 114–20. Newcastle upon Tyne: Cambridge Scholars.

Thompson, William V. 2012. "From Teenage Witch to Social Activist: Hermione Granger as Female Locus." In Bell 2012, 181–97.