Fan fiction metadata creation and utilization within fan fiction archives: Three primary models

Shannon Fay Johnson

Indiana University–Purdue University Fort Wayne, Fort Wayne, Indiana, United States

[0.1] Abstract—Issues related to searchability and ease of access have plagued fan fiction since its inception. This paper discusses the predominate forms of fan-mediated indexing and descriptive metadata, commonly referred to as folksonomy or tagging, and compares the benefits and disadvantages of each model. These models fall into three broad categories: free tagging, controlled vocabulary, and hybrid folksonomy. Each model has distinct advantages and shortcomings related to findability, results filtering, and creative empowerment. Examples for each are provided. Possible ramifications to fan fiction from improved metadata and access are also discussed.

[0.2] Keywords—Folksonomy; Searchability; Tagging

1. Introduction

[1.1] One common complaint within the fan fiction community is the difficulty in locating works, particularly those that contain specific story elements, characters, or pairings (Kem 2005). Issues related to searchability and ease of access have plagued fan fiction since its inception, and with the migration from print- to Web-based forms of dissemination, this problem has persisted (Versaphile 2011). This paper discusses the predominate forms of fan-mediated indexing and descriptive metadata, commonly referred to as folksonomy, and compares the benefits and disadvantages of each model as it applies to the fan fiction community.

[1.2] Fan fiction is not a new phenomenon. Despite its growing popularity and publicity, it is still dominated by a fan culture that craves independence and is resistant to outside interference, yet thrives on community (Kem 2005). This sense of community is central to the development of fan fiction and has been solidified by fandom's early adoption of the World Wide Web (Bury 2005; Gray, Sandvoss, and Harrington 2007; Hellekson and Busse 2006; Jenkins 2012). As fandom moved toward online interaction, the way information was shared and disseminated grew from the paper-based fanzine, story circuits, and convention models toward a more easily accessed and faster-paced virtual community of users that allowed for not only increased consumption, but also creation (Bury 2005; Duffett 2013).

[1.3] This shift placed average fans in a position to have greater control over how they received and shared content with their fellow fans, creating increased opportunities for community and personalization. This need for community, and fandom's wide expansion during the early days of the Internet, have been deciding factors in how authors and readers have developed genre-specific organizational classifications, or folksonomies, and how they have been applied to the descriptions and categorizations of fan fiction works. These folksonomies are continually changing and adapting as new fans enter the community, both virtually and physically, and as technology progresses.

[1.4] A folksonomy is an assembly of user-generated metadata created collaboratively, relating to a specific application or group (Eynard, Mazzola, and Dattolo 2013); Weinberger calls it a "grassroots taxonomy" (2005). Folksonomies can take many forms, but in today's computing environment the main mode of creation is tagging, and the practice may also be referred to as folk classification, ethnoclassification, social classification, or free tagging (Hammond et al. 2005). This modern setting, in which more and more fan community interaction takes place, is uniquely placed to allow fan content creators and consumers to adapt the environment, including the metadata used, to meet their specific and, in some cases, idiosyncratic needs (Watson 2010). The development of the Web 2.0 technology that allows users to generate these tags, and to interact with one another via various forms of commentary, has opened new possibilities for fan fiction organization that early static pages and electronic mailing lists could not provide.

[1.5] Tags are "typically short text strings freely chosen by users; they are democratic and bottom-up, flat (as opposed to hierarchical), inclusive…and extremely easy to use" (Eynard, Mazzola, and Dattolo 2013, 1437). These tags have many applications, but can create their own challenges for the end-user. Tags can fall prey to problems related to a variety of user errors, such as misspellings or typos, as well as problems of homonyms, synonyms, and other syntactic variations (Eynard, Mazzola, and Dattolo 2013). These ambiguities can make it difficult for purely free-formed folksonomies to deliver usable and dynamic searching for the average fan fiction reader. This abundance of potentially inaccurate or misleading tags has been described as "Meta Noise," and can mask potentially relevant works beneath a mountain of irrelevant returns (Peterson 2006).
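The syntactic variation described above can be illustrated with a minimal sketch. The function names and the sample tags are hypothetical and are not drawn from any archive; the sketch assumes only that tags are short text strings. It shows that simple normalization can merge variants that differ in case or punctuation, while abbreviations and misspellings remain separate and add to the noise.

```python
import re
from collections import defaultdict


def normalize_tag(tag: str) -> str:
    """Lowercase a tag and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", tag.lower()).strip()


def group_variants(tags):
    """Group raw tags that share the same normalized form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_tag(tag)].append(tag)
    return dict(groups)


# Hypothetical free tags that different authors might apply to the same trope.
raw_tags = ["Hurt/Comfort", "hurt comfort", "hurt-comfort", "h/c", "Hurt/Comfrot"]
groups = group_variants(raw_tags)

for normalized, variants in groups.items():
    print(f"{normalized!r}: {variants}")
```

Normalization of this kind addresses only surface variation. The abbreviation and the typo in the sample still form their own groups, which is the gap that human moderation or a controlled vocabulary is meant to close.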

[1.6] While considerable strides are being made in the improvement of folksonomy and user-generated tagging systems, these problems have not yet been fully addressed (Eynard, Mazzola, and Dattolo 2013). More structured systems of cataloging or indexing of information, such as MARC (Machine Readable Cataloging, the standard method used by libraries since the 1960s) and Dublin Core (a set of terms standardized for Web resources and developed by libraries in the mid-1990s) are problematic when applied to the more rapid and evolving genre of fan publications (Bartel 2004). The level of specificity and the inherent desire within the community for individuality and author control, along with the large volume of work being created and disseminated, make it nearly impossible for any outside authority to be imposed on the metadata itself (Kem 2005). Recommendation and bookmarking sites can offer some degree of community oversight, but this is informal and not widely adopted in all fandoms.

[1.7] The majority of scholarly articles written about fan fiction have focused on the content involved, the legality of the genre, or the reasons for authorship and consumption (Thomas 2011). Little attention has been paid to how these works are organized and made locatable, or to the larger implications these organizational structures may have for fandom. Some authors have tried to draw parallels among fan fiction creation, digital literacy skills, and the needs of 21st-century learners (Black 2009; Alvermann, Hutchins, and McDevitt 2012), while others have argued for the use of fan fiction for classroom writing (Jwa 2012; Roozen 2009; Chandler-Olcott and Mahar 2003). Yet the focus has remained on the act of creation rather than the forms in which the work is shared and an audience found. Focusing on only the initial creative act undermines the potential of fan fiction to serve as both a classroom model and a source of academic study in a much broader context.

[1.8] Discussions of fan fiction folksonomies have potential as a source for larger lessons on digital literacy and information literacy, particularly as a model for the use of tagging and Web 2.0 technologies. Yet the predominant discussions within library and education literature are mired in copyright and author creativity rather than the more technical aspects of fan fiction dissemination and creation. Matters of copyright and legality are important elements of any information literacy instruction, but while attention is focused on these issues, an opportunity to discuss the broader skills related to Web navigation and information dissemination is missed. As Watson states, "understanding how to productively filter and interpret data is quickly becoming essential for anyone hoping to survive in the rapidly changing media industry" (Watson 2010). Fan fiction organization and tagging can serve as a catalyst for such discussions. This reluctance to consider fan fiction and fan communities as worthy of such instructional use is symptomatic of larger problems of legitimacy for the genre.

[1.9] Fan fiction folksonomies are not only relevant from an educational perspective, but they also play a role in the long-term preservation of the works themselves. Lothian discusses the need for archives to preserve and protect fan fiction and fan conversation, describing the creation of the Archive of Our Own (AO3) as one possible method to aid in the retention of fan works (2011). Others such as Jessica Kem and Versaphile recount the history and changing nature of fan fiction archives and Web sites. Kem makes a strong case for the inclusion of librarians within fan efforts at preservation, and Versaphile argues for increased author responsibility and proactive planning for retention of works (Kem 2005; Versaphile 2011). While Kem touches on the possibility of librarian involvement in a theoretical catalog or guide to fan fiction, she does not focus on the existing fan efforts to accomplish that goal. In fact, her research indicates that a significant portion of the fan community would be resistant to outside attempts by any authority, such as librarians, to impose structure (Kem 2005). With modern Web 2.0 technology, fan archivists and authors now have tools at their disposal that had yet to be developed in 2005. Is outside interference even necessary? Furthermore, what of librarians and Web developers who are themselves fans and therefore members of the community? Can their expertise be leveraged to maximize current indexing and tagging technology to make fan works more discoverable?

[1.10] To address these questions, this article explores current fan fiction descriptive metadata models in use by various Web sites and places them into three broad categories: free tagging, controlled vocabulary, and hybrid folksonomy. In order to better understand how these models have been derived, it is necessary to look to the development of fan fiction, the changing dissemination methods that have been used within the community, and how metadata has been utilized by various sites.

2. History and development of fan fiction metadata usage

[2.1] In the early days of print zines, information about a particular work was, by necessity, included in the work itself. Short forms of relaying information developed to allow authors to more quickly inform readers about the nature of the content. This early fan vocabulary has become the basis of a majority of current fan fiction folksonomy. Terms such as slash (stories with a homosexual romantic pairing—explicit or otherwise) and PWP (plot, what plot? or porn without plot) began in this era and have carried over into modern electronic archives and metadata usage (Bury 2005). In fact, a majority of fan practices related to indexing and organization are derived from earlier print-based conventions. This can be seen in the formatting of author notes, titles, and disclaimers, as well as basic fan fiction vocabulary. This print holdover is not isolated to fandom but can also be seen in other indexing or cataloging usage, such as MARC and the Anglo-American Cataloguing Rules (AACR2).

[2.2] During the first wave of transition from print to electronic distribution, early attempts at metadata were often limited by the software used. Discussion groups on platforms such as Usenet and CompuServe were often located based on fandom or actor (Bury 2005), but stories themselves were typically posted with little to no extra searchable data. If the list was e-mail- or message-based and without archival features, stories were sent out without any provision for long-term retention. Most Usenet groups, for example, retained messages for only a few weeks, and relied primarily on message headers to convey the subject. An early example of an online folksonomy can be seen in these headers, as users developed codes and conventions to convey this information succinctly and consistently (Baym 2000).

[2.3] With the advent and proliferation of keyword searching, it became easier for readers to navigate these services, but attempts to apply standardization or consistency were continually hampered by changing platforms and frequent loss of content as Web sites folded or merged with one another (Hellekson and Busse 2006; Kem 2005; Versaphile 2011). For most of today's younger fan fiction readers and authors, the journey began not with print zines, story circuits, or Usenet, but with either a fan fiction archive or one of the smaller mailing lists or group pages. Many of these communities have disappeared as sites such as GeoCities and MySpace have closed or lost popularity, but examples can still be found within Yahoo Groups and many of LiveJournal's community pages.

[2.4] The advantage to the mailing list or group page was ease of use and the ability to quickly connect with other fans who shared particular interests. These sites benefited from word of mouth among fans, but were not solely dependent on it for growth, as story circuits and print zines often were. These sites were typically not created for fan fiction specifically, and in some cases elaborate workarounds were needed to meet the needs of the fan community. The tendency for fans to become fragmented into small groups based on specific romantic pairings within a fandom was an inherent problem, and metadata usage and application varied greatly between platforms and communities (Hellekson and Busse 2006; Kem 2005). While these are often considered older forms of fan fiction dissemination, they remain popular to this day. Examples of active groups are the Usenet group alt.startrek.creative, Holmesslash on Yahoo, the Artie/Claudia page on LiveJournal, and the Spiced Peaches e-zine.

[2.5] Many of these mailing lists now serve primarily as dissemination formats rather than fiction archives, but there are exceptions. For the purposes of this paper, LiveJournal is being included as an archive, even though its primary function is not archival, while the more e-mail-based mailing lists such as Yahoo Groups have been excluded. This distinction is somewhat subjective, but LiveJournal's recent changes have created new metadata opportunities worth discussing, whereas the more e-mail-based distribution methods are not as easily configured for later retrieval. More recent dissemination methods such as Twitter and Tumblr have not been included because of the relatively short length of time they have been in use, and the large number of papers already available that discuss hashtags and image-based tagging, respectively.

[2.6] It is with the creation of the larger multigenre archives, such as FanFiction.net and AO3, that fan fiction folksonomies and metadata usage take on more recognizable forms, even while they lose the specificity available to smaller fandom- or pairing-specific repositories. These sites range greatly in the level of author and reader control in descriptive metadata creation, as well as searchability. Fan communities on LiveJournal operate with a range of moderated and unmoderated (free) tagging, while FanFiction.net is dependent on controlled vocabulary. Controlled vocabulary, while author-selected, does not allow individuals to create new tags, instead relying on the predetermined vocabulary provided by the site designers and allowing authors to pick from only this limited selection. In multigenre archives, controlled vocabulary is rarely flexible enough to allow for fandom-specific folksonomies. Let us begin by looking at the free-tagging model in more depth before moving on to the controlled vocabulary and hybrid models.

3. Models of metadata creation and usage

[3.1] The simplest form of descriptive metadata in fan fiction is that of free tagging. Examples can be seen on any of the numerous LiveJournal communities or author pages dedicated to fan fiction. LiveJournal began in 1999; it was not originally intended to house fan fiction, but was quickly adopted by many authors because of its ease of use and easy personalization (Versaphile 2011). One of the major drawbacks to the site is the difficulty in searching. Since descriptive metadata within LiveJournal is dependent on tagging, it is up to the author and readers to correctly interpret what others would see as the relevant terms that apply to the work. What the author might see as the main focus and intention may not match what the reader derives from it (Weinberger 2005). As with all tagging, there is also the risk of syntactic variation and uncertainty. A further complication is the age of the site and the diversity of user skills and experience with tagging. Many fan fiction LiveJournal communities predate the site's adoption of free tagging, and may have inconsistent use of tags over time.

[3.2] One approach that LiveJournal users have taken to combat this problem is to post guidelines for the formatting and structure of fan fiction entries as well as tags. These generally rely on the author or poster for compliance, although some communities may employ a moderator to ensure adherence. The typical entry consists of the following, but may vary significantly: Title, Author's Note, Rating, Warnings, and Summary. This level of descriptive data is taken directly from earlier print zine formatting, and conveys the bare minimum required to attract a reader. Often romantic pairings and character lists are included as well (figure 1). Some communities have created tagging guidelines and retroactively tagged posts for consistency. The Artie and Claudia tags page from Warehouse 13 is a prime example. Retroactive tagging can assist with the conversion of older entries to the new tagging format, but requires a significant amount of volunteer effort, as does moderation.

Screen capture reading 'Title: Distance Makes the Heart, chapter 17; Author: Piscaria; Pairing: adult!Charlie/Wonka; Story Rating: NC-17; Chapter Rating: R; Summary: Eight years after Charlie finds the Golden Ticket, Wonka sends him away from the factory. As Charlie deals with his hurt and confusion over leaving his home, he begins to reconsider his relationship to the factory...and to Willy Wonka.; Author's Note: I really recommend reading chapter sixteen again before moving on to this chapter, considering that it's been a good two years between updates. What

Figure 1. Screen capture of LiveJournal fan fiction post by Piscaria.
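The header convention shown in figure 1 lends itself to a simple compliance check of the kind a community moderator might perform by hand. The following is a minimal sketch under stated assumptions: the header is treated as a single line of "Field: value" pairs separated by semicolons, and the function names and the list of required fields are hypothetical rather than taken from any LiveJournal community's guidelines.

```python
def parse_header(header: str) -> dict:
    """Split a 'Field: value; Field: value' header into a dictionary."""
    fields = {}
    for part in header.split(";"):
        # Split on the first colon only, so values may themselves contain colons.
        name, separator, value = part.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


def missing_fields(fields: dict, required) -> list:
    """Return the required field names that the post does not supply."""
    return [name for name in required if name not in fields]


# Header fields taken from the post shown in figure 1 (summary and note omitted).
header = (
    "Title: Distance Makes the Heart, chapter 17; Author: Piscaria; "
    "Pairing: adult!Charlie/Wonka; Story Rating: NC-17; Chapter Rating: R"
)
post = parse_header(header)

# Hypothetical community guideline: these fields must be present.
print(missing_fields(post, ("Title", "Author", "Warnings")))
```

A check like this can confirm only that a field is present, not that its content is accurate, and it would misread a summary that itself contains a semicolon. Both limits help explain why such communities still depend on volunteer moderation.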

[3.3] The major downside to sites like LiveJournal is the lack of advanced search capabilities. The primary method of searching is by author, with keyword searching being hampered by the private status of most journals and the unreliability of free tagging. Only publicly accessible journals are crawled by Web search engines, further complicating findability (Kem 2005; Versaphile 2011).

[3.4] According to van Dijck's 2009 study, 85 percent of individuals on user-generated content sites are either "passive spectators" or "inactives," with only 13 percent creating content or tags. With so few people contributing, and the majority only viewing works, the benefits to free tagging are greatly reduced. This tension between ease of posting and difficulty in discoverability causes many users to find this model frustrating (Kem 2005; Versaphile 2011). With most users not contributing at all to tagging, or doing so poorly or with little understanding of the science behind keyword searching, sites that are dependent solely on the fans for tagging prove difficult and time-consuming to navigate.

[3.5] The other extreme is found on sites like FanFiction.net, founded in 1998, a year before LiveJournal. FanFiction.net was always intended to act as a fan fiction archive, and was designed with a clear hierarchical structure and a controlled vocabulary for all metadata. The benefit to this model is the structured browsing and searching capability offered, and the advanced filtering options available. The disadvantage to the model is the difficulty in locating stories that fall outside the broad vocabulary given. For example, if an author is working with a pairing or fandom that does not have a large following, there may not be controlled vocabulary available. This causes such works to be locatable only by keyword searching in the text, summary, and title, which can render them unlocatable by the average user, especially if the pairing or fandom contains names that are not unique to it, such as Eliza and Henry from My Fair Lady (figure 2). While this is also a problem on LiveJournal, FanFiction.net's extensive listings for other, more popular fandoms make the omission of controlled vocabulary on smaller genres more glaring. To add to the complication, the site restricts the number of controlled terms that can be applied to a story, allowing only four characters to be listed and up to two romantic pairings. At this time, there is no option for descriptive tagging or for readers to tag stories after the author posts.
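The constraints of a controlled vocabulary can be made concrete with a short sketch. It assumes a site-maintained vocabulary keyed by fandom and uses the limits described above (four characters, two pairings). The vocabulary contents, the names, and the error messages are hypothetical illustrations and do not reproduce any archive's actual data or code.

```python
MAX_CHARACTERS = 4  # limit on listed characters, as described in the text
MAX_PAIRINGS = 2    # limit on listed romantic pairings, as described in the text

# Hypothetical site-maintained vocabulary; a real archive's lists are far larger.
CHARACTER_VOCABULARY = {
    "Doctor Who": {"The Doctor", "Rose Tyler", "Donna Noble", "Martha Jones", "Jack Harkness"},
}


def validate_submission(fandom, characters, pairings):
    """Return a list of problems; an empty list means the submission is accepted."""
    allowed = CHARACTER_VOCABULARY.get(fandom)
    if allowed is None:
        # Small fandoms may have no controlled terms at all.
        return [f"no controlled vocabulary for fandom: {fandom}"]
    errors = []
    if len(characters) > MAX_CHARACTERS:
        errors.append(f"at most {MAX_CHARACTERS} characters may be listed")
    if len(pairings) > MAX_PAIRINGS:
        errors.append(f"at most {MAX_PAIRINGS} pairings may be listed")
    named = set(characters)
    for pairing in pairings:
        named.update(pairing)
    for name in sorted(named - allowed):
        errors.append(f"not in vocabulary: {name}")
    return errors


accepted = validate_submission(
    "Doctor Who", ["The Doctor", "Rose Tyler"], [("The Doctor", "Rose Tyler")]
)
rejected = validate_submission(
    "My Fair Lady", ["Eliza Doolittle", "Henry Higgins"], [("Eliza Doolittle", "Henry Higgins")]
)
print(accepted)
print(rejected)
```

The second call shows the trade-off at issue. The submission is internally consistent, but because the fandom has no controlled terms, none of its metadata can be recorded in searchable form.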

Figure 2. Author submission form on FanFiction.net.

[3.6] Filtering options on FanFiction.net were expanded in 2012 to allow for sorting based on the number of "favorites" and "reviews" a work has received (figure 3). These options allow readers to find popular works faster. With the rapidly expanding number of users and works at sites like FanFiction.net, quality is a frequent concern (Kem 2005). While popularity is not synonymous with quality, these search features allow users to quickly locate works that have attracted significant previous attention. This functionality has existed on other archival sites for some time, but is not universally available. Sorting via reviews or comments has a shortcoming that parallels the controversy over citation analysis in academia: a work may garner extensive conversation due, in some cases, not to its quality but rather to a distinct lack of it. Authors with a long history in a fandom may also attract their own fans, who may comment on or favorite a work based on the popularity of the writer rather than the quality of the work itself.
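Sorting by reader response is simple to express, which may partly explain its spread across archives. The sketch below assumes each work carries only a favorites count and a reviews count. The class, the function, and the sample figures are hypothetical and are not taken from any site.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str
    favorites: int
    reviews: int


def sort_works(works, by="favorites"):
    """Return works ordered from most to least of the chosen measure."""
    if by not in ("favorites", "reviews"):
        raise ValueError(f"unsupported sort key: {by}")
    return sorted(works, key=lambda work: getattr(work, by), reverse=True)


# Hypothetical counts: Story A is heavily discussed but rarely favorited.
works = [
    Work("Story A", favorites=12, reviews=340),
    Work("Story B", favorites=95, reviews=20),
    Work("Story C", favorites=40, reviews=41),
]
by_favorites = [work.title for work in sort_works(works, by="favorites")]
by_reviews = [work.title for work in sort_works(works, by="reviews")]
print(by_favorites)
print(by_reviews)
```

The two orderings disagree about which work comes first. The counts record attention, not the reason for it, so neither ordering can be read as a ranking of quality.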

Figure 3. FanFiction.net filter options.

[3.7] Recommendation and bookmarking sites such as Delicious and Pinboard offer other options for fans to manage and discover quality fan fiction works. These sites are also dependent on various forms of tagging, but offer fans the ability to mark works from multiple sites and repositories and manage personal collections with more autonomy. Many fan communities and mailing lists have active recommendation pages, sites, or blogs devoted to making quality works more discoverable. These sites are hampered by the same vocabulary issues as LiveJournal, and are also dependent on users to tag appropriately. Long-term stability of these lists is also a concern. Managing and updating links as Web addresses change and works are removed or migrated to new repositories is time consuming and often overlooked. Several of the larger fan fiction archives have created fan community page options to assist in the discovery of desired fan fiction works. See, for example, FanFiction.net's Communities site.

[3.8] Examples do exist of moderated archives that attempt to provide a basic level of quality, and employ complex controlled vocabulary. A prime example is the Doctor Who archive, A Teaspoon and An Open Mind. Archive moderators validate stories and remove works that do not meet a minimum standard. The Web site also employs robust metadata specific to the fandom and advanced search options (figure 4). Fandom-specific archives such as Teaspoon allow for creation of more specific folksonomies than would be practical with larger multigenre archives, and still benefit from the advanced search features that controlled vocabulary makes possible.

Figure 4. A Teaspoon and An Open Mind advanced search screen.

[3.9] Another site with fandom-specific controlled vocabulary is Ink Stained Fingers, a Harry Potter slash archive. This site has an extensive list of filters based on pairing and sexual situation, many unique to slash fiction (figure 5). These filters allow the user to select desired story elements and filter out unwanted situations. Metadata of this detail is somewhat unusual in fandom, and could have negative ramifications in regard to censorship or author persecution when used with stories involving marginalized or subcultural themes. The level of specificity available gives the user the ability to define an incredibly specific search, however, and is an excellent example of how controlled vocabulary can be adapted to a very specific folksonomy.

Figure 5. Ink Stained Fingers warning filters.

[3.10] The third metadata model is a blend of the free tagging and controlled vocabulary methods into a moderated form of tagging. The best example of this is AO3's hybrid folksonomy and tag wrangling. This mode of operation allows authors to create tags using any terminology they consider applicable; tag wranglers work in the background to link synonyms and alternative wordings, as described in the AO3 FAQ on tags. This behind-the-scenes work allows for a form of classification and standardization not found with free tagging, but gives authors more control and creative license than a purely controlled vocabulary structure. While this model seems to provide the best of both competing formats, it does require an extensive, dedicated, and knowledgeable volunteer base to accomplish. The inclusion of Web developers and librarians in the creation of AO3 is evident in the construction and policies surrounding tag wrangling on the site. For smaller and more specific archives that already struggle to maintain moderators, it may be difficult or impossible to locate the necessary skilled volunteers (figure 6).

Figure 6. AO3 story submission tag form.
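The linking of synonyms that underlies the hybrid model can be sketched in a few lines. This is an illustration of the general idea only and is not AO3's implementation. The class, its methods, and the sample works and tags are hypothetical, and the sketch assumes that a volunteer has already decided which free-form tags mean the same thing.

```python
class TagWrangler:
    """Maps free-form tags to a canonical tag chosen by a human volunteer."""

    def __init__(self):
        self._canonical = {}  # lookup key -> canonical tag

    @staticmethod
    def _key(tag: str) -> str:
        # Ignore case and extra whitespace when looking a tag up.
        return " ".join(tag.lower().split())

    def link(self, canonical: str, *synonyms: str) -> None:
        """Record that each synonym should be treated as the canonical tag."""
        for tag in (canonical, *synonyms):
            self._canonical[self._key(tag)] = canonical

    def resolve(self, tag: str) -> str:
        """Return the canonical tag, or the tag as typed if it is not yet wrangled."""
        return self._canonical.get(self._key(tag), tag)


def search(works: dict, wrangler: TagWrangler, query: str) -> list:
    """Find works carrying any tag that resolves to the same canonical tag as the query."""
    wanted = wrangler.resolve(query)
    return [
        title
        for title, tags in works.items()
        if any(wrangler.resolve(tag) == wanted for tag in tags)
    ]


# Authors tag freely; the stored tags are never rewritten.
works = {
    "Story A": ["Hurt/Comfort"],
    "Story B": ["h/c", "Angst"],
    "Story C": ["Fluff"],
}
wrangler = TagWrangler()
wrangler.link("Hurt/Comfort", "h/c", "hurt comfort")

results = search(works, wrangler, "hurt comfort")
print(results)
```

The authors' own wording is left untouched, and standardization exists only in the mapping. A tag that no volunteer has linked still matches only itself, which is one way to see why the model depends so heavily on volunteer labor.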

[3.11] AO3 has filtering capabilities similar to FanFiction.net's, and allows for sorting based on kudos and comments, comparable to FanFiction.net's favorites and reviews, respectively. Because of the variation in tags, AO3's filtering options are not as reliable as those derived from controlled vocabulary systems, and are subject to change as tags are wrangled in the background and new terms enter the folksonomy. For most users, this is "good enough." As Weinberger says, "The tagging movement says, in effect, that we're not going to wait for the experts to deliver a taxonomy from on high. We're just going to build one ourselves. It'll be messy and inelegant and inefficient, but it will be Good Enough. And, most important, it will be ours, reflect our needs and our ways of thinking" (Weinberger 2005, 4). This hybrid form of metadata application offers a fair mix of the better features of the other models without completely compromising search and filtering capabilities.

[3.12] The inclusion of computer and information professionals in platform development for fan sites such as AO3 is transforming how works are organized, maintained, and searched. Efforts to preserve works, while maintaining author control and freedom, are changing how metadata is applied and conceptualized. Approaches such as FanFiction.net's and LiveJournal's restrict both author and reader, although in differing ways. FanFiction.net has sacrificed authors' ability to describe their works as they wish, in favor of making the reader's experience more streamlined. LiveJournal has complicated the search for works in favor of allowing the creator complete freedom. The hybrid application provides a vehicle for offering both groups some standardization without compromising creativity or genre-specific folksonomy. Sites such as AO3 have been developed by fans, with modern Web principles at heart and an increasingly skilled set of volunteers to manage content. Sites such as LiveJournal, which were developed for entirely nonfandom-related purposes and launched prior to Web 2.0, do not have this advantage.

4. Criticisms of metadata usage

[4.1] Even though Henry Jenkins's groundbreaking publication Textual Poachers is now past its 20th anniversary, fan fiction has managed to remain mostly underground. Little attention has been given from traditional academic or publishing audiences, and fan fiction is new to most scholars outside fandom. As increased scrutiny is drawn to the genre, with the popularity of works like Fifty Shades of Grey (2011), monetization efforts like Amazon Worlds, and publishers like Big Bang Press recruiting authors from fan fiction ranks, works that previously survived based on their obscurity and existence on the very fringes of fandom, catering to a small and specific audience, may be dragged into the light. Many of these works could be considered deviant, or touch on social or political themes that could cause the authors negative social, economic, or legal consequences if made available to a wider audience.

[4.2] Given these concerns, some fans have indicated a reluctance to increase the findability and searchability of such fan fiction. Reasons given include a distaste for the mainstreaming of the genre, as well as concern over a possible increase in censorship, and overall potential for authors of certain marginalized themes and fandoms to be harassed or face legal charges. This is particularly of concern with stories rated NC-17 or above, especially if they involve topics such as nonconsensual sex, incest, slash, or underage characters in sexual situations. Some authors wish to maintain their anonymity, and feel that if their work were subject to greater exposure they would risk damage to their professional or personal reputation if outed as fan fiction authors (Kem 2005). Some works may be considered subversive to governments that restrict Internet activity, and could open their authors and readers to undesired scrutiny should they be easily located by individuals outside the fan community.

[4.3] In 2002, FanFiction.net was able to use its controlled metadata to remove stories from its archive that authors had marked as NC-17. The change in policy, from allowing stories of any rating to imposing a top limit of M (Mature), is still the subject of considerable fan discourse. Many sites such as FanFiction.net maintain policies that restrict the type of publication they allow, arguing that it is to protect younger readers from explicit material or to comply with requests from original creators. Others maintain that these policies are tantamount to censorship. Online petitions and groups such as Stop FanFiction Censorship on Facebook have formed to protest these policies.

5. Conclusion

[5.1] These three models of descriptive metadata usage—free tagging, controlled vocabulary, and hybrid folksonomy—represent a diversity of thought within the fan fiction community with regard to information organization and searchability, and mirror larger arguments taking place within the computer and library science fields. The question of controlled vocabulary versus free tagging is debated extensively within the library and information fields, with the tension between giving users what they want and what they need to make items findable taking on increased significance. As Kem states, "the fan fiction community is conflicted between a need for better accessibility and a need for community sovereignty" (2005, 1).

[5.2] Despite such concerns, tagging and metadata technologies are improving. Search and filter capabilities of sites like FanFiction.net and AO3 are making fan fiction more easily located, even as the amount of it increases exponentially. The various methods these sites employ do have significant differences. However, the main goals remain the same. Fan readers want to be able to locate stories with specific criteria quickly and efficiently. They also desire stories to have some measure of popularity, and to know in advance if other readers have enjoyed the work or the author. These needs often run in direct conflict with the desire for fans to remain independent and not subject to authoritative control over their works, and may open up certain subgenres to public and possibly legal condemnation. How then can these two needs be moderated?

[5.3] As fewer print and e-mail zines are produced, and more small subject or relationship-specific Web repositories lose popularity, there will be increased pressure for outlying authors to shift to the concentrated reader base of multifandom archival platforms. The existence of these sites emerges from the popularity and normalization of the genre, and their use inevitably leads to greater conformity to the metadata schemes their creators choose to enact. Thus, further research on the preferences of different user groups and the benefits of refining their practices may help current and future archives to create improved retrieval methods for the next generation of fan fiction creators and readers—and, by extension, the next generation of readers and creators in our cultures.

