Reflection: Publishing Research Fellowship (Spring 2025)

Author: Josh Lambert

With a grant from the TSSL project, I hired a student, Melissa Rizzo ’28, in spring ’25 to work on an exciting, unusual project in the field of publishing history. She worked alongside a second student, Sylvia Nica ’25, who was paid through my personal research funds.

The project began when I was contacted by a former employee of the publishing house Houghton Mifflin, who had read some of my writing about U.S. publishing history. He lived in Boston, and we met up and talked; he showed me some of the materials he had kept from his time working at HM in the 1980s and 1990s. The most fascinating of these documents were computer print-outs of the annual sales reports for every book that HM published. A single one of these print-outs, preserved in a binder, was about five hundred pages long, and he told me he had annual reports like these from many of the years he worked at the company.

What’s important to note here is that reliable data about U.S. book sales is almost impossible to find. As many scholars have noted, publishing companies (even publicly traded ones) usually have no incentive to share precise sales data with the public, and the available methods for estimating the sales of books are wildly imprecise. You should be skeptical of any figures you hear about the sales of a particular book, as these are often fabricated or falsified.

The former HM employee I met was willing to lend me his binders, but the question then became how I could most meaningfully study this data. It was fun to flip through a binder and find the sales data for an author I’ve taught or written about, but I wanted to be able to do more comprehensive analyses of patterns in the HM sales data. I decided to scan the binders first, so that I would have digital images of the spreadsheet pages, and then used an online tool, Textract, to reconstruct the spreadsheets from these images. Like any OCR tool, this one wasn’t perfect, and Melissa and Sylvia have been working to clean up the data so that it can be searched and used for data analysis. It’s been challenging work for them: a lot of precision is required, and they have been discovering many complex functions of Excel and Google Sheets to help them locate and fix errors and to format the data. Work is continuing over the summer, and I’m hopeful that at some point in the next academic year I’ll be able to work with some students, and maybe some colleagues, who are knowledgeable data analysts, to see what kinds of patterns and insights we can find once we can comb through all the data.
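
For readers curious what locating OCR errors in a column of sales figures can look like, here is a minimal sketch in Python. It is only an illustration, not the method the students used (their work was done in Excel and Google Sheets). The function name `parse_sales_cell` is hypothetical, and the sketch assumes each cell is meant to hold a whole number, possibly written with commas or spaces as thousands separators; the actual layout of the HM reports may differ.

```python
import re

# Characters that OCR tools commonly confuse with digits in numeric columns.
# (Assumed list for illustration; real scans may show other confusions.)
OCR_SWAPS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def parse_sales_cell(raw):
    """Return (value, needs_review) for one OCR'd cell expected to hold a whole number.

    value is an int, or None when the cell cannot be read as a number.
    needs_review is True when a person should compare the cell with the scan.
    """
    text = raw.strip()
    swapped = text.translate(OCR_SWAPS)
    # Drop thousands separators (commas or stray spaces) before checking.
    digits = swapped.replace(",", "").replace(" ", "")
    if re.fullmatch(r"-?\d+", digits):
        # Readable as a number; flag it if any character had to be swapped.
        return int(digits), swapped != text
    # Not readable as a number at all: always flag it.
    return None, True
```

Under these assumptions, a clean cell such as `"12,345"` comes back as `(12345, False)`, while a cell such as `"1O,5l2"` (with a letter O and a lowercase L) comes back as `(10512, True)`, so that a person can check the guess against the scanned page rather than trusting it.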