Thoughts writing a Google App script

13 October 2020

Recently a friend of mine asked for a simple program that would correlate some data drawn from a couple of online services. I decided the best way to do this for him would be to use a google spreadsheet as the host, putting the code into the spreadsheet's script area. I'm no expert in Google App Script, but the exercise led to a few observations, which I feel compelled to share.

Google Sheets is a great platform for non-programmers

Had I been writing this script for my own purposes, I'd never consider using Google Sheets. I'd just write it is a command line script. But whenever you write a program, you have to think about how it's going to be deployed. In this case my friend, although quite the whizz with audio-visual tech, isn't a programmer who uses the command line every day. Also he uses windows, which isn't an environment I'm familiar with any more. (I'm told its command line has got better, but the bar was low enough for worms to jump over.) However, like many computer users, he uses Google Sheets all the time.

Spreadsheets get a lot of stick from us professional programmers, who often observe that many businesses rely on a few Excel spreadsheets. We roll our eyes as we say this, such spreadsheets are often a mess of complicated code, little modularity, untested, with no version control, sitting around on users computers without any management. Spreadsheets are often coerced into tasks that they are ill sorted for, often acting as cobbled together databases, tables whose relationships are a web of tangled VLOOKUPs.

But these spreadsheets are widely used for a good reason. The tabular metaphor is a simple yet effective way to handle structured data. By using Illustrative Programming, users can see results immediately as they work their formulae. Google Sheets makes it easy to share spreadsheets with other people, making it easy for several people to have a common place to store shared data. As someone who instinctively dislikes barriers based on skills, I've often felt frustrated that software developers don't get more involved in what other professionals are doing with their spreadsheets.

Like so many others, a spreadsheet makes a natural environment for my friend to carry out this task. Google sheets allows me to create a new menu and easily bind scripts to menu items, so he can easily run my scripts, see the data the scripts generate, and add additional data directly into a sheet. We can easily share a sheet so I can update the scripts or look at data if a problem comes up. He doesn't have to install any software on his machine, or keep it up to date.

Should I get run over by a bus, he can easily share the sheet with someone else, who can easily edit and run the sheet. Since the code is JavaScript, it's a widely known language for programmers to work with.

The hardest part is authentication

The "business logic" in the scripts was very easy. Compare the list of members on his Slack channel to members either registered on Patreon or on his own separate list. Then use the comparison to generate lists of people to add or remove from Slack. All it needed was to pull emails from the lists and do a couple of set difference operations.

The hard part was getting the lists, not in the sense of fetching them from a REST URL (which again is easy), but in satisfying the service that the script is authorized to get this data. Both services use OAuth to sort out authentication, but that standard is no indication that its a plug and play exercise, even with the library that Google App scripts provides.

In the end I ended up half-avoiding and entirely-avoiding authentication. Slack has a nice mechanism by which you can create an app to access the Slack data, give it the authorizations you need, and it gives you a simple access token on its website. For this application I could just put that access token into the script. Usually that's poor security practice, but in this case the script is in the same spreadsheet as the data it downloads (and that data isn't dreadfully sensitive). That side-stepped most of the complexity that OAuth presented.

The Patreon data was both more awkward to authenticate and more sensitive data. So here I did an end-run around the authentication. The Patreon web-app allows the user to download data into a CSV file. So I ask my friend to do that and import the data into the spreadsheet.

There is a real opportunity here for Google to simplify the whole authentication flow. I should be able to just call a fetch method on a remote service, and have the infrastructure sort out the authentication flows without me having to investigate and program them myself.

Google's documentation is non-zero

That's about the best thing I can say about it. There is a list of all the classes and their methods. By reading through them I could usually figure something out. But there's not much other than that, and even a simple task like this led me into more awkward spots than I would like.

Organizing the spreadsheet

I don't program with spreadsheets much (I do general data munging and graph plotting in R) so I'm not experienced enough to have strong opinions on what a well structured spreadsheet looks like. There may be advice somewhere on how to design your spreadsheets well, but I couldn't find it (if anyone knows any good articles, do let me know).

Given that lack, I went with my usual instincts. First instinct is to download data to a local store with minimal manipulation. So my script to download data from slack did no more than select the fields I wanted and dumped them into one page of the spreadsheet. Similarly my page of Patreon data assumes a simple upload of the CSV file from Patreon. Both of these pages are built to clear and replace the whole page when they are refreshed. A third page just contained the hand-maintained list of exceptions. All three of these sheets are pure data sheets, a single table, headings in row 1, no formulae. The comparison script reads from these three data sheets, does the (simple) application logic and emits the two lists to a separate output sheet.

That's very much how I'd use separate text files in a command line app. It allows the user to see the raw downloaded data. I can run (and test) the application logic without downloading each time. I could set up a sheet with test data. There is a clear one-way flow of data between the sheets and code.

Don't use appendRow to append rows

The first time I ran the code to download from slack was depressing, as it ran horribly slowly. There may have only been a thousand or so rows, but they were added to the spreadsheet at the rate of about one a second. We could put up with that, but it wasn't very good. I was convinced there must be a quicker way to do this.

Hunting around the API, I saw that a lot of spreadsheet operations depend on defining a range in the spreadsheet. I was adding a new row using Sheet.appendRow, but if I defined a range (which could be whole sheet), I could use Range.setValues instead. Once I did, adding the rows was effectively instantaneous. I did not find any hints in the documentation or elsewhere on the web to try this, which is important as the lack of this kind of documentation is barrier to people using this platform more widely.

Allow API lookups to look up multiple values

As I mentioned above, I ended up not using the REST interface for the Patreon data due to the complications around authentication. But there was another reason to favor the CSV download. The Patreon API included a resource that would tell me all the people who were supporters of a campaign, for these people it would give me their Patreon ID and their name. But to cross-check with the slack list, I also needed their email. That I could look up by getting a resource indexed by ID. However I'd need to do that for several hundred people, and I'd need a separate GET for each one.

The message to API designers is this. If you provide the ability to lookup information about a resource by ID, support the ability to provide data for multiple IDs at once. [1]

I like separating application logic from spreadsheet IO

The way to access data from the spreadsheet is to use the column and row conventions of the spreadsheet (eg cell "B22" or range "A2:E412"). That makes sense for many scripting tasks, as the programmers are thinking of the problem in terms of manipulating cells in the spreadsheet.

I tend to think of things rather differently, preferring my data in the form of basic JavaScript data structures, particularly since I can then use JavaScript's collection pipeline operators on them.

Given that, this was a handy function I wrote to extract data from a sheet, and return it as an array of JavaScript objects.

extractData(sheetName, firstCol, lastCol, mapper) {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(sheetName)
  const numRows = sheet.getLastRow()
  const range = `${firstCol}1:${lastCol}${numRows}`
  return sheet.getRange(range).getValues().map(mapper)
}

I could then use it with code like this:

const mapper = row => ({email: row[3], name: row[0], slackId: row[1]})
slackData = this.extractData("raw-slack-download", "A", "D", mapper)

Once I'd got it into arrays, figuring out the results was easy, although I had to write a simple Array.difference function as I didn't have access to lodash.

The development environment is understandably crude

To write the JavaScript, I just pick a menu item in the spreadsheet and type into a crude text editor. It's not the comfortable home that I'm accustomed to, but it's fine for an overnight stay.

If I were doing something more sophisticated, I'd investigate setting up a better environment. One possibility would be to see if I can edit the script using Emacs's awesome Tramp mode (which allows editing remote files as if they are local ones.) Better still would be a way to sync local files with the google drive, allowing me to keep the source code in a git repo. But for a simple task such as this, about 150 lines-of-code, it wasn't worth looking into seeing if this was possible.

Summing Up

Hosting a simple application on a Google spreadsheet is an appealing deployment platform for a range of simple tasks. It allows users to run code without having to install something on their machine, to enter data in a familiar environment, and support easy sharing with colleagues. It's not a platform I hear much discussion about, but it's one to keep in mind. In particular, for any task that would be a simple shell script, but your users aren't comfortable with console windows and text files.


Footnotes

1: It may be there's a way to do it, but I didn't find it before I decided to go the CSV route instead.

Significant Revisions

13 October 2020: Published

15 September 2020: Started drafting