Graduating From OpenScholar

Thought by Nathan Plowman
July 22, 2022

What Is OpenScholar?

OpenScholar is a custom distribution of Drupal 7 that was first released in 2010. The vision of OpenScholar was to provide a platform that would allow individual faculty members, students, labs, and departments to create and manage their own websites without needing a developer to assist them. New sites can be provisioned from a selection of prebuilt presets and themes. Site owners can create publications, news, and event pages as well as most other types of content necessary for a personal or departmental higher education site. It also features a drag-and-drop layout system.

As a distribution, OpenScholar goes far beyond adding a small piece of functionality to Drupal 7 in the way a module or plugin would. It is essentially a completely separate fork of Drupal that comes bundled with dozens of community and custom modules tailored specifically to the use case of quickly provisioning sites for academic institutions. OpenScholar uses a “virtual site” approach which allows all of the sites to live on a single Drupal installation with just one hosting environment. It’s common for installations of OpenScholar to have hundreds or even thousands of websites. 

The Current State of OpenScholar

The OpenScholar distribution is only minimally maintained and, at twelve years old, is starting to show its age in terms of both technical debt and outdated user experience.

OpenScholar was built on Drupal 7, which will reach end-of-life on November 1, 2023. After this, it will no longer receive updates, bug fixes, or security patches. As of this writing (July 22, 2022), the OpenScholar project has not received an update since August 2021, nearly a year ago. PHP 7 will stop receiving security updates in November 2022. Drupal 7 core added support for PHP 8 in release 7.79, but it is unknown whether the custom modules in the OpenScholar distribution are fully compatible with PHP 8. OpenScholar was never ported to Drupal 8 or 9.

How This Impacts You

These issues impact your ability to continue using OpenScholar in the future, and you may want to start thinking about retiring your OpenScholar sites or migrating to a new platform if you have not done so already. 

Options for Sunsetting or Staying On OpenScholar

There are several paths you can take, ranging from retiring your sites to staying put for a while to migrating to a new platform.

Retire or Archive the Websites

If the websites on your OpenScholar distribution are no longer being actively updated and receive minimal traffic, you could consider retiring them entirely. Of course, this decision should not be taken lightly: all content would become inaccessible to the public, which could negatively impact a large number of site owners who may still rely on their OpenScholar site to share their work.

A more tolerable alternative to full retirement would be to create a static archive of the sites. This would allow the content to still be accessible on the internet, but site owners would lose the ability to edit existing content or add new content to their sites, as they would no longer be connected to the CMS. 

Creating a static archive version of the site involves using a web scraper to download all of the HTML, CSS, JavaScript, and image files into a directory (similar to how the Wayback Machine works) that can then be hosted on a low-cost static platform such as Netlify. There are tools such as Drupal Tome that can help automate most of this process.
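
To make the scraping idea more concrete, here is a minimal Python sketch of two building blocks a crawler would need: finding the same-site links on a page, and deciding where each URL should be saved on disk. The function names and the example host are our own illustrative assumptions, not part of any particular tool, and a real archive job would also need to fetch pages, handle query strings, and rewrite links inside the saved files.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def same_site_links(html, page_url):
    """Return absolute, de-duplicated URLs on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for raw in collector.urls:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(page_url, raw).split("#")[0]
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def archive_path(url):
    """Map a URL to a relative file path inside the archive directory.

    Query strings are ignored here, which is a simplification.
    """
    path = urlparse(url).path
    if path == "" or path.endswith("/"):
        return (path + "index.html").lstrip("/")
    if PurePosixPath(path).suffix == "":
        # Extensionless page paths become directories with an index file.
        return path.lstrip("/") + "/index.html"
    return path.lstrip("/")
```

Saving extensionless paths as `people/index.html` lets most static hosts serve the page at its original `/people` URL, which helps keep existing inbound links working.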

The benefit of this approach is that you can still keep all of the content available on the internet without having to worry about the costs and risks of maintaining an outdated Drupal installation. There are some limitations though. Features that require backend server functionality, such as webforms and search pages, will not work in a static archive.

Stay on OpenScholar

You could, of course, just stay on OpenScholar. The two main concerns with doing this would be security risk and cost of maintenance. Because Drupal 7 will be reaching end-of-life soon, security updates will no longer be released for your OpenScholar site. So you will either need to accept these security risks or take measures to check for vulnerabilities and apply patches yourself. 

End of life for PHP 7 could also have significant consequences. While Drupal 7 has recently applied compatibility updates for PHP 8, OpenScholar may or may not do the same, given its lack of active support. Most managed hosting providers will not allow you to continue using unsupported versions of PHP due to security reasons. If OpenScholar does not release a new update with compatibility fixes before hosting providers start dropping support, you will need to either patch the OpenScholar codebase yourself to ensure compatibility with PHP 8 or find an alternative hosting solution that will let you continue to run deprecated PHP versions. 

Because of these concerns, staying on OpenScholar beyond Drupal 7 end-of-life should be considered a duct tape and glue fix. It may be able to buy you extra time to get a plan and budget in order for a migration, but it’s likely not a great long-term solution.

Migrate to Drupal 9

While there is not a Drupal 9 version of OpenScholar available, it is still possible to migrate the content from an OpenScholar distribution to Drupal 9. At a high level, this will require:

  1. Deciding on an approach for reproducing the multisite architecture.
  2. Developing a Drupal installation profile that reproduces the content types and features from OpenScholar that you need to preserve. 
  3. Writing migration scripts for migrating the legacy content to the new platform.

This will be a substantial undertaking that could take a small team several months of full-time work to complete. But if you want to continue to support the ability to provision new sites for individuals and groups at your institution, this will likely be the best option for you. 

Migrate to Another CMS

While we’ll be focusing on the Drupal 9 upgrade path for the rest of this article, we felt it was worth mentioning that other options exist.

There is a new wave of Content Management Systems that are built with an API-first or “headless” approach, and would be paired with a modern JavaScript framework like Next.js or Gatsby for constructing the frontend. Many of these allow you to create multiple “spaces” or independent collections of content under a single subscription. The concept of spaces provides an elegant replacement for the complex “virtual site” architecture of OpenScholar. Most of these “headless” CMSs are paid SaaS solutions (e.g. Contentful, Storyblok, Contentstack) but there are a couple of open source ones as well, such as Strapi.

However, there are a few considerations to think about before choosing a SaaS CMS. The pricing on many of the SaaS CMS products is typically based on factors such as the number of admin users or number of content items. Given that many OpenScholar installations have hundreds or thousands of sites and site owners, the pricing may not be favorable for this use case. Open source options like Strapi avoid these pricing issues, but come with the trade-off of having to manage your own hosting.

You would also likely need to build out your own solution and tooling for managing some of the other multi-site concerns, such as automating site provisioning or syncing shared configuration changes to all sites. 

Multi-site Approach

You will likely need to come up with a solution to replace the “virtual site” architecture that OpenScholar once provided. Deciding this should be near the top of your list as you work on your plan for replacing OpenScholar, as it will have a large impact on other technical decisions. Some things to consider when exploring options include:

  1. Will you need a platform that supports the automatic site provisioning capabilities that OpenScholar once provided?
     
  2. How will you manage shared code, configurations, and features across all sites for maintainability? (e.g. Having to manually add a new content type to hundreds of sites individually is not sustainable)
     
  3. What level of per-site customization will you need to support? Is it ok for all of the sites to share the same starter template, or will you need a greater degree of flexibility?

Here are a few multi-site solutions that are commonly used in the Drupal world:

Acquia Site Factory: Acquia Site Factory provides the hosting infrastructure and tooling required for provisioning new Drupal sites from an admin interface with a few clicks. You will need to follow a fairly prescriptive approach to building your platform in order for it to work with Site Factory, but this saves you from many of the complexities of building this kind of tooling yourself. With Acquia Site Factory, you would have a single Drupal codebase for all sites. Within that codebase, you can prepare one or more “installation profiles” that contain the configurations for provisioning a new site, such as default content types and enabled modules. Acquia Site Factory also supports a separate theme repository, so you can deploy theme changes separately from other platform-level updates. Because all sites share the same codebase, only limited customizations can be made, by enabling or disabling custom modules on a per-site basis.

Pantheon Upstreams: Upstreams is Pantheon’s answer to Acquia Site Factory. It also allows Drupal sites to be provisioned from a custom starting template. The major differentiator from Site Factory is that it allows more flexibility in customizing individual sites through the concept of “custom upstream” and “downstream” code repositories. The “custom upstream” repository is where you maintain the shared code that will be used across all sites. Individual sites can also have their own site-level codebase with customizations, while still receiving updates from the “upstream” repository.

Traditional Drupal Multisite: Under this approach, all sites share a single codebase, but have separate databases. They also share the same server when deployed to production. To make this approach maintainable, you will need to implement some way of sharing common configuration across the sites. The “Config Split” and “Features” modules are two popular options. This approach works well for up to roughly a dozen sites, but will likely not scale well for installations with hundreds of sites. Provisioning new sites will be a manual process unless you write your own custom tooling.

As most OpenScholar installations have hundreds or thousands of sites, we would typically recommend a more robust solution like Acquia Site Factory or Pantheon Upstreams.

Technical Challenges of Migrating OpenScholar Content

Moving on, we wanted to share some of the challenges we encountered and our lessons learned from an OpenScholar to Drupal 9 migration project for Princeton University. 

General Migration Challenges

Migrating website content is almost never as simple as copying data from “column A” to “column B”. To highlight a couple of these challenges: link paths on Drupal may contain unique IDs, such as “/node/92382” or “/events?category=32”. These URLs will not work if migrated “as is”, because the same content will usually receive a different ID on the new platform. Transformations often need to be performed to look up the equivalent ID on the new platform and replace it.
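
As a rough illustration of that lookup-and-replace step, here is a small Python sketch. It assumes you have already built a mapping from legacy node IDs to new node IDs during the migration; the function name and the mapping are hypothetical, and a real migration would handle other ID-bearing patterns (such as the category query string above) in a similar way.

```python
import re

# Matches legacy paths such as /node/92382 and captures the numeric ID.
NODE_PATH = re.compile(r"/node/(\d+)")


def rewrite_node_links(markup, id_map):
    """Replace legacy /node/<id> paths with their new equivalents.

    id_map maps legacy node IDs to new node IDs (an assumed input).
    IDs with no mapping are left unchanged and returned in a list,
    so they can be reviewed rather than silently broken.
    """
    unmapped = []

    def swap(match):
        old_id = int(match.group(1))
        if old_id in id_map:
            return f"/node/{id_map[old_id]}"
        unmapped.append(old_id)
        return match.group(0)

    return NODE_PATH.sub(swap, markup), unmapped
```

Returning the unmapped IDs alongside the rewritten markup gives you a ready-made report of links that need attention, which matters when the same script runs across hundreds of sites.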

Unstructured content, such as markup in WYSIWYG fields, can be especially problematic, as it can contain inline styles, embed tokens, and all kinds of other quirky things, which will cause issues on the new platform and will need to be dealt with.
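
As one example of this kind of cleanup, the Python sketch below removes inline `style` attributes from a fragment of WYSIWYG markup using only the standard library. It is a simplified illustration under our own assumptions, not the approach used on any specific project: it drops HTML comments, does not special-case `<script>` or `<style>` contents, and leaves embed tokens untouched.

```python
from html import escape
from html.parser import HTMLParser


class InlineStyleStripper(HTMLParser):
    """Re-emits markup with every style attribute removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _open_tag(self, tag, attrs, closing):
        kept = [(name, value) for name, value in attrs if name != "style"]
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in kept
        )
        self.parts.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.parts.append(escape(data, quote=False))


def strip_inline_styles(markup):
    """Return markup with inline style attributes removed."""
    parser = InlineStyleStripper()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Parsing the markup, rather than running a regular expression over it, avoids accidentally matching the word "style" inside visible text or other attribute values.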

You should plan and budget for multiple cycles of scripting, testing, and fixing as you work on your migration scripts. A developer might knock out a first draft of a migration script fairly quickly, but it is extremely rare to get it perfect the first time around. It is very common to find large numbers of edge cases and gaps the migration script did not account for as you start running it on hundreds of pages across hundreds of sites.

Migrations at Scale

When migrating a single site, you may opt to manually re-enter content if a content type has only 20-30 items rather than spend time automating it with a script. Or you may choose to manually fix defects you encounter in the migrated content. However, these types of manual fixes can quickly become untenable when migrating hundreds of sites. For example, if you are migrating 1,000 sites and spend only 2 hours on manual review, cleanup, and content entry per site, that adds up to 2,000 hours of work, which will take one employee working full time roughly an entire year to get through.
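
The arithmetic behind that estimate is easy to adapt to your own numbers. The short Python sketch below assumes a 40-hour week and 52 working weeks per year with no leave, which are our simplifying assumptions; the function name is purely illustrative.

```python
def manual_effort_in_years(sites, hours_per_site,
                           hours_per_week=40, weeks_per_year=52):
    """Estimate full-time person-years of manual work for a migration."""
    total_hours = sites * hours_per_site
    return total_hours / (hours_per_week * weeks_per_year)


# 1,000 sites at 2 hours each is 2,000 hours, just under one
# 2,080-hour working year for a single full-time employee.
years = manual_effort_in_years(sites=1000, hours_per_site=2)
```

Every extra hour of manual work per site adds roughly another half a person-year at this scale, which is why even modest automation gains pay off quickly.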

While a small amount of manual review and cleanup is unavoidable, automating as much as humanly possible is a necessity on migrations of this scale.

OpenScholar-Specific Challenges

OpenScholar has a unique “virtual site” architecture that overrides a lot of Drupal 7’s default behaviors. While each site on an OpenScholar distribution appears to be separate and independent to site visitors, in reality they all share a single Drupal codebase, database, and filesystem. This is achieved through a collection of Drupal community modules such as “Organic Groups” and “Spaces” in combination with a large number of custom modules unique to OpenScholar. This underlying architecture adds complications on top of the usual challenges of migrating a Drupal 7 website.

OpenScholar stores default configurations and content directly in code and only writes a record to the database once a default has been overridden on a particular site. This makes migration scripting difficult, as much of the content and configuration you wish to migrate won’t appear in the database. To work around this issue, you may need to write custom logic that will detect whether or not an overriding record exists in the database, and provide a hard-coded default to use as a fallback.
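
In outline, that fallback logic looks something like the Python sketch below. The setting names, default values, and the shape of the `overrides` mapping are all hypothetical placeholders; in practice the overrides would be read from the legacy database and the defaults copied out of the OpenScholar code by hand.

```python
# Hypothetical defaults, transcribed from code rather than the database.
CODE_DEFAULTS = {
    "site_title": "My Site",
    "theme": "default_theme",
}


def resolve_setting(site_id, name, overrides, defaults=CODE_DEFAULTS):
    """Return a site's value for a setting, falling back to the default.

    overrides maps (site_id, setting_name) to the value stored in the
    database, and only contains entries for settings a site has changed.
    """
    key = (site_id, name)
    if key in overrides:
        # An explicit membership test keeps intentionally empty values.
        return overrides[key]
    if name in defaults:
        return defaults[name]
    raise KeyError(f"No override or default found for setting {name!r}")
```

Raising an error when neither an override nor a default exists surfaces gaps in your reverse-engineered defaults early, instead of quietly migrating empty values.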

The heavy modifications that OpenScholar makes to Drupal 7 break compatibility with most of the Drupal 7 source plugins that come bundled with the Migrate API in Drupal 9. Because there is a lack of detailed documentation on OpenScholar’s underlying architecture, be prepared to undertake a significant degree of reverse-engineering to understand how data is stored in the database in order to write migration scripts. Even if your development team has prior experience with Drupal migrations, expect a lot of additional surprises and challenges. Here be dragons.

Communication

The human challenges of a migration project are often overlooked. Stakeholder management in particular becomes a huge challenge when you are migrating hundreds of sites. You may need to communicate with hundreds of site owners to confirm migration timing, gather review feedback, acquire approval, and schedule the launch of their respective websites. Having a solid and sustainable process for all of this communication is crucial.

Conclusion

FFW has helped Princeton University migrate hundreds of OpenScholar sites to Drupal 9, and helped countless other clients with large-scale migrations from Drupal 7 to other platforms. When you’re ready to get started or learn more, we’re here to help!

Correction: Support for PHP 8 was added to Drupal 7 with release 7.79. However, it is unknown if the custom modules in the OpenScholar distribution are fully compatible with PHP 8.