Objective:
Extract and store data from websites, and automate its collection, quickly, securely and legally.
Outcome:
A cloud-based web scraping platform enabling our client to efficiently scale their global ESG data collection from publicly available websites.
Results that matter:
Fast, secure and legal access to a wide range of ESG-specific data, improving the insights that inform investment decisions.
Asset managers make important investment decisions for their clients, based largely on insights gained from readily accessible market data. Their investment teams wanted to improve those insights by widening their data sets to include data drawn directly from websites. We developed a web scraping platform that would allow them to efficiently extract, store and analyse a vast range of changing data available in the public domain.
As a global asset manager with a forward-looking, active investment approach, our client wanted to integrate sustainability research into their investment decisions, using publicly available information from the web for their environmental, social and governance (ESG) analysis.
A wide variety of investment teams needed to harvest and process publicly available data from a range of websites, quickly and securely. The system had to cater for the needs of different teams and users across the globe, each with their own desired data sources and requirements.
We were appointed to develop and implement a framework and technical platform that would allow the business to perform self-service web scraping in support of their ESG analysis.
We created a fully scalable cloud-based platform, with no theoretical limit on the number of scrapers, to extract specific content and data from publicly available websites, either on demand or on a schedule.
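For illustration only, a scraper registration on such a platform might pair each target with a user-supplied script and an optional schedule. Every name and field in this sketch is an assumption, not the client's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScraperJob:
    """Hypothetical scraper registration record, for illustration only."""
    name: str
    target_url: str
    script: str                     # User-supplied extraction script.
    schedule: Optional[str] = None  # Cron expression; None = run on demand.

# Teams register as many jobs as they need; the cloud platform fans them
# out across workers, so the list can grow without a practical limit.
jobs = [
    ScraperJob("esg-report", "https://example.com/esg", "extract_esg.py",
               schedule="0 6 * * MON"),                  # Weekly, scheduled.
    ScraperJob("press-check", "https://example.com/news", "extract_news.py"),
]
```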
The platform was developed within a framework of strong governance, control and security. It includes role-based user management controls and the secure, long-term storage of data.
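As a sketch of how role-based controls gate access, assuming illustrative role and permission names rather than the platform's real ones:

```python
# Hypothetical role-to-permission mapping; the real platform's roles
# and permissions will differ.
PERMISSIONS = {
    "viewer":  {"view_results"},
    "analyst": {"view_results", "run_scraper"},
    "admin":   {"view_results", "run_scraper", "manage_users", "delete_data"},
}

def is_allowed(role: str, action: str) -> bool:
    """Gate every platform action on the caller's assigned role."""
    return action in PERMISSIONS.get(role, set())

assert is_allowed("analyst", "run_scraper")
assert not is_allowed("viewer", "run_scraper")
```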
Through APIs and the software 6point6 developed, asset managers can run a script that scrapes a website, unpacks HTML, cleans the data and turns it into a tidy package of information that can be analysed.
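A minimal sketch of that pipeline, assuming a hypothetical page and table structure (requests, BeautifulSoup and pandas stand in for whatever libraries the platform actually uses):

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical target page and CSS selector, purely for illustration.
SOURCE_URL = "https://example.com/sustainability-report"

def scrape_and_tidy(url: str) -> pd.DataFrame:
    """Fetch a page, unpack the HTML and return the data as a tidy table."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Unpack the raw HTML into a navigable tree.
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract one record per table row, cleaning whitespace as we go.
    records = []
    for row in soup.select("table.esg-metrics tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) == 2:
            records.append({"metric": cells[0], "value": cells[1]})

    # The tidy package of information, ready for analysis.
    return pd.DataFrame.from_records(records)

if __name__ == "__main__":
    print(scrape_and_tidy(SOURCE_URL).head())
```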
It was very important that each website's legal terms and conditions were stored and fully complied with. Each time a scraper is activated, both the legal team and the site's compliance team are alerted to approve use of the site, ensuring data is accessed in a compliant, legal and secure way.
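A minimal sketch of such an approval gate, with hypothetical helpers standing in for the platform's real alerting and persistence:

```python
from dataclasses import dataclass

@dataclass
class ScrapeRequest:
    site: str
    requested_by: str

# Hypothetical in-memory approval register; the real platform persists
# approvals alongside each site's stored terms and conditions.
APPROVED_SITES: set[str] = set()

def notify_approvers(request: ScrapeRequest) -> None:
    """Stand-in for alerting the legal and compliance teams."""
    print(f"Approval requested: {request.site} (by {request.requested_by})")

def activate_scraper(request: ScrapeRequest) -> bool:
    """Run a scraper only once its site has been approved for use."""
    if request.site not in APPROVED_SITES:
        notify_approvers(request)
        return False  # Blocked until legal and compliance sign off.
    return True       # Approved: the scraper may proceed.
```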
Beyond the original project scope, we also developed new cost monitoring functionality, giving a clear view of data scraping costs so they can be fully managed and controlled.
From our full analysis of the client’s requirements and discussions with stakeholders, we built a roadmap for effective delivery and targeted key technical challenge areas early on.
By using a Scrum methodology and running regular demonstrations to the team, we were able to build an MVP for the first web scraping requirements within six weeks.
We also built and managed a product backlog to support future work, including multi-tenancy and future integration with knowledge graphs.
The web scraping platform enables users to harvest relevant data, on demand or on a schedule, using custom scripts or code. The number of web scrapers they can manage is practically unlimited, running into the hundreds of thousands if necessary.
Having fast access to accurate, reliable information means better investment decisions for clients – and, ultimately, the success and growth of the business. Legal and compliance approval ensures our client maintains the high ethical standards their clients expect and trust.
We have subsequently been engaged to help craft a full data enablement strategy, leaving them future-proofed and best placed for all their data needs.