Data Extraction With Node.js: Designing Serverless Pipelines In Azure For PDF Data Extraction
Wednesday, January 18, 2023 - 7:00 PM UTC, for 1 hour.
AT THAT (In-Person Only) Regular, 60 minute presentation
Room: Campsite 5
Node.js is a common choice for the back end. It has many uses, from scraping to serving simple REST APIs. However, the growing complexity of modern system requirements and the shrinking pool of contributors to open-source tooling have left much of the ecosystem's open-source tooling lacking, unstable, or bug-ridden, making simple technical tasks unnecessarily complicated.
Several months ago, I set out to extract data from forms by designing an in-house solution, even though plenty of "off-the-shelf" solutions are available on the web. Being a Node.js "expert" - and not wanting to dedicate myself to too much learning - I chose Node.js for my solution. I started with the question: how little do I need to know to read data from a form? While attempting to design what felt like a simple solution, I began bumping up against the limitations of Node.js, its tooling, and serverless computing across the major cloud providers. The answer to "how little" became "surprisingly much".
If Azure PaaS offerings in the computer vision space - specifically OCR - interest you, this session should expose some of the questions you will need to answer when designing serverless solutions. Join me as I walk you through my journey to keep it simple.
Some familiarity with Node.js, serverless cloud computing services in general, and scraping would be helpful, but none is required.
- How to process extraction results from Azure's Computer Vision OCR services.
- How to stand up a REST API using Azure Function Applications.
- How to handle file uploads in Node.js using Azure Function Applications.
- An understanding of serverless API gateway service limitations in the modern cloud.
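To give a flavor of the first takeaway, here is a minimal sketch of post-processing an OCR result in Node.js. It is not the session's actual code. The helper name `extractLines`, the `minConfidence` parameter, and the sample data are hypothetical, and the input shape is an assumption modeled on the JSON returned by the Computer Vision Read operation (`analyzeResult.readResults[].lines[].words[]`); check the response of the API version you call before relying on it.

```javascript
// Hypothetical helper: flattens a Read-style OCR result into text lines per page.
// Assumed input shape (modeled on the Computer Vision Read JSON response):
// { analyzeResult: { readResults: [ { page, lines: [ { text, words: [ { text, confidence } ] } ] } ] } }
function extractLines(result, minConfidence = 0) {
  const pages = result?.analyzeResult?.readResults ?? [];
  return pages.map((page) => ({
    page: page.page,
    lines: (page.lines ?? [])
      // Keep a line only if every word meets the confidence threshold.
      // Lines without word-level data, or words without a confidence, are kept.
      .filter((line) =>
        (line.words ?? []).every((w) => (w.confidence ?? 1) >= minConfidence)
      )
      .map((line) => line.text),
  }));
}

// Made-up sample data for illustration only.
const sample = {
  analyzeResult: {
    readResults: [
      {
        page: 1,
        lines: [
          {
            text: "Name: Jane Doe",
            words: [
              { text: "Name:", confidence: 0.99 },
              { text: "Jane", confidence: 0.98 },
              { text: "Doe", confidence: 0.97 },
            ],
          },
          {
            text: "T0tal: 4Z",
            words: [
              { text: "T0tal:", confidence: 0.41 },
              { text: "4Z", confidence: 0.55 },
            ],
          },
        ],
      },
    ],
  },
};

console.log(JSON.stringify(extractLines(sample, 0.8)));
```

Dropping a whole line when any word falls below the threshold is a deliberately blunt choice for a sketch; a real pipeline might instead flag low-confidence fields for manual review.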