Shing Lyu

Disclaimer: This content reflects my personal opinions, not those of any organizations I am or have been affiliated with. Code samples are provided for illustration purposes only; use them with caution and test thoroughly before deployment.

How Many Models Can You Fit into a SageMaker Multi-Model Endpoint?

Recently, I’ve been working on a project that requires running thousands of models simultaneously. To save costs, we decided to host them on a SageMaker multi-model endpoint.

Here is the definition of a multi-model endpoint from the official AWS documentation:

Multi-model endpoints provide a scalable and cost-effective solution to deploying large numbers of models. They use the same fleet of resources and a shared serving container to host all of your models. This reduces hosting costs by improving endpoint utilization compared with using single-model endpoints. It also reduces deployment overhead because Amazon SageMaker manages loading models in memory and scaling them based on the traffic patterns to your endpoint.

Typical use cases involve many models that share the same algorithm and framework but are trained on different datasets, for example one model per customer, tenant, or geographic region.

A key question arises: “How many models can we fit into one instance, and what instance type do we need?” This post presents my experimental results to answer that question.
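
For context, here is a minimal sketch of how a client invokes a multi-model endpoint with boto3. The endpoint name and artifact path below are hypothetical; the TargetModel parameter is what tells SageMaker which model artifact (relative to the endpoint's S3 model prefix) to load and run:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint and model names -- adjust to your own deployment.
response = runtime.invoke_endpoint(
    EndpointName="my-multi-model-endpoint",
    TargetModel="customer-0042/model.tar.gz",  # artifact path under the S3 model prefix
    ContentType="text/csv",
    Body=b"1.0,2.0,3.0",
)
print(response["Body"].read())
```

Because SageMaker loads the artifact named in TargetModel into the container on first use, the number of models one instance can serve depends on instance memory and traffic patterns, which is exactly what the experiment explores.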

(continue reading...)


Demystifying The Options for Triggering AWS CodePipeline with Amazon S3 Events

Triggering an AWS CodePipeline pipeline when new files are uploaded to Amazon S3 is a very common use case. For example, when new data is uploaded, you can start a pipeline that kicks off SageMaker model retraining or inference. However, the documentation and the services involved have gone through multiple updates, making it confusing to understand the current recommended approach. In this post, I will untangle the different options and point out the most up-to-date, recommended one.
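
To make the moving parts concrete, here is one possible wiring (a sketch, not necessarily the option the post ultimately recommends): an EventBridge rule that starts a pipeline when an object is created in a bucket. The bucket, pipeline ARN, and role below are hypothetical, and the bucket must have EventBridge notifications enabled:

```python
import json
import boto3

events = boto3.client("events")

# Hypothetical names -- replace with your own bucket, pipeline, and role.
BUCKET = "my-data-bucket"
PIPELINE_ARN = "arn:aws:codepipeline:us-east-1:123456789012:my-pipeline"
ROLE_ARN = "arn:aws:iam::123456789012:role/EventBridgeStartPipelineRole"

# Match the "Object Created" events S3 emits to EventBridge for this bucket.
events.put_rule(
    Name="s3-upload-starts-pipeline",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [BUCKET]}},
    }),
)

# The role must allow codepipeline:StartPipelineExecution on the pipeline.
events.put_targets(
    Rule="s3-upload-starts-pipeline",
    Targets=[{"Id": "pipeline", "Arn": PIPELINE_ARN, "RoleArn": ROLE_ARN}],
)
```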

(continue reading...)


Turning Hand-Drawn Architecture Diagrams into Digital Diagrams with Generative AI

Background

As an architect, I find whiteboarding to be an essential part of the design and problem-solving process. However, once the whiteboard session is over, it can be challenging to translate those hand-drawn diagrams into a digital format that can be easily shared and collaborated on. Traditionally, this meant manually recreating the diagrams in a diagramming tool, which is time-consuming and error-prone.

My Approach with Generative AI

To streamline this process, I’ve adopted a new approach that leverages the power of generative AI. By combining a multimodal large language model (LLM) from Amazon Bedrock with the versatile diagramming tool draw.io, I can effortlessly convert hand-drawn architecture diagrams into digital diagrams.
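
To give a flavor of the approach, here is a rough sketch of sending a whiteboard photo to a multimodal model on Amazon Bedrock and asking for draw.io XML; the model ID, file name, and prompt are assumptions for illustration, not the exact ones from the post:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical input file -- a photo of the whiteboard.
with open("whiteboard.png", "rb") as f:
    image_bytes = f.read()

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "Convert this hand-drawn architecture diagram into draw.io XML."},
        ],
    }],
)

# Ideally draw.io XML that can be pasted straight into draw.io.
print(response["output"]["message"]["content"][0]["text"])
```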

(continue reading...)


Streamlining my local transcription command for Raycast

In my previous blog post, I had Raycast open an iTerm2 terminal to run the transcription script, because I needed to press CTRL+C to stop the sox recording. However, launching iTerm2 and zsh took a significant amount of time, sometimes 5 to 10 seconds, which was too slow for quickly jotting down thoughts.

To address this, I found a way to stop the sox recording without opening an iTerm terminal at all. Instead of pressing CTRL+C, I send a kill signal to the sox process: a few shell script tricks save the PID (process ID) of sox and display a macOS pop-up, and pressing the pop-up button sends the kill signal that stops the recording.
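
The post implements this with shell scripting; as a rough Python illustration of the same idea (file names are assumptions), the sketch below records with sox, blocks on a native macOS dialog via osascript, and stops the recording when the button is clicked:

```python
import signal
import subprocess

# Start recording from the default input device; Popen keeps the PID for us.
rec = subprocess.Popen(["sox", "-d", "recording.wav"])

# Show a macOS pop-up and block until the user clicks "Stop".
subprocess.run([
    "osascript", "-e",
    'display dialog "Recording... click to stop." buttons {"Stop"} default button "Stop"',
])

# Send the equivalent of CTRL+C so sox finalizes the file cleanly.
rec.send_signal(signal.SIGINT)
rec.wait()
```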

(continue reading...)


Transcribe voice to text locally with Whisper.cpp and Raycast

Why local transcription?

In today’s world, many of us rely on cloud-based services for tasks like speech-to-text transcription. While these services are convenient, they come with drawbacks like high latency due to network transfers and potential privacy concerns as our audio data is sent to the cloud.

Local transcription addresses these issues by performing the speech recognition process entirely on your device, ensuring low latency and keeping your audio data private. This is where whisper.cpp comes in.
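
As a taste of what this looks like in practice, here is a minimal sketch (binary and model paths are assumptions that depend on how you built whisper.cpp) of calling the whisper.cpp CLI from Python to transcribe a recording:

```python
import subprocess

# Hypothetical paths -- point these at your whisper.cpp build and model file.
WHISPER_BIN = "./main"
MODEL = "models/ggml-base.en.bin"

# Transcribe a 16 kHz WAV file; -otxt writes the transcript to recording.wav.txt.
subprocess.run(
    [WHISPER_BIN, "-m", MODEL, "-f", "recording.wav", "-otxt"],
    check=True,
)

with open("recording.wav.txt") as f:
    print(f.read())
```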

(continue reading...)