
Build an Object Recognizer with React and Sign in with Google

Introduction

In this tutorial, we'll guide you through building an object recognizer in React, authenticated with Sign in with Google. Try it yourself, or check out the complete source code on GitHub.

Object recognizer

Prerequisites

Before getting started, ensure you have:

  • A Google account.
  • An OpenAI developer account with credits.

Set up Sign in with Google

First, let's create a Google Cloud project. Head to the Google Cloud console, click the project selector at the top, and click New Project.

Head to the OAuth consent screen page and configure the consent screen. Choose External as the user type.

OAuth consent screen

Enter an app name and click Next.

OAuth consent screen

In Scopes, add the openid scope, as well as ../auth/userinfo.email and ../auth/userinfo.profile.

OAuth consent screen

Add test users as necessary.

OAuth consent screen

The consent screen is now configured.

OAuth consent screen

Head to the Credentials page and click Create Credentials. Choose OAuth client ID.

Create OAuth client ID

On this page, choose Web application as the application type and give your client ID a name. Add your dev host to the authorized JavaScript origins; for Vite, this is http://localhost:5173.

Create OAuth client ID
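
Because the authorized origin must match your dev server exactly, you may want to pin Vite to its default port. The vite.config.js below is a minimal sketch assuming the standard Vite React template; the server.port and strictPort options are Vite settings, not something this tutorial's scaffold generates for you.

// vite.config.js: pin the dev server to port 5173 so it always matches
// the authorized origin registered in Google Cloud.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    strictPort: true, // fail instead of silently switching to another port
  },
});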

You are done. Take note of the client ID; you will need it to configure both your ServerlessAI project and your React app.

Create OAuth client ID

Setting Up Your ServerlessAI Project

If you do not have a ServerlessAI account, or want to create a new project:
  1. Register for ServerlessAI
    Head to ServerlessAI and sign up. Once registered, follow the wizard to create a new project.

    If you have already created an account and want to create a new project, click "Create project" in the project selector dropdown in the upper-left corner.

    ServerlessAI project creation

  2. Add the OpenAI Provider
    Add OpenAI as a provider by entering your OpenAI API key.

    ServerlessAI provider setup

  3. Enable Chat Completion Capability
    Enable the chat completion capability.

    ServerlessAI capabilities

  4. Configure Authentication
    Choose "JWKS" as the JWT key type and enter https://www.googleapis.com/oauth2/v3/certs as the URL. Enter your Google client ID as audience.

    Note: you must set the audience parameter. Otherwise, an attacker could use tokens issued for another app to authenticate with your service! (See the token-decoding sketch after this list.)

    ServerlessAI authentication setup

  5. Set Quota (Optional)
    You can configure quotas for your project or leave them blank for unlimited use since this is not a production app.

    ServerlessAI quota configuration

  6. Complete Setup
    After completing the setup, take note of your project ID.

    ServerlessAI finish
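
To see why the audience check matters, you can decode the credential returned by Google in your browser console. A Google ID token is a JWT whose payload carries an aud claim: the client ID the token was issued for. ServerlessAI compares this claim against the audience you configured and rejects tokens minted for other apps. The snippet below is a minimal inspection sketch (it does not verify the signature); idToken is a placeholder for the credential string from Google Sign-In.

// Decode (not verify!) a JWT payload to inspect its claims.
// `idToken` is a hypothetical variable holding the Google credential string.
function decodeJwtPayload(idToken) {
  const base64Url = idToken.split(".")[1]; // JWTs are header.payload.signature
  const base64 = base64Url.replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(atob(base64));
}

const claims = decodeJwtPayload(idToken);
console.log(claims.aud); // must match your Google client ID, or ServerlessAI rejects the token
console.log(claims.iss); // the issuer, accounts.google.com
console.log(claims.exp); // expiry timestamp; Google ID tokens are short-lived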

If you want to use an existing ServerlessAI project

If you've already created a project, that's great! Let's ensure it’s properly configured for this tutorial.

Navigate to the ServerlessAI Settings page.

  1. JWT Configuration

    • Under Authentication - JWT, select JWKS as the JWT type.
    • Enter https://www.googleapis.com/oauth2/v3/certs as the URL.
    • Enter your Google client ID as the audience.

    ServerlessAI JWT settings

  2. Disable Guest User

    • Go to Users - User Groups - Guest.
    • Ensure the guest user is disabled.

    ServerlessAI guest user settings

  3. Enable Chat Completion Capability

    • In Capabilities - Chat Completion, ensure the capability is enabled.

    ServerlessAI chat completion settings

    • Next, enable this capability for user groups:
      • Go to Users - Quota, select Authenticated Users, and enable the Chat Completion capability.
      • Optionally, set a usage quota.

    ServerlessAI quota

  4. Configure OpenAI API Key

    • Under Providers - OpenAI, enter your OpenAI API key.

    ServerlessAI OpenAI settings

Scaffold your project

We will scaffold the project using Vite's React template. In Terminal, run

npm create vite@latest react-google-vision -- --template react
cd react-google-vision

For simplicity, we will use @react-oauth/google to implement Sign in with Google.

npm install --save @react-oauth/google@latest
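
The recognizer we build later uses the official OpenAI JavaScript client, so install the openai package as well:

npm install --save openai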

Add Google Authentication

Modify src/main.jsx and wrap your <App /> with the Google OAuth provider:

import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import { GoogleOAuthProvider } from "@react-oauth/google";
import App from "./App.jsx";
import "./index.css";

createRoot(document.getElementById("root")).render(
  <StrictMode>
    <GoogleOAuthProvider clientId="<GOOGLE CLIENT ID>">
      <App />
    </GoogleOAuthProvider>
  </StrictMode>
);
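
Optionally, instead of hard-coding the client ID, you can read it from a Vite environment variable. This is a small tweak not covered by the original setup; the variable name below is hypothetical, but the VITE_ prefix and import.meta.env are how Vite exposes environment variables to client code. Create a .env.local file:

VITE_GOOGLE_CLIENT_ID=<GOOGLE CLIENT ID>

and reference it in main.jsx:

<GoogleOAuthProvider clientId={import.meta.env.VITE_GOOGLE_CLIENT_ID}>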

Modify the App component in src/App.jsx and add the Sign in with Google button. We will build the Recognizer component later.

import { useMemo, useRef, useState } from "react"; // useRef and useMemo are used by the Recognizer we build later
import { GoogleLogin } from "@react-oauth/google";

function App() {
  const [credential, setCredential] = useState();

  if (!credential) {
    return (
      <div className="login">
        <GoogleLogin
          onSuccess={(credentialResponse) => {
            setCredential(credentialResponse.credential);
          }}
          onError={() => {
            console.error("Login Failed");
          }}
        />
      </div>
    );
  }

  return <Recognizer credential={credential} />;
}

Build the Recognizer UI

Add a component in App.jsx named Recognizer. The recognizer can be in one of three states:

  • IDLE, when the camera is not active
  • CAPTURING, when the camera is capturing images
  • RECOGNIZING, when the object is being recognized

function Recognizer({ credential }) {
  const videoRef = useRef();
  const canvasRef = useRef();
  const [result, setResult] = useState("");
  // One of: IDLE, CAPTURING, RECOGNIZING
  const [state, setState] = useState("IDLE");

  const startCapture = () => {};
  const stopCapture = () => {};
  const restartCapture = () => {};
  const recognize = () => {};

  return (
    <div>
      <div>
        {state === "IDLE" && (
          <button onClick={startCapture}>Start Capture</button>
        )}
        {state === "CAPTURING" && (
          <button onClick={stopCapture}>Stop Capture</button>
        )}
        {state === "CAPTURING" && (
          <button onClick={recognize}>Recognize Object</button>
        )}
        {state === "RECOGNIZING" && (
          <button onClick={restartCapture}>Restart Capture</button>
        )}
      </div>
      <video
        ref={videoRef}
        style={{
          display: state === "CAPTURING" ? "block" : "none",
        }}
      />
      <canvas
        ref={canvasRef}
        style={{
          display: state === "RECOGNIZING" ? "block" : "none",
        }}
      />
      {result && <p>Result: {result}</p>}
    </div>
  );
}

Now, let's implement the startCapture, stopCapture, and restartCapture functions:

const startCapture = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  videoRef.current.srcObject = stream;
  videoRef.current.play();
  setState("CAPTURING");
};

const stopCapture = () => {
  videoRef.current.srcObject.getVideoTracks().forEach((track) => {
    track.stop();
  });
  setState("IDLE");
};

const restartCapture = () => {
  setResult("");
  startCapture();
};
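
Note that nothing stops the camera if the component unmounts while capturing (for example, during development with hot reloading). As an optional addition not in the original source, a useEffect cleanup can stop any running tracks when the Recognizer unmounts; this assumes useEffect is added to the React imports.

// Optional cleanup: stop the camera when the Recognizer unmounts.
useEffect(() => {
  return () => {
    videoRef.current?.srcObject?.getVideoTracks().forEach((track) => track.stop());
  };
}, []);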

Recognize the object

To recognize the object, we will take a snapshot of the webcam stream and send the image to a multimodal LLM. In this example, we will send the request to gpt-4o-mini.

First, let's add the OpenAI client (from the openai npm package). Inside the Recognizer component, add the following code. Note that we change the base URL to ServerlessAI's API and pass the ServerlessAI project ID together with the user's JWT as the API key. We also need to enable dangerouslyAllowBrowser; this is safe here because the browser only ever holds the user's own short-lived token, never your OpenAI API key.

// Requires `import OpenAI from "openai";` at the top of App.jsx
const PROJECT_ID = "YOUR_SERVERLESSAI_PROJECT_ID";

const openai = useMemo(
  () =>
    new OpenAI({
      baseURL: "https://openai.api.serverlessai.dev/v1",
      apiKey: `${PROJECT_ID}:${credential}`,
      dangerouslyAllowBrowser: true,
    }),
  [credential]
);

Now, let's implement the recognize function:

const recognize = async () => {
  setState("RECOGNIZING");
  const video = videoRef.current;
  const canvas = canvasRef.current;
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas
    .getContext("2d")
    .drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
  const image = canvas.toDataURL("image/jpeg", 0.95);

  videoRef.current.srcObject.getVideoTracks().forEach((track) => {
    track.stop();
  });

  const result = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "What is this? Be concise in your response. Do not include any extra information.",
          },
          {
            type: "image_url",
            image_url: {
              url: image,
            },
          },
        ],
      },
    ],
  });

  setResult(result.choices[0].message.content);
};
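
The request above has no error handling: if the Google token has expired or the project quota is exhausted, the promise rejects and the UI stays stuck in the RECOGNIZING state. As a minimal sketch (reusing only the existing setResult and setState hooks), you could wrap the call like this:

// Hypothetical variant of the request with basic error handling.
try {
  const result = await openai.chat.completions.create({
    /* ...same request as above... */
  });
  setResult(result.choices[0].message.content);
} catch (error) {
  console.error(error);
  setResult("Recognition failed. Please sign in again or retry.");
  setState("IDLE");
}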

View complete source

Style your application

Now let's add a touch of CSS to make the application look good. Modify index.css with the following:

:root {
  font-family: system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

button {
  border-radius: 8px;
  border: 1px solid transparent;
  background-color: #f9f9f9;
  padding: 0.6em 1.2em;
  font-size: 1em;
  font-weight: 500;
  font-family: inherit;
  cursor: pointer;
  transition: border-color 0.25s;
}
button:hover {
  border-color: #646cff;
}
button:focus,
button:focus-visible {
  outline: 4px auto -webkit-focus-ring-color;
}

.login {
  display: flex;
  justify-content: center;
  padding: 2rem 6rem;
  border: 1px solid #eaeaea;
  border-radius: 8px;
  max-width: 400px;
  margin: 0 auto;
}

Running your application

Now you can run your application! In Terminal, run

npm run dev

and you can start recognizing objects!

Object recognizer