
Build an Object Recognizer with React and Sign in with Google

Introduction

In this tutorial, we'll guide you through building an object recognizer in React, authenticated with Sign in with Google. Try it yourself, or check out the complete source code on GitHub.

Object recognizer

Prerequisites

Before getting started, ensure you have:

  • A Google account.
  • An OpenAI developer account with credits.

Set up Sign in with Google

First, let's create a Google Cloud project. Head to the Google Cloud console, click the project selector at the top, and click New Project.

Head to the OAuth consent screen page and configure the consent screen. Choose External as the user type.

OAuth consent screen

Enter an app name and click Next.

OAuth consent screen

In Scopes, add the openid scope, as well as ../auth/userinfo.email and ../auth/userinfo.profile.

OAuth consent screen

Add test users as necessary.

OAuth consent screen

The consent screen is now configured.

OAuth consent screen

Head to the Credentials page and click Create Credentials. Choose OAuth client ID.

Create OAuth client ID

On this page, choose Web application as the application type and give your client ID a name. Add your dev host to the authorized JavaScript origins; for Vite, this is http://localhost:5173.

Create OAuth client ID
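
Because the authorized origin must match your dev server exactly, you may want to pin Vite to its default port. The vite.config.js below is a minimal sketch assuming the standard Vite React template; the server.port and strictPort options are Vite settings, not something this tutorial's scaffold generates for you.

// vite.config.js: pin the dev server to port 5173 so it always matches
// the authorized origin registered in Google Cloud.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    strictPort: true, // fail instead of silently switching to another port
  },
});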

You are done. Take note of the client ID; you will need it to configure both your ServerlessAI project and your React app.

Create OAuth client ID

Setting Up Your ServerlessAI Project

If you do not have a ServerlessAI account, or want to create a new project:
  1. Register for ServerlessAI
    Head to ServerlessAI and sign up. Once registered, follow the wizard to create a new project.

    If you have already created an account and want to create a new project, click "Create project" in the project selector dropdown in the upper-left corner.

    ServerlessAI project creation

  2. Add the OpenAI Provider
    Add OpenAI as a provider by entering your OpenAI API key.

    ServerlessAI provider setup

  3. Enable Chat Completion Capability
    Enable the chat completion capability.

    ServerlessAI capabilities

  4. Configure Authentication
    Choose "JWKS" as the JWT key type and enter https://www.googleapis.com/oauth2/v3/certs as the URL. Enter your Google client ID as audience.

    Note: you must set the audience parameter. Otherwise, an attacker could use tokens issued for another app to authenticate with your service! (See the token-decoding sketch after this list.)

    ServerlessAI authentication setup

  5. Set Quota (Optional)
    You can configure quotas for your project or leave them blank for unlimited use since this is not a production app.

    ServerlessAI quota configuration

  6. Complete Setup
    After completing the setup, take note of your project ID.

    ServerlessAI finish
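
To see why the audience check matters, you can decode the credential returned by Google in your browser console. A Google ID token is a JWT whose payload carries an aud claim: the client ID the token was issued for. ServerlessAI compares this claim against the audience you configured and rejects tokens minted for other apps. The snippet below is a minimal inspection sketch (it does not verify the signature); idToken is a placeholder for the credential string from Google Sign-In.

// Decode (not verify!) a JWT payload to inspect its claims.
// `idToken` is a hypothetical variable holding the Google credential string.
function decodeJwtPayload(idToken) {
  const base64Url = idToken.split(".")[1]; // JWTs are header.payload.signature
  const base64 = base64Url.replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(atob(base64));
}

const claims = decodeJwtPayload(idToken);
console.log(claims.aud); // must match your Google client ID, or ServerlessAI rejects the token
console.log(claims.iss); // the issuer, accounts.google.com
console.log(claims.exp); // expiry timestamp; Google ID tokens are short-lived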

If you want to use an existing ServerlessAI project

If you've already created a project, that's great! Let's ensure it’s properly configured for this tutorial.

Navigate to the ServerlessAI Settings page.

  1. JWT Configuration

    • Under Authentication - JWT, select JWKS as the JWT type.
    • Enter https://www.googleapis.com/oauth2/v3/certs as the URL.
    • Enter your Google client ID as the audience.

    ServerlessAI JWT settings

  2. Disable Guest User

    • Go to Users - User Groups - Guest.
    • Ensure the guest user is disabled.

    ServerlessAI guest user settings

  3. Enable Chat Completion Capability

    • In Capabilities - Chat Completion, ensure the capability is enabled.

    ServerlessAI chat completion settings

    • Next, enable this capability for user groups:
      • Go to Users - Quota, select Authenticated Users, and enable the Chat Completion capability.
      • Optionally, set a usage quota.

    ServerlessAI quota

  4. Configure OpenAI API Key

    • Under Providers - OpenAI, enter your OpenAI API key.

    ServerlessAI OpenAI settings

Scaffold your project

We will scaffold the project using Vite's React template. In Terminal, run

npm create vite@latest react-google-vision -- --template react
cd react-google-vision

For simplicity, we will use @react-oauth/google to implement Sign in with Google.

npm install --save @react-oauth/google@latest
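
The recognizer we build later uses the official OpenAI JavaScript client, so install the openai package as well:

npm install --save openai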

Add Google Authentication

Modify src/main.jsx and wrap your <App /> with the Google OAuth provider:

import { StrictMode } from "react";
import { createRoot } from "react-dom/client";
import { GoogleOAuthProvider } from "@react-oauth/google";
import App from "./App.jsx";
import "./index.css";

createRoot(document.getElementById("root")).render(
  <StrictMode>
    <GoogleOAuthProvider clientId="<GOOGLE CLIENT ID>">
      <App />
    </GoogleOAuthProvider>
  </StrictMode>
);
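
Optionally, instead of hard-coding the client ID, you can read it from a Vite environment variable. This is a small tweak not covered by the original setup; the variable name below is hypothetical, but the VITE_ prefix and import.meta.env are how Vite exposes environment variables to client code. Create a .env.local file:

VITE_GOOGLE_CLIENT_ID=<GOOGLE CLIENT ID>

and reference it in main.jsx:

<GoogleOAuthProvider clientId={import.meta.env.VITE_GOOGLE_CLIENT_ID}>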

Modify the App component in src/App.jsx and add the Sign in with Google button. We will build the Recognizer component later.

import { useMemo, useRef, useState } from "react"; // useRef and useMemo are used by the Recognizer we build later
import { GoogleLogin } from "@react-oauth/google";

function App() {
  const [credential, setCredential] = useState();

  if (!credential) {
    return (
      <div className="login">
        <GoogleLogin
          onSuccess={(credentialResponse) => {
            setCredential(credentialResponse.credential);
          }}
          onError={() => {
            console.error("Login Failed");
          }}
        />
      </div>
    );
  }

  return <Recognizer credential={credential} />;
}

Build the Recognizer UI

Add a component in App.jsx named Recognizer. The recognizer can be in one of three states:

  • IDLE, when the camera is not active
  • CAPTURING, when the camera is capturing images
  • RECOGNIZING, when the object is being recognized

function Recognizer({ credential }) {
  const videoRef = useRef();
  const canvasRef = useRef();
  const [result, setResult] = useState("");
  // One of: IDLE, CAPTURING, RECOGNIZING
  const [state, setState] = useState("IDLE");

  const startCapture = () => {};
  const stopCapture = () => {};
  const restartCapture = () => {};
  const recognize = () => {};

  return (
    <div>
      <div>
        {state === "IDLE" && (
          <button onClick={startCapture}>Start Capture</button>
        )}
        {state === "CAPTURING" && (
          <button onClick={stopCapture}>Stop Capture</button>
        )}
        {state === "CAPTURING" && (
          <button onClick={recognize}>Recognize Object</button>
        )}
        {state === "RECOGNIZING" && (
          <button onClick={restartCapture}>Restart Capture</button>
        )}
      </div>
      <video
        ref={videoRef}
        style={{
          display: state === "CAPTURING" ? "block" : "none",
        }}
      />
      <canvas
        ref={canvasRef}
        style={{
          display: state === "RECOGNIZING" ? "block" : "none",
        }}
      />
      {result && <p>Result: {result}</p>}
    </div>
  );
}

Now, let's implement the startCapture, stopCapture, and restartCapture functions:

const startCapture = async () => {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  videoRef.current.srcObject = stream;
  videoRef.current.play();
  setState("CAPTURING");
};

const stopCapture = () => {
  videoRef.current.srcObject.getVideoTracks().forEach((track) => {
    track.stop();
  });
  setState("IDLE");
};

const restartCapture = () => {
  setResult("");
  startCapture();
};
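
Note that nothing stops the camera if the component unmounts while capturing (for example, during development with hot reloading). As an optional addition not in the original source, a useEffect cleanup can stop any running tracks when the Recognizer unmounts; this assumes useEffect is added to the React imports.

// Optional cleanup: stop the camera when the Recognizer unmounts.
useEffect(() => {
  return () => {
    videoRef.current?.srcObject?.getVideoTracks().forEach((track) => track.stop());
  };
}, []);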

Recognize the object

To recognize the object, we will take a snapshot of the webcam stream and send the image to a multimodal LLM. In this example, we will send the request to gpt-4o-mini.

First, let's add the OpenAI client (from the openai npm package). Inside the Recognizer component, add the following code. Note that we change the base URL to ServerlessAI's API and pass the ServerlessAI project ID together with the user's JWT as the API key. We also need to enable dangerouslyAllowBrowser; this is safe here because the browser only ever holds the user's own short-lived token, never your OpenAI API key.

// Requires `import OpenAI from "openai";` at the top of App.jsx
const PROJECT_ID = "YOUR_SERVERLESSAI_PROJECT_ID";

const openai = useMemo(
  () =>
    new OpenAI({
      baseURL: "https://openai.api.serverlessai.dev/v1",
      apiKey: `${PROJECT_ID}:${credential}`,
      dangerouslyAllowBrowser: true,
    }),
  [credential]
);

Now, let's implement the recognize function:

const recognize = async () => {
  setState("RECOGNIZING");
  const video = videoRef.current;
  const canvas = canvasRef.current;
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas
    .getContext("2d")
    .drawImage(video, 0, 0, video.videoWidth, video.videoHeight);
  const image = canvas.toDataURL("image/jpeg", 0.95);

  videoRef.current.srcObject.getVideoTracks().forEach((track) => {
    track.stop();
  });

  const result = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "What is this? Be concise in your response. Do not include any extra information.",
          },
          {
            type: "image_url",
            image_url: {
              url: image,
            },
          },
        ],
      },
    ],
  });

  setResult(result.choices[0].message.content);
};
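
The request above has no error handling: if the Google token has expired or the project quota is exhausted, the promise rejects and the UI stays stuck in the RECOGNIZING state. As a minimal sketch (reusing only the existing setResult and setState hooks), you could wrap the call like this:

// Hypothetical variant of the request with basic error handling.
try {
  const result = await openai.chat.completions.create({
    /* ...same request as above... */
  });
  setResult(result.choices[0].message.content);
} catch (error) {
  console.error(error);
  setResult("Recognition failed. Please sign in again or retry.");
  setState("IDLE");
}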

View complete source

Style your application

Now let's add a touch of CSS to make the application look good. Modify index.css with the following:

:root {
  font-family: system-ui, Avenir, Helvetica, Arial, sans-serif;
  line-height: 1.5;
  font-weight: 400;

  font-synthesis: none;
  text-rendering: optimizeLegibility;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

button {
  border-radius: 8px;
  border: 1px solid transparent;
  background-color: #f9f9f9;
  padding: 0.6em 1.2em;
  font-size: 1em;
  font-weight: 500;
  font-family: inherit;
  cursor: pointer;
  transition: border-color 0.25s;
}
button:hover {
  border-color: #646cff;
}
button:focus,
button:focus-visible {
  outline: 4px auto -webkit-focus-ring-color;
}

.login {
  display: flex;
  justify-content: center;
  padding: 2rem 6rem;
  border: 1px solid #eaeaea;
  border-radius: 8px;
  max-width: 400px;
  margin: 0 auto;
}

Running your application

Now you can run your application! In Terminal, run

npm run dev

and you can start recognizing objects!

Object recognizer