Find Legitimate Emails in your Gmail Spam Folder with AI and Google Script

June 07, 2024

False positives in Gmail are uncommon but can happen, meaning an important email might mistakenly end up in your spam folder. When you’re dealing with hundreds of spam messages daily, identifying these legitimate emails becomes even more challenging.

You can create filters in Gmail such that emails from specific senders or with certain keywords are never marked as spam. But these filters would obviously not work for emails from new or unknown senders.

Find incorrectly classified messages in Gmail Spam

What if we used AI to analyze our spam emails in Gmail and predict which ones are likely false positives? With this list of misclassified emails, we could automatically move these emails to the inbox or generate a report for manual review.

The generated report lists the emails with a low spam score that are likely legitimate and should be moved to the inbox, along with a short summary of each email in your preferred language.

To get started, open this Google Script and make a copy of it in your Google Drive. Switch to the Apps Script editor and provide your email address, OpenAI API key, and preferred language for the email summary.

Choose the reportFalsePositives function from the dropdown and click the play button to run the script. It will search for unread spam emails in your Gmail account, analyze them using OpenAI’s API, and send you a report of emails with a low spam score.

If you would like to run this script automatically at regular intervals, go to the “Triggers” menu in the Google Apps Script editor and set up a time-driven trigger to run this script once every day. You can also choose the time of the day when you wish to receive the report.
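The same daily trigger can also be created from code rather than through the Triggers menu. The sketch below assumes the handler is the reportFalsePositives function from this script; the installDailyTrigger name and the 8 AM hour are illustrative choices and are not part of the original script.

```javascript
// Hypothetical helper: (re)creates a single daily time-driven trigger for the report.
const HANDLER_NAME = 'reportFalsePositives';
const REPORT_HOUR = 8; // Assumed hour of the day (in the script's time zone)

function installDailyTrigger() {
  // Remove existing triggers for the same handler to avoid duplicate reports
  ScriptApp.getProjectTriggers()
    .filter((trigger) => trigger.getHandlerFunction() === HANDLER_NAME)
    .forEach((trigger) => ScriptApp.deleteTrigger(trigger));

  // Run once a day, around the chosen hour
  return ScriptApp.newTrigger(HANDLER_NAME)
    .timeBased()
    .everyDays(1)
    .atHour(REPORT_HOUR)
    .create();
}
```

Run installDailyTrigger once from the Apps Script editor; running it again replaces the earlier trigger instead of adding a second one.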

How AI Spam Classification Works - The Technical Part

If you are curious to know how the script works, here is a brief overview:

The script uses the built-in GmailApp service of Google Apps Script to search for unread spam emails in your Gmail account. It then sends the content of each email to OpenAI’s API, which assigns a spam score and generates a summary in your preferred language. Emails with a low spam score are likely false positives and can be moved to the inbox.

1. User Configuration

Provide the email address where the report should be sent, your OpenAI API key, your preferred OpenAI model, and the language for the email summary.

// Basic configuration
const USER_EMAIL = 'email@domain.com'; // Email address to send the report to
const OPENAI_API_KEY = 'sk-proj-123'; // API key for OpenAI
const OPENAI_MODEL = 'gpt-4o'; // Model name to use with OpenAI
const USER_LANGUAGE = 'English'; // Language for the email summary
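Keeping the API key in the source code is convenient, but the key travels with the script whenever you share or copy it. As an optional alternative, the sketch below reads the key from Script Properties (under Project Settings in the Apps Script editor). The property name OPENAI_API_KEY and the helper name getApiKey_ are assumptions made for illustration; the original script uses the constant shown above.

```javascript
// Hypothetical helper: read the OpenAI key from Script Properties instead of the source code.
const API_KEY_PROPERTY = 'OPENAI_API_KEY'; // Assumed property name

const getApiKey_ = () => {
  // getProperty returns null when the property has not been set
  const key = PropertiesService.getScriptProperties().getProperty(API_KEY_PROPERTY);
  if (!key) {
    throw new Error(`Script property ${API_KEY_PROPERTY} is not set`);
  }
  return key.trim();
};
```

The Authorization header in the API call would then use getApiKey_() in place of the hard-coded constant.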

2. Find Unread Emails in Gmail Spam Folder

We pass epoch timestamps (in seconds) to the after: and before: search operators to find spam emails that arrived in the last 24 hours and are still unread.

const HOURS_AGO = 24; // Time frame to search for emails (in hours)
const MAX_THREADS = 25; // Maximum number of email threads to process

const getSpamThreads_ = () => {
  const epoch = (date) => Math.floor(date.getTime() / 1000);
  const beforeDate = new Date();
  const afterDate = new Date();
  afterDate.setHours(afterDate.getHours() - HOURS_AGO);
  const searchQuery = `is:unread in:spam after:${epoch(afterDate)} before:${epoch(beforeDate)}`;
  return GmailApp.search(searchQuery, 0, MAX_THREADS);
};
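To see what the search query looks like, the query-building step can be pulled into a pure function that takes the current time as an argument. This is an illustrative refactoring (buildSpamQuery_ is a made-up name), and it subtracts milliseconds instead of calling setHours, so the window is always exactly the given number of hours.

```javascript
// Hypothetical pure helper: builds the Gmail search query for a given point in time.
const buildSpamQuery_ = (now, hoursAgo) => {
  const toEpochSeconds = (ms) => Math.floor(ms / 1000);
  const beforeMs = now.getTime();
  const afterMs = beforeMs - hoursAgo * 60 * 60 * 1000;
  return `is:unread in:spam after:${toEpochSeconds(afterMs)} before:${toEpochSeconds(beforeMs)}`;
};

// Example with a fixed timestamp so that the result is reproducible
const sampleQuery = buildSpamQuery_(new Date(1717718400000), 24);
// sampleQuery is "is:unread in:spam after:1717632000 before:1717718400"
```

The resulting string can be pasted into the Gmail search box to check which messages the script would pick up.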

3. Create a Prompt for the OpenAI Model

We create a prompt for the OpenAI model using the email message. The prompt asks the AI model to analyze the email content and assign a spam score on a scale from 0 to 10. The response should be a JSON object with spam_score and summary keys, because the report code reads those two fields by name.

const SYSTEM_PROMPT = `You are an AI email classifier. Given the content of an email, analyze it and assign a spam score on a scale from 0 to 10, where 0 indicates a legitimate email and 10 indicates a definite spam email. Provide a short summary of the email in ${USER_LANGUAGE}. Respond with a JSON object that has exactly two keys: "spam_score" (a number from 0 to 10) and "summary" (a string).`;

const MAX_BODY_LENGTH = 200; // Maximum length of email body to include in the AI prompt

const getMessagePrompt_ = (message) => {
  const body = message
    .getPlainBody()
    .replace(/https?:\/\/[^\s>]+/g, '')
    .replace(/[\n\r\t]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim(); // strip URLs and collapse runs of whitespace into single spaces
  return [
    `Subject: ${message.getSubject()}`,
    `Sender: ${message.getFrom()}`,
    `Body: ${body.substring(0, MAX_BODY_LENGTH)}`,
  ].join('\n');
};
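To preview the prompt without touching Gmail, the same cleanup steps can be run against a stand-in object. The fakeMessage below is hypothetical; it only mimics the three GmailMessage methods that getMessagePrompt_ calls, and buildPrompt_ repeats the logic above so that the example is self-contained.

```javascript
// Stand-in for a GmailMessage, exposing only the methods the prompt builder needs
const fakeMessage = {
  getSubject: () => 'Your invoice for May',
  getFrom: () => 'Billing <billing@example.com>',
  getPlainBody: () => 'Hi,\r\n\r\nYour invoice is ready:\thttps://example.com/invoice/123\r\nThanks!',
};

// Same steps as getMessagePrompt_, with the body limit passed as a parameter
const buildPrompt_ = (message, maxBodyLength) => {
  const body = message
    .getPlainBody()
    .replace(/https?:\/\/[^\s>]+/g, '') // drop links
    .replace(/\s+/g, ' ') // collapse whitespace, including line breaks and tabs
    .trim();
  return [
    `Subject: ${message.getSubject()}`,
    `Sender: ${message.getFrom()}`,
    `Body: ${body.substring(0, maxBodyLength)}`,
  ].join('\n');
};

const samplePrompt = buildPrompt_(fakeMessage, 200);
// samplePrompt has three lines: Subject, Sender, and the cleaned Body
// ("Hi, Your invoice is ready: Thanks!")
```

Only the first 200 characters of the cleaned body are sent, which keeps the token usage per email small.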

4. Call the OpenAI API to get the Spam Score

We pass the message prompt to the OpenAI API and get the spam score and a summary of the email content. The spam score is used to determine if the email is a false positive.

The tokens variable keeps track of the number of tokens used in the OpenAI API calls and is included in the email report. You can use this information to monitor your API usage.

let tokens = 0;

const getMessageScore_ = (messagePrompt) => {
  const apiUrl = `https://api.openai.com/v1/chat/completions`;
  const headers = {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${OPENAI_API_KEY}`,
  };
  const response = UrlFetchApp.fetch(apiUrl, {
    method: 'POST',
    headers,
    payload: JSON.stringify({
      model: OPENAI_MODEL,
      messages: [
        { role: 'system', content: SYSTEM_PROMPT },
        { role: 'user', content: messagePrompt },
      ],
      temperature: 0.2,
      max_tokens: 124,
      response_format: { type: 'json_object' },
    }),
  });
  const data = JSON.parse(response.getContentText());
  tokens += data.usage.total_tokens;
  const content = JSON.parse(data.choices[0].message.content);
  return content;
};
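The model’s reply is not guaranteed to be usable: with a small max_tokens limit the JSON can be cut off, and a field may be missing or out of range. A minimal sketch of a defensive parser follows. The parseClassification_ helper is hypothetical, and treating such replies as unscored (returning null) is one possible policy rather than what the original script does.

```javascript
// Hypothetical helper: validate the model's JSON reply before trusting the score.
const parseClassification_ = (rawContent) => {
  let parsed;
  try {
    parsed = JSON.parse(rawContent);
  } catch (err) {
    return null; // Not valid JSON, for example because the output was truncated
  }
  if (parsed === null || typeof parsed !== 'object') return null;

  // Accept a numeric string such as "3" as well as a number
  const score =
    typeof parsed.spam_score === 'string' ? parseFloat(parsed.spam_score) : parsed.spam_score;
  if (typeof score !== 'number' || !Number.isFinite(score) || score < 0 || score > 10) {
    return null; // Missing or out-of-range score
  }

  const summary = typeof parsed.summary === 'string' ? parsed.summary : '';
  return { spam_score: score, summary };
};
```

If getMessageScore_ returned parseClassification_(data.choices[0].message.content), the calling code would need to skip null results before reading spam_score and summary.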

5. Process Spam Emails and Email the Report

You can run this Google script manually or set up a time-driven trigger to run it automatically at regular intervals. Each run marks the processed spam threads as read so they are not analyzed again.

const SPAM_THRESHOLD = 2; // Emails scoring at or below this value are included in the report

const reportFalsePositives = () => {
  const html = [];
  const threads = getSpamThreads_();
  for (let i = 0; i < threads.length; i += 1) {
    const [message] = threads[i].getMessages();
    const messagePrompt = getMessagePrompt_(message);
    // Get the spam score and summary from OpenAI
    const { spam_score, summary } = getMessageScore_(messagePrompt);
    if (spam_score <= SPAM_THRESHOLD) {
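      // Add the email to the report if the spam score is at or below the threshold.
      // Escape "&" and "<" so the address in "Name <user@domain>" is not swallowed as an HTML tag.
      const sender = message.getFrom().replace(/&/g, '&amp;').replace(/</g, '&lt;');
      html.push(`<tr><td>${sender}</td> <td>${summary}</td></tr>`);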
    }
  }
  threads.forEach((thread) => thread.markRead()); // Mark all processed emails as read
  if (html.length > 0) {
    const htmlBody = [
      `<table border="1">`,
      '<tr><th>Email Sender</th><th>Summary</th></tr>',
      html.join(''),
      '</table>',
    ].join('');
    const subject = `Gmail Spam Report - ${tokens} tokens used`;
    GmailApp.sendEmail(USER_EMAIL, subject, '', { htmlBody });
  }
};
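As written, a single failed API call throws and stops the run before any thread is marked as read or any report is sent. One way to soften this, sketched below under the assumption that failed messages should simply be retried on the next run, is to wrap each classification in try/catch and keep successes and failures apart. The classifyAll_ helper and its argument names are illustrative.

```javascript
// Hypothetical helper: classify each item independently so one failure does not abort the batch.
const classifyAll_ = (items, classify) => {
  const scored = [];
  const failed = [];
  items.forEach((item) => {
    try {
      scored.push({ item, result: classify(item) });
    } catch (err) {
      // Keep a readable reason, whether an Error or a plain value was thrown
      const reason = err && err.message ? err.message : String(err);
      failed.push({ item, error: reason });
    }
  });
  return { scored, failed };
};
```

In reportFalsePositives, the items would be the spam threads and classify would build the prompt and call getMessageScore_. Only the threads in scored would then be marked as read, so the ones in failed stay unread and are picked up again by the next run.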