Image Processing#
Open-source models are also available for image processing. One such model, BakLLaVA, is described in more detail here: SkunkworksAI/BakLLaVA. You can download it from Hugging Face here: https://huggingface.co/llava-hf/bakLlava-v1-hf
Toy Marketing Study#
You can run this model with either transformers or llama_cpp_python. Below, we show how to deploy it with transformers in the context of a toy marketing study.
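For the llama_cpp_python route, a minimal sketch could look like the following. It assumes the model is available as a GGUF file together with a separate multimodal projector file; the paths passed to run_with_llama_cpp and the helper names image_to_data_uri, build_messages, and run_with_llama_cpp are hypothetical and do not come from this guide. Llava15ChatHandler is the LLaVA 1.5 chat handler that ships with llama-cpp-python; that it matches BakLLaVA's prompt format is an assumption.

```python
import base64
from pathlib import Path


def image_to_data_uri(image_path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    suffix = Path(image_path).suffix.lower().lstrip(".") or "png"
    if suffix == "jpg":
        suffix = "jpeg"
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:image/{suffix};base64,{encoded}"


def build_messages(image_path: str, question: str) -> list:
    """Build one OpenAI-style user message with an image part and a text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_uri(image_path)}},
                {"type": "text", "text": question},
            ],
        }
    ]


def run_with_llama_cpp(model_path: str, clip_model_path: str,
                       image_path: str, question: str) -> str:
    """Ask a GGUF LLaVA-style model about one image (hypothetical helper)."""
    # imported lazily so the helpers above work without llama-cpp-python installed
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    chat_handler = Llava15ChatHandler(clip_model_path=clip_model_path)
    llm = Llama(
        model_path=model_path,
        chat_handler=chat_handler,
        n_ctx=2048,       # context must hold the image embedding plus the reply
        n_gpu_layers=-1,  # offload all layers to the GPU when one is available
    )
    result = llm.create_chat_completion(
        messages=build_messages(image_path, question),
        max_tokens=400,
    )
    return result["choices"][0]["message"]["content"]
```

Only run_with_llama_cpp needs llama-cpp-python and the model files; the two message helpers use the standard library alone, so they can be checked on any machine.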
For the toy study, we asked DALL-E to produce 3 soft drink ads for us:
1. with cute kawaii characters
2. with people at a Cubs game enjoying the drink
3. from a hopeless dystopian future, where the soft drink looks like it might taste good
This is what it provided:

Then we gave the model the following prompt: “You are a young adult between age 25 and 30 taking a marketing survey. Can you describe if this soft drink ad appeals to you?”
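In the script in the next section, this survey text is wrapped in a USER: <image> ... ASSISTANT: template before it is sent to the model. As a minimal sketch, the same template could be filled in for several respondent personas. The helper name build_prompt and the extra age ranges are illustrative assumptions, not part of the original study.

```python
def build_prompt(persona: str, question: str) -> str:
    """Wrap a persona and a survey question in the USER/ASSISTANT template."""
    # <image> marks where the processor inserts the image features
    return f"USER: <image>\n{persona}\n{question}\nASSISTANT:"


QUESTION = "Can you describe if this soft drink ad appeals to you?"

# the 25-30 range is from the study; the other ranges are hypothetical
age_ranges = [(25, 30), (35, 40), (55, 60)]

prompts = [
    build_prompt(
        f"You are an adult between age {low} and {high} taking a marketing survey.",
        QUESTION,
    )
    for low, high in age_ranges
]

if __name__ == "__main__":
    for p in prompts:
        print(p)
        print("---")
```

Keeping the template in one function means the persona can vary while the image placeholder and the turn markers stay fixed.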
Run with Transformers#
For this study, we’ll run the model with Transformers using the code below.
###################################################
# Toy Marketing Study - Does this ad appeal to you?
###################################################
# libraries
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from PIL import Image
import pandas as pd
import os
import time
#########
# Inputs
#########
llm_model = "llava-hf/bakLlava-v1-hf"
llm_dir = "/kellogg/data/llm_models_opensource/bakLlava"
prompt = """
USER: <image>\nYou are a young adult between age 25 and 30 taking a marketing survey.
Can you describe if this soft drink ad appeals to you?
\nASSISTANT:
"""
output_file = "/kellogg/software/llama_cpp/output/ad_results.csv"
ad_dir = "/kellogg/software/llama_cpp/code/ads"
############
# Functions
############
# load model and processor
def load_model(llm_model, llm_dir):
    # load the weights in half precision and move them to GPU 0
    model = LlavaForConditionalGeneration.from_pretrained(
        llm_model,
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
        cache_dir=llm_dir,
    ).to(0)
    # reuse the same cache directory for the processor files
    processor = AutoProcessor.from_pretrained(llm_model, cache_dir=llm_dir)
    return model, processor
# run bakllava
def run_bakllava(model, processor, image_file, prompt):
    # open image
    raw_image = Image.open(image_file)
    # process image and prompt; keyword arguments avoid relying on positional order
    inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(0, torch.float16)
    # generate response (greedy decoding, up to 400 new tokens)
    output = model.generate(**inputs, max_new_tokens=400, do_sample=False)
    # decode, skipping the first two tokens and any special tokens
    output = processor.decode(output[0][2:], skip_special_tokens=True)
    return output
# append one result row to the running results df
def save_results(results_df, prompt, image_file, response, run_time):
    # create df from current row
    row_df = pd.DataFrame({
        'prompt': [prompt],
        'image_file': [image_file],
        'response': [response],
        'run_time': [run_time]
    })
    # first result: nothing to combine with yet
    if results_df is None:
        return row_df
    # combine with the rows collected so far
    return pd.concat([results_df, row_df], ignore_index=True)
######
# RUN
######
def main():
    # list of files from a directory
    ads = [os.path.join(ad_dir, f) for f in os.listdir(ad_dir) if os.path.isfile(os.path.join(ad_dir, f))]
    # load model
    model, processor = load_model(llm_model, llm_dir)
    # results accumulate here so the CSV ends up with one row per ad
    results_df = None
    # loop over ads
    for ad in ads:
        # run
        start_time = time.time()
        response = run_bakllava(model, processor, ad, prompt)
        run_time = time.time() - start_time
        # print results
        print("========================")
        print(f"Ad: {ad}")
        print(f"Response: {response}")
        print(f"Run Time: {run_time}")
        print("========================")
        # save progress after every ad
        results_df = save_results(results_df, prompt, ad, response, run_time)
        results_df.to_csv(output_file, index=False)
if __name__ == "__main__":
    main()
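The directory listing in main() picks up every regular file in ad_dir, so a stray non-image file would make Image.open fail partway through the loop. A minimal sketch of a stricter listing follows. The helper name list_image_files and the extension list are assumptions for illustration and are not part of the original script.

```python
import os

# extensions treated as images here; an assumption, adjust to match your ad files
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def list_image_files(ad_dir, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of the regular files in ad_dir that have an image extension."""
    paths = []
    for name in sorted(os.listdir(ad_dir)):
        path = os.path.join(ad_dir, name)
        # skip subdirectories and files whose extension is not in the list
        if os.path.isfile(path) and name.lower().endswith(extensions):
            paths.append(path)
    return paths
```

Sorting the names also makes the processing order reproducible, since os.listdir returns entries in arbitrary order.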
The full Python script, image_workflow.py, can be found here: scripts/image/image_workflow.py. The sample SLURM script below can be used to run it.
#!/bin/bash
#SBATCH --account=e32337
#SBATCH --partition gengpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:a100:1
#SBATCH --time 0:30:00
#SBATCH --mem=40G
module purge
module load mamba/23.1.0
source /hpc/software/mamba/23.1.0/etc/profile.d/conda.sh
source activate /kellogg/software/envs/gpu-llama2
python image_workflow.py
Output:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 4.70it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
========================
Ad: /kellogg/software/llama_cpp/code/ads/dystopia.png
Response:
USER:
You are a young adult between age 25 and 30 taking a marketing survey.
Can you describe if this soft drink ad appeals to you?
ASSISTANT:
The image features a red can of Coca-Cola in a visually appealing setting. It is placed in front of a futuristic city landscape, possibly on a beach or near a building. The contrast between the red can and the surrounding environment creates a striking visual effect. The scene suggests that the soft drink brand is targeting a young adult audience, as the product appears "cool" and "futuristic" in the context of the ad. The ad's creative concept and design may capture the attention of the target audience and generate interest in the product.
Run Time: 3.995788097381592
========================
========================
Ad: /kellogg/software/llama_cpp/code/ads/kawaii.png
Response:
USER:
You are a young adult between age 25 and 30 taking a marketing survey.
Can you describe if this soft drink ad appeals to you?
ASSISTANT:
The soft drink ad features a pink and white can with a cute cat and a small heart that says "Coco Cola." A straw is sticking out of the can. The cat appears to be a popular emo symbol, which might appeal to young adults. The ad's design is visually appealing and playful, which could make the soft drink more attractive to the target audience. The use of a cute and quirky image can create a positive association with the product, potentially increasing its popularity and sales among the desired demographic.
Run Time: 3.008307695388794
========================
========================
Ad: /kellogg/software/llama_cpp/code/ads/cubs.png
Response:
USER:
You are a young adult between age 25 and 30 taking a marketing survey.
Can you describe if this soft drink ad appeals to you?
ASSISTANT:
The ad features a glass filled with a soft drink and a can of the same soft drink. The glass has ice in it, and the can has a straw. The ad also includes several people in the background, suggesting a social setting where people are enjoying the soft drink. The ad's visual appeal is enhanced by the presence of people, creating a sense of community and enjoyment associated with the soft drink. The combination of the glass with ice and the can with a straw presents the soft drink as a refreshing beverage that can be enjoyed in various settings, whether it's a casual gathering or a sports event. As a young adult, this ad might appeal to me because it portrays a lifestyle that I can relate to and shows the soft drink as a suitable drink for social occasions.
Run Time: 4.399393796920776
========================
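As the output shows, each decoded response echoes the prompt before the model's answer. If only the answer is wanted in the results file, the text after the final ASSISTANT: marker can be kept. The helper below is a minimal sketch; the name extract_answer is hypothetical and the function is not part of the original script.

```python
def extract_answer(response: str, marker: str = "ASSISTANT:") -> str:
    """Return the text after the last marker, or the whole response if the marker is absent."""
    # rpartition yields ('', '', response) when the marker is not found,
    # so the last element is always the text we want to keep
    return response.rpartition(marker)[2].strip()


if __name__ == "__main__":
    raw = (
        "\nUSER:  \nYou are a young adult between age 25 and 30 taking a marketing survey.\n"
        "Can you describe if this soft drink ad appeals to you?\n\nASSISTANT:\n"
        "The image features a red can of Coca-Cola in a visually appealing setting."
    )
    print(extract_answer(raw))
```

In the workflow above, this could be applied to the response before it is passed to save_results, so the CSV stores the answer without the repeated prompt.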