One of the big goals of this project is to use the power of the LLM to interpret images and create photo captions. The photo captions are then used to synthesize the day’s activities.
4.1 Setup and Initialize
The usual setup: load the libraries and read the base information that the rest of the chapter uses.
## Standard Packages
library(tidyverse)   ## Lots of useful stuff
library(gt)          ## Make Tables
library(ggplot2)     ## Create charts
library(devtools)    ## Load packages from GitHub

## Specialized libraries
library(httr)        ## Send requests and receive responses
library(jsonlite)    ## Handle the request formatting
library(pdftools)    ## Handle PDF files
library(base64enc)   ## Convert image files to base64 encoding
library(googledrive) ## Download Google Doc files
library(curl)        ## The force behind the httr functions

## Get the accessOAI package (do this just once)
## install_github("kimbridges/accessOAI")

## Initialize the accessOAI library
library(accessOAI)

## Initialize some things that rarely change.
LLM     <- "gpt-4o"
LLM_alt <- "gpt-3.5-turbo" ## (text only)
temp    <- 1
apiKey  <- Sys.getenv("OPENAI_API_KEY")

## Get Baseinfo (originates in photos chapter)
baseinfo      <- read.table("baseinfo.txt")
source        <- baseinfo$source
folder        <- baseinfo$folder
thumbs_folder <- baseinfo$thumbs_folder
files_folder  <- baseinfo$files_folder
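The setup reads the key with Sys.getenv(), which returns an empty string when the variable is not set, so a missing key would only surface later as a failed request. A minimal sketch of a fail-fast check follows; `require_api_key` is a hypothetical helper introduced here for illustration and is not part of accessOAI.

```r
## Hypothetical helper (not part of accessOAI): stop early if the
## environment variable holding the API key is missing or empty.
require_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

## Possible usage in place of the bare Sys.getenv() call:
## apiKey <- require_api_key()
```

With a check like this, a misconfigured session stops during setup instead of partway through the photo loop.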
4.2 Create Photo Captions
Here’s where we’re using the LLM to interpret each photo.
## Set up the request for the LLM.
prompt <- "The input file is a photo taken during a trip as part of a set of photos documenting the day's activities. Each photo attempts to record either an activity or a location or both. Please create a caption for the photo. Don't include any comments in your response. Here is the location and time of the photo: "

## Define the LLM role.
role <- "You are a photo interpretation expert. You know how to extract information like the photo location, things in the scene, the overall weather or ambiance of the scene and other relevant facts. You try to limit your responses to things in which you have confidence and avoid speculation. If provided with a time, please convert it to a 12-hour clock and use this in your response."

## Get information about the photos.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Initialize the new element in the data frame.
data$caption <- ""

## Find out how many photos.
n_photos <- nrow(data)

## Get photo directory
working_dir <- getwd()

## Loop through the photos.
for (i in 1:n_photos) {

  ## Build the photo name.
  photo <- paste0(working_dir, "/", thumbs_folder, "/photo_", i, ".png")

  ## Build information to be used in the prompt.
  hint <- paste0("location: ", data$location[i], " time: ", data$time[i])

  ## Enhance the prompt.
  full_prompt <- paste0(prompt, hint)

  ## Reset the response.
  response <- NULL

  ## Do the image analysis.
  response <- analyzeIMG(analysis_image  = photo,
                         AI_role         = role,
                         analysis_prompt = full_prompt,
                         LLM             = LLM,
                         apiKey          = apiKey,
                         connecttimeout  = 90)

  ## Watch progress.
  ## cat("photo", i, ":", response, "\n\n")

  ## Add the caption to the data.
  data$caption[i] <- response

  ## Pause three seconds.
  Sys.sleep(3)

} ## end photo loop

## Save another temporary data file.
file_location <- paste0(files_folder, "/photo_info.txt")
write.table(data, file = file_location)
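The loop above stores whatever analyzeIMG() returns, so a single failed request could interrupt the whole run. A minimal sketch of a retry wrapper follows. `with_retry` is a hypothetical helper, and the sketch assumes that a failed call signals an ordinary R error, which the source does not confirm for analyzeIMG().

```r
## Hypothetical helper: call f() up to max_tries times, pausing between
## attempts, and return `fallback` if every attempt raises an error.
with_retry <- function(f, max_tries = 3, wait_seconds = 3,
                       fallback = NA_character_) {
  for (attempt in seq_len(max_tries)) {
    ## Capture an error as a value instead of letting it stop the loop.
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  fallback
}

## Possible usage inside the photo loop (assumption, untested against the API):
## data$caption[i] <- with_retry(function() {
##   analyzeIMG(analysis_image = photo, AI_role = role,
##              analysis_prompt = full_prompt, LLM = LLM,
##              apiKey = apiKey, connecttimeout = 90)
## })
```

Returning NA for a photo that never succeeds keeps the caption column aligned with the photo numbers, so the missing captions can be retried later.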
4.3 Captions Table
The captions can be seen in a table.
## Get the data.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Extract the photo number and the caption.
cap_data <- data |>
  select(number, caption)

## Make the table.
gt(cap_data) |>
  tab_style(
    style = "vertical-align:top",
    locations = cells_body())
| number | caption |
|---|---|
| 1 | Enjoying a Belgian waffle on a morning in Antwerp. |
| 2 | Exploring Delft by train just before noon. |
| 3 | Exploring the modern architecture of Den Haag's public spaces at 12:19 PM. |
| 4 | Exploring the streets of The Hague on a rainy afternoon at Gravenstraat. |
| 5 | Enjoying a leisurely lunch at a cozy restaurant in Den Haag. |
| 6 | Admiring Vermeer's "Girl with a Pearl Earring" at the Mauritshuis Museum, 3:15 PM. |
| 7 | Visitors admiring artwork at Plein 29 in Den Haag, Netherlands at 3:18 PM. |
| 8 | Admiring the famous "Girl with a Pearl Earring" exhibit at the Mauritshuis Museum in The Hague. |
| 9 | Visiting the Mauritshuis Museum in The Hague, Netherlands at 4:10 PM. |
| 10 | Exploring the historic Binnenhof complex in The Hague on a rainy afternoon. |
| 11 | Exploring the historic streets of Den Haag at Gravenstraat, Netherlands in the late afternoon. |
| 12 | Enjoying an early evening snack at a cozy café on Denneweg in The Hague. |
| 13 | Evening wine tasting at Denneweg 130 in Den Haag, Netherlands. |
| 14 | Enjoying a delightful dinner at Denneweg 130 in Den Haag at 6:25 PM. |
| 15 | Savoring a delightful dessert at Denneweg 130, Den Haag at 7:36 PM. |
| 16 | Dinner at Deksels Restaurant on Denneweg, Den Haag at dusk. |
4.4 Synthesize the Captions
Here, the captions and the movement data (distance and elapsed time between photos) are combined into a synthesized story about the day.
The result is shown in the Synthesis Chapter.
## Get the data.
file_location <- paste0(files_folder, "/photo_info.txt")
data <- read.table(file = file_location)

## Figure out the number of entries.
n_rows <- nrow(data)

## Initialize the data.
text_input <- ""

## Create a table with structured info for each photo.
for (i in 1:n_rows) {
  line1 <- paste("Photo number", i)
  line2 <- paste("Location:", data$location[i])
  line3 <- paste("Meters moved:", data$dist_meters[i])
  line4 <- paste("Minutes since last move:", data$timespan[i])
  line5 <- paste("Photo caption:", data$caption[i])
  line6 <- " "

  ## Add the current row to the table.
  text_input <- c(text_input, line1, line2, line3, line4, line5, line6)
} ## End for loop on data

## Check progress.
## text_input

prompt <- "Create a description of the day's events based on the information in the file."

role <- "You are an expert at inferring daily activities of a person traveling from a log of their photos. The log has information for each photo including the location at which the photo was taken, the distance (in meters) moved from the previous photo location, the time (in minutes) between this and the previous photo, and the photo caption. You know how days are structured with meals and activities and you can aggregate similar events. You do not make up activities or events that are not part of the set of photos."

## Do the analysis.
response <- analyzeTXT(analysis_text   = text_input,
                       AI_role         = role,
                       analysis_prompt = prompt,
                       LLM             = LLM,
                       apiKey          = apiKey,
                       connecttimeout  = 90)

## Check on progress.
## cat(response)

## Save the results.
file_location <- "full_story.txt"
write.table(response, file = file_location)
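To make the shape of the log passed to analyzeTXT() concrete, the sketch below wraps the same loop in a function and runs it on two made-up rows. `build_log` is a hypothetical name, and the toy values are invented for illustration and are not taken from photo_info.txt.

```r
## Hypothetical helper wrapping the loop above: returns one character
## vector with six lines per photo, preceded by the initial empty string.
build_log <- function(data) {
  text_input <- ""
  for (i in seq_len(nrow(data))) {
    text_input <- c(text_input,
                    paste("Photo number", i),
                    paste("Location:", data$location[i]),
                    paste("Meters moved:", data$dist_meters[i]),
                    paste("Minutes since last move:", data$timespan[i]),
                    paste("Photo caption:", data$caption[i]),
                    " ")
  }
  text_input
}

## Made-up rows for illustration only.
toy <- data.frame(
  location    = c("Antwerp", "Delft"),
  dist_meters = c(0, 1200),
  timespan    = c(0, 45),
  caption     = c("Breakfast.", "Train ride."),
  stringsAsFactors = FALSE
)

log_lines <- build_log(toy)
length(log_lines)  ## 13: one leading "" plus six lines for each of two photos
log_lines[4]       ## "Meters moved: 0"
```

Each photo contributes a fixed block of six lines, so the model sees the location, movement, timing, and caption together for every entry.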