In this post, inspired by the show Silicon Valley, we’ll create our very own hot dog classifier.
Jian Yang’s brilliant app
I recently started learning a bit more about machine learning, and in this specific case, deep learning. Most of the code in this post comes from the first lesson of the Practical Deep Learning for Coders course from fast.ai, which is great if you’re interested.
If you’re not familiar with the HBO show Silicon Valley, this might not hit the same note. In episode four of season four, the character Jian Yang introduces his app SeeFood. It’s hilariously precise at recognizing hot dogs, and well, not hot dogs.
This, of course, is the dream, and finally, I can create my own version.
I tasked Midjourney with generating a unique hot dog image for me, ensuring it was completely new so it wouldn’t inadvertently become part of my training data. We will use this to test our model. Here’s the masterpiece Midjourney produced:
If you’re keen to give this a shot yourself, I highly recommend checking out the course. But if diving into the course isn’t your cup of tea, tackling the project in a Jupyter notebook is a great alternative. I used Google Colab for this project, which gave me access to some essential cloud GPU resources, significantly speeding up the process. Alternatively, you could also venture into Kaggle, which offers similar capabilities.
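If you do go the notebook route, you’ll probably need to install a couple of packages first. Here’s a minimal sketch of what that might look like in Colab; note that Colab already ships with PyTorch and often fastai, so the exact set you need to install may differ:

```python
# Install the libraries used in this post (already-present ones are just upgraded)
!pip install -Uqq fastai fastcore fastdownload duckduckgo_search
```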
Getting some hot dogs
Now, to train our model, we’ll need a collection of hot dogs. Thankfully, there’s a convenient Python library for tapping into the DuckDuckGo search engine for image searches.
In addition to this, we’ll be leveraging various libraries from fast.ai, including Fastcore, which we’re importing to add some extra functionalities not present in standard Python. I’ll keep the code explanation brief, as the main attraction here isn’t the code itself.
We start by defining a function that searches for a given term and retrieves, by default, 45 images. This function will yield a list of URLs pointing to the images.
from duckduckgo_search import DDGS
from fastcore.all import *

def search_images(term, max_images=45):
    print(f"Searching for '{term}'")
    results = L(DDGS().images(
        keywords=term,
        max_results=max_images
    )).itemgot('image')
    return results
Let’s put the code to the test by searching for a single hot dog image.
urls = search_images('hot dogs', max_images=1)
urls[0]
Searching for 'hot dogs'
'https://www.charbroil.com.au/Images/Recipes/Main/Hot-Dog-II.jpg'
Next, we’ll download and display the image.
At this point, we’ll also import the fast.ai vision library, which will enable us to do all kinds of fun stuff later on. The fast.ai library adds high-level functionality on top of PyTorch. PyTorch is a very popular deep learning library, but it is also quite low level and supposedly fairly complicated to use. Thankfully, fast.ai smooths over those complexities, making advanced deep learning tasks more accessible.
from fastdownload import download_url
dest = 'hotdog.jpg'
download_url(urls[0], dest, show_progress=False)
from fastai.vision.all import *
im = Image.open(dest)
im.to_thumb(256,256)
If you attempt this on your own, keep in mind that the image you retrieve may differ from mine.
Next, to effectively train our model, it’s crucial not only to know what constitutes a hot dog but also to understand what doesn’t. Therefore, let’s broaden our search to include general food items.
download_url(search_images('food photo', max_images=1)[0], 'food.jpg', show_progress=False)
Image.open('food.jpg').to_thumb(256,256)
Searching for 'food photo'
Now, we’re getting into the heart of Jian Yang’s app inspiration by dividing our image downloads into two distinct folders: hot dog and not hot dog, staying true to the original concept.
In the code below, you’ll notice we’re resizing the downloaded images so that neither their height nor their width exceeds 400 pixels.
To ensure our model accurately discerns hot dogs from other foods, I opted for items that could be mistaken for hot dogs. Searches were conducted for "food photo", "food turkish photo", "food german photo", and "food taco photo" to challenge the model’s discernment.
Recognizing the diversity among hot dogs themselves, I expanded the hot dog dataset with searches like "hot dog photo", "hot dog french photo", "hot dog vegan photo", and "hot dog chicago photo".
Each search aims to fetch 45 URLs, amounting to 180 images of non-hot dog foods and another 180 of bona fide hot dogs. Surprisingly, this quantity is more than sufficient to commence our model training.
searches = 'food','hot dog'
path = Path('hotdog_or_not')
from time import sleep

for o in searches:
    if o == 'food':
        sub_path = 'not hot dog'
        dest = (path/sub_path)
        dest.mkdir(exist_ok=True, parents=True)
        download_images(dest, urls=search_images(f'{o} photo'))
        sleep(10)  # Pause between searches to avoid over-loading server
        download_images(dest, urls=search_images(f'{o} turkish photo'))
        sleep(10)
        download_images(dest, urls=search_images(f'{o} german photo'))
        sleep(10)
        download_images(dest, urls=search_images(f'{o} taco photo'))
        sleep(10)
        resize_images(path/sub_path, max_size=400, dest=path/sub_path)
    elif o == 'hot dog':
        dest = (path/o)
        dest.mkdir(exist_ok=True, parents=True)
        download_images(dest, urls=search_images(f'{o} photo'))
        sleep(10)  # Pause between searches to avoid over-loading server
        download_images(dest, urls=search_images(f'{o} french photo'))
        sleep(10)
        download_images(dest, urls=search_images(f'{o} vegan photo'))
        sleep(10)
        download_images(dest, urls=search_images(f'{o} chicago photo'))
        sleep(10)
        resize_images(path/o, max_size=400, dest=path/o)
Searching for 'food photo'
Searching for 'food turkish photo'
Searching for 'food german photo'
Searching for 'food taco photo'
Searching for 'hot dog photo'
Searching for 'hot dog french photo'
Searching for 'hot dog vegan photo'
Searching for 'hot dog chicago photo'
Some images might have failed to download, so let’s remove those and see how many there were.
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)
len(failed)
2
Learning
To train our model, we’ll need DataLoaders, the component that feeds batches of data to the model. In this case, we build them from a DataBlock, which describes how our dataset is structured.
First we define the inputs and outputs. Our inputs are images (ImageBlock) and our outputs are categories (CategoryBlock), the categories being hot dog and not hot dog.
We split the data so that 20% of it is held out as validation data, and we use the parent folder names (hot dog and not hot dog) as the y-labels.
We will also resize all our images to 192x192 pixels by squishing them. Apparently this is not a problem for training.
Finally, we’ll generate a small sample from our dataset to verify its quality and ensure everything looks as expected.
dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(192, method='squish')]
).dataloaders(path, bs=32)
dls.show_batch(max_n=6)
It’s now time to train our model, and this is where using a service like Google Colab can be incredibly beneficial. In my tests, training the model on a CPU took several minutes, whereas leveraging a GPU reduced this time to just a few seconds.
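If you want to double-check that your notebook is actually running on a GPU, a quick sanity check with plain PyTorch (which fast.ai builds on) looks like this:

```python
import torch

# True if a CUDA-capable GPU is visible to the notebook runtime
print(torch.cuda.is_available())
```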
For our computer vision model, we’ll employ ResNet18, combined with a fine-tuning method from the fast.ai library.
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(3)
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.688460 | 0.195614 | 0.078212 | 00:04 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.226918 | 0.136474 | 0.044693 | 00:05 |
| 1 | 0.158818 | 0.155105 | 0.039106 | 00:04 |
| 2 | 0.094985 | 0.134931 | 0.033520 | 00:04 |
I’m still in the process of understanding the nuances of training loss and validation loss, so I can’t delve into those details just yet. More learning to do.
Testing the model
Now that our model is trained and ready to go, let’s put it to the test with our hot dog image. Here’s a reminder of what it looks like:
is_hotdog,_,probs = learn.predict(PILImage.create('hotdog-test-sm.jpg'))
print(f"This is a: {is_hotdog}.")
print(f"Probability it's a hotdog: {probs[0]:.4f}")
This is a: hot dog.
Probability it's a hotdog: 1.0000
Fantastic! The model successfully recognized the hot dog and, judging by the probability, it’s incredibly confident in its classification.
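If you’re wondering why probs[0] is the hot dog probability: the index follows the order of the categories in the DataLoaders vocabulary, which fast.ai derives from the folder names in alphabetical order. You can verify it yourself:

```python
# The category order determines which index in `probs` belongs to which class
print(learn.dls.vocab)  # expected: ['hot dog', 'not hot dog']
```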
Now, for the ultimate test: I sourced a taco image from Midjourney. Let’s see how our model fares with this one.
is_hotdog,_,probs = learn.predict(PILImage.create('taco-test-sm.jpg'))
print(f"This is a: {is_hotdog}.")
print(f"Probability it's a hotdog: {probs[0]:.4f}")
This is a: not hot dog.
Probability it's a hotdog: 0.0625
Great news! The model discerned that the taco wasn’t a hot dog, demonstrating confidence in its decision.
Now, let’s up the ante. I tasked Midjourney with creating an image of a taco that includes a sausage.
is_hotdog,_,probs = learn.predict(PILImage.create('taco-sausage-test-sm.jpg'))
print(f"This is a: {is_hotdog}.")
print(f"Probability it's a hotdog: {probs[0]:.4f}")
This is a: not hot dog.
Probability it's a hotdog: 0.0038
Great success! Intriguingly, the model was even more certain that a taco with a sausage wasn’t a hot dog compared to the plain taco.
This model shouldn’t be expected to work as well if we give it some random image; it was only trained to distinguish between categories of food. I actually tried giving it an image of a bicycle and, surprisingly, it was very certain that it wasn’t a hot dog. So I might be missing something here.
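One possible explanation is that the model can only ever answer with one of the two categories it was trained on, so every image, food or not, gets squeezed into either hot dog or not hot dog. If you want to poke at this yourself, you can print the full probability vector for any image; the filename below is just a placeholder for whatever picture you feed it:

```python
# 'bicycle-test.jpg' is a hypothetical filename -- replace it with your own image
pred, pred_idx, probs = learn.predict(PILImage.create('bicycle-test.jpg'))
print(dict(zip(learn.dls.vocab, map(float, probs))))
```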
Admittedly, this has been a somewhat whimsical journey, and there’s much I’ve yet to unravel. However, the goal was to dive in and have fun with it—and on that front, it’s a mission accomplished!