I spent a few hours this weekend tinkering with cloud-based computer vision APIs as part of a personal project to better classify my photos. I tested the Microsoft Computer Vision API and Google’s Cloud Vision API.
Both were reasonably easy to set up, although Google’s takes a bit more effort, including some fussing around in their API control panel. Microsoft’s took probably less than 10 minutes from signup to having working code; their process is much simpler: register, get an API key, and you’re ready. Google requires that you sign up, submit payment details, download their SDK, authenticate your account through the SDK via OAuth, and only then can you finally try it out.
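For anyone wanting to try the same thing, here is a rough sketch of what a request to each service can look like using only the Python standard library. Treat the details as assumptions rather than a record of my setup: the Microsoft path (`/vision/v3.2/analyze`) and the Google REST endpoint with a plain API key (instead of the SDK and OAuth route described above) are taken from the public REST interfaces and may differ from what your account offers. The helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_microsoft_request(endpoint, api_key, image_url):
    """Build (but do not send) a tagging request for the Microsoft API.

    Assumption: `endpoint` is the base URL of your own resource and the
    v3.2 `analyze` path is available to you.
    """
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?visualFeatures=Tags"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def build_google_request(api_key, image_url):
    """Build (but do not send) a label + landmark request for Google.

    Assumption: an API key is enough for your project; the SDK/OAuth
    route is the alternative.
    """
    query = urllib.parse.urlencode({"key": api_key})
    url = "https://vision.googleapis.com/v1/images:annotate?" + query
    payload = {
        "requests": [
            {
                "image": {"source": {"imageUri": image_url}},
                "features": [
                    {"type": "LABEL_DETECTION"},
                    {"type": "LANDMARK_DETECTION"},
                ],
            }
        ]
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def microsoft_tags(result):
    """Pull the tag names out of a Microsoft `analyze` response."""
    return [tag["name"] for tag in result.get("tags", [])]


def google_labels_and_landmarks(result):
    """Pull label and landmark descriptions out of a Google response."""
    first = (result.get("responses") or [{}])[0]
    labels = [a["description"] for a in first.get("labelAnnotations", [])]
    landmarks = [a["description"] for a in first.get("landmarkAnnotations", [])]
    return labels, landmarks
```

Calling `send(build_microsoft_request(...))` or `send(build_google_request(...))` with your own key and a publicly reachable image URL should return the JSON that the two parsing helpers expect.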
I had somewhat lower expectations of the Microsoft one based solely on the first thing that I saw when I checked out their home page: one of the examples displayed by default includes errors in its OCR.
Microsoft’s API still performs fairly well, although from some quick experimentation Google’s seems to produce more reasonable results.
Here are some examples:
Landmarks
Tower of London
Labels: sky outdoor tree building tall roof
Landmark: Tower of London Tower of London, Jewel House Labels: sky building landmark historic site medieval architecture
Google’s tags are a bit more specific than Microsoft’s, but there’s some overlap. Google correctly identifies it as the Tower of London, but incorrectly decides it is Jewel House (it is the White Tower).
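To put a number on "some overlap", the two label lists can be compared as sets. This is a small sketch using the Tower of London labels quoted above; how I have split the multi-word labels (for example "historic site") is my reading of the output, and `compare_labels` is a made-up helper name.

```python
def compare_labels(first, second):
    """Return the labels shared by both lists and those unique to each."""
    first_set = {label.lower() for label in first}
    second_set = {label.lower() for label in second}
    return {
        "shared": sorted(first_set & second_set),
        "only_first": sorted(first_set - second_set),
        "only_second": sorted(second_set - first_set),
    }


# Labels for the Tower of London photo, as listed above.
microsoft = ["sky", "outdoor", "tree", "building", "tall", "roof"]
google = ["sky", "building", "landmark", "historic site", "medieval architecture"]

result = compare_labels(microsoft, google)
print(result["shared"])       # ['building', 'sky']
print(result["only_first"])   # ['outdoor', 'roof', 'tall', 'tree']
print(result["only_second"])  # ['historic site', 'landmark', 'medieval architecture']
```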
Labels: water outdoor sky building river bridge
Landmark: Tower Bridge Labels: bridge reflection body of water waterway landmark
Google correctly flags this as Tower Bridge but, almost amazingly considering how iconic it is, Microsoft does not. Perhaps the colours or darkness are causing issues here. However, the tagging in both is pretty good.
Labels: tree outdoor sky building government building tower
Landmark: St. Paul's Cathedral Labels: sky landmark tree urban area woody plant
Again, Google is correctly able to identify the landmark while Microsoft falls short.
Labels: outdoor water sky night
Labels: night reflection landmark cityscape waterway
Neither of them picks up that this is Big Ben and/or Westminster. The tags are pretty good, although I feel Google has a slight advantage for calling it a cityscape.
Labels: table sky wine tree outdoor glass beverage drink alcohol
Labels: water drink beer alcoholic beverage wine glass
Both pick up on the alcohol theme, but Google correctly identifies it as beer, although it tags it as a wine glass, perhaps because it’s a slightly unusual shape for a beer glass. Microsoft’s tags, however, are much more complete.
Labels: manhole cover
Labels: circle manhole manhole cover black and white stone carving
I have a lot of photos of manhole covers and I am keen to find a way to tag them automatically. Both Google & MS correctly tag this. Google has a bunch of extra detail, although the photo is not actually black and white.
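Once the labels are collected per photo, picking out the manhole covers is a simple lookup. The sketch below assumes the labels have already been fetched and stored in a dictionary; the filenames and the `photos_with_label` helper are hypothetical, while the labels are the Google ones quoted in this post.

```python
def photos_with_label(labels_by_photo, wanted):
    """Return the photos whose labels include `wanted` (exact, case-insensitive)."""
    wanted = wanted.lower()
    return sorted(
        photo
        for photo, labels in labels_by_photo.items()
        if wanted in {label.lower() for label in labels}
    )


# Hypothetical filenames mapped to labels quoted in this post.
labels_by_photo = {
    "manhole.jpg": ["circle", "manhole", "manhole cover", "black and white", "stone carving"],
    "bee.jpg": ["insect", "bee", "honey bee", "macro photography", "membrane winged insect"],
}

print(photos_with_label(labels_by_photo, "manhole cover"))  # ['manhole.jpg']
```

Matching on the exact label rather than a substring keeps "manhole" and "manhole cover" distinct, which matters if the two services disagree on wording.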
Labels: sky outdoor grass mountain person standing nature posing day highland
Labels: mountainous landforms sky mountain nature cloud
Microsoft has a lot of detail here and importantly correctly identifies it as a “highland” photo. But both are pretty good.
Labels: fence tree outdoor parrot animal bird white
Labels: bird vertebrate purple flora tree
This is not a parrot, Microsoft. Vertebrate is a bit generic, although it is indeed a bird. Bit of a draw but the tags are still generally useful.
Labels: sky outdoor mountain grass nature hill field background overlooking grassy hillside cloudy clouds highland land distance
Labels: highland sky loch cloud wilderness
Well, Microsoft really throw the kitchen sink at this one, but they’re all accurate. Both correctly tag it with “highland” which is great, but bonus points to Google for “loch”.
Labels: person outdoor man standing
Labels: photograph statue monument photography religion
I was curious to see what they would make of a statue; Google’s tags are clearly more useful than Microsoft’s here.
Labels: grass outdoor sky tree building field farm old grassy pasture garden lush
Labels: grass cemetery tree wall historic site
Microsoft is again throwing down as many tags as possible. Both are pretty useful, although again Google is the clear winner for picking it as a cemetery.
Labels: building outdoor tower old stone
Landmark: Broadway Tower Labels: castle building sky tower fortification
Google again nail the location and also tag it as a ‘castle’, which is certainly what I would have done. Microsoft’s are OK but again a bit too general.
Labels: blurry rain
Labels: insect bee honey bee macro photography membrane winged insect
Microsoft have no idea what is going on here. Google smashes it.
Labels: sky outdoor grass tree cloudy clouds day lush
Landmark: Queen's House Labels: cloud sky city daytime urban area
More generally correct stuff from Microsoft, but Google nail it with Queen’s House (although if it had also picked Canary Wharf I would have been doubly impressed).