Visual search tools are building a bridge

Retailers and brands are constantly looking for ways to improve the shopping experience and remove as much friction as possible before the actual purchase.
With visual search tools, they have a new instrument at their disposal. Like voice search, visual search makes the search process easier and faster.
However, because visual search tools align the unique characteristics of the physical world with the frictionless character of the digital world, they have more to offer than just another quick and easy way to shop.


  • When people undertake a visual search, they look for a product using an image or photo instead of keywords. This can be done in several ways: people can point their smartphone camera at an item or upload a photo to a retailer’s website or app. The retailer then instantly shows the exact item for immediate purchase, or similar items that match the characteristics of the product in the photo (e.g. color, style, collection, pattern, fit). Currently, brands primarily use visual search to improve product discovery and provide instant gratification.
  • Visual search has been around for some time. Google added reverse image search (Search by Image) to Google Images in 2011, but it was not until 2017 that visual search picked up pace. When Pinterest launched its visual discovery tool Lens, scanning images quickly became mainstream, marking the beginning of a boom in visual search. One year later, Pinterest reported that Lens was being used for over 600 million visual searches every month. In the past two years, major competitors have launched or updated their own visual search tools. For example, the updated Google Lens is far more sophisticated than Google’s earlier image search. Amazon recently introduced StyleSnap, a feature that lets users find fashion items similar to those in a photo they upload. Moreover, last year Amazon launched a partnership with Snapchat to compete with Pinterest and Instagram in social commerce: Snapchat users who use the new visual search feature are linked directly to Amazon’s website.
  • There are strong signs that visual search tools are here to stay. According to MarketWatch, 62% of Gen Z and millennials want visual search tools, more than any other new technology. In addition, brands experimenting with visual search tools have reported increased conversion rates.
  • For the past two decades, Google was nearly the only online entry point for text-based product search. Last year was a landmark: Amazon ended Google’s long dominance of online product search. According to research, more people in the U.S. now start their product search on Amazon. This marks another win for Amazon’s e-commerce dominance in the U.S. The competition between the two in search is heading for unprecedented heights, as they also compete in another important search domain: smart home speakers.
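
To make the matching step from the first point above more concrete: a visual search tool has to compare the photo a shopper provides with the products in a catalog and return the closest ones. The sketch below is only an illustration of that idea, not a description of how Pinterest, Google or Amazon actually implement it. It assumes each image has already been reduced to a short list of numbers (a "feature vector") by some image model; the catalog, the item names, the numbers and the function names are all hypothetical.

```python
import math

# Hypothetical catalog: product name -> feature vector.
# In a real system these vectors would come from an image model;
# the numbers here are made up for illustration.
CATALOG = {
    "red floral dress": [0.9, 0.1, 0.8],
    "red plain dress": [0.9, 0.1, 0.1],
    "blue striped shirt": [0.1, 0.9, 0.5],
}


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def visual_search(query_vector, catalog, top_k=2):
    """Return the names of the top_k catalog items most similar to the query."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine_similarity(query_vector, catalog[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed feature vector for a shopper's photo of a red patterned dress.
photo_vector = [0.85, 0.15, 0.75]
print(visual_search(photo_vector, CATALOG))
```

In this toy example, the closest match plays the role of the "exact item" and the runner-up that of a "similar item". The shopper never has to find words for the color or the pattern.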


In the past, people went to a store and looked around to find new products. Perhaps they would ask a store clerk to help them find the desired product. In that sense, visual and voice search coexisted peacefully in everyday product discovery. Digitization has changed the way we search for products. The rise of e-commerce unbundled search, and entering keywords (text search) became the new normal. Although most shopping is still done in physical locations, 87% of all product searches begin on digital channels. Online pre-purchase research is common for almost all products, with groceries as the main exception. For around two decades, text search has dominated this online landscape.

Over the last couple of years, search behavior in retail has slowly been changing. In addition to text search, voice assistants have become a customer touchpoint for product search and product information. It started with mobile assistants such as Siri, and now smart home speakers are starting to integrate the shopping experience into daily habits (e.g. ordering groceries while cooking, or new socks while getting dressed). Compared to text search, we interact with voice search engines in a more conversational way. In this regard, they function as a sort of virtual store clerk (e.g. we ask whether store X has a bigger size, whether there are discounts or what the opening hours are). Despite these differences between text and voice search, the underlying principle is the same: language.

In the past two years, visual search has slowly entered the stage. In a certain way, we could understand visual search as an extension of text search and voice search, reducing even more friction in the customer journey by reducing the time between seeing and buying. However, this could be misleading, as we would be overlooking the fundamental difference between visual search and text/voice search.

People undertaking a visual search don’t have to know a name or a word; they only need to recognize or identify an object. As a result, the different forms of search create very different user experiences. Its nonverbal nature makes visual search more intuitive and inclusive. As we have written before, images can communicate rich meaning in a split second and show nuances that are difficult to capture in keywords. Describing an item or a style in words is often difficult, which causes a lot of friction in online product discovery. By eliminating mediation through language, visual search “returns” to the immediate shopping experiences that were common before the digital era. Unlike before, however, visual search is now linked to a digital world.

Let’s begin with the downside of this disintermediation of language. In addition to lacking the precision and clarity of language, visual search is limited to objects that are “here and now”. Searching for abstract concepts such as “the weather” is difficult, if not impossible (e.g. pointing at a cloud and hoping for a weather forecast will leave us disappointed, as the search will necessarily lead to an examination of this particular cloud by computer vision).

However, there is also an upside to this. If we dive a little deeper into the “here and now” characteristic of visual search, we find some interesting qualities. Customer journeys often begin with things that attract us for reasons we don’t know and that aren’t easily captured in phrases. In the physical realm, we are continuously inspired, attracted by certain items and moved by people we look up to. However, the brick-and-mortar physical world is bounded in its capacity to respond to this. Stores are limited in flexibility and personalization, and finite in their supply and offerings. Social media has been shown to copy some physical social dynamics to digital worlds. Not being subject to the boundaries of the physical world, social commerce on platforms such as Pinterest and Instagram seems to be the logical outcome of this. However, the online competition for our attention is fierce, especially for smaller brands. Moreover, digital worlds often exhibit herd behavior, and social media influencers are criticized for a lack of authenticity. E-commerce and social commerce seem unable to fully replicate the unique qualities of the physical world.

This incongruity between the two worlds has been a challenge to retailers for years. Visual search tools might have some answers to this. More than text search and voice search, visual search tools create a convenient and natural “bridge” between the finite and unique “here and now” and the frictionless and infinite digital realm. With visual search tools, the distance between physical inspiration and digital gratification becomes almost nonexistent. Every point in the physical world becomes a possible starting point for the customer journey.


  • As every object in the physical world becomes a possible customer touchpoint, smaller brands might become less dependent on advertising on social media or social commerce. Instead, they could prioritize real-world objects and develop new forms of consumer engagement.
  • In a more speculative line of thought, we may think of every physical object becoming a hyperlink to an “endless” digital world “beyond it”, with its own internal logic and organization. Theorist Benjamin Bratton has already noted that in a digitized world, every object becomes a possible gateway to endless information. It could relate anything to anything. For example, a “scanned” historical building could show not only relevant information, historical facts and opening hours, but also footage of friends who have visited the place, or of other, similarly memorable encounters somewhere else. This deep addressability of everything reorders and reconfigures the normal categorization and taxonomy of the world. Every physical object might get its own general and, more importantly, personalized digital trace in time and space, which would be re-activated when you visit and scan it. Google in particular is rolling out features in line with this speculative line of thought, such as its integration of Lens in Google Maps. Maps and Lens form a solid basis for gradually developing the functionalities of this futuristic AR experience.