
Imagine playing a new, slightly altered version of the game GeoGuessr. You have a picture of an average American home, perhaps two stories tall, on a cul-de-sac, with a front lawn and an American flag flying proudly out front. But there’s nothing particularly distinctive about this house, nothing to tell you what state it’s in or where the owners are from.
You have two tools: your brain, and 44,416 low-resolution, bird’s-eye-view photos of random locations across the United States, each tagged with its location. Could you find the house by matching it to the right aerial image?
I certainly couldn’t, but a new machine learning model just might. The software, created by researchers at China University of Petroleum (East China), takes a roadside image, of a house or a commercial building or anything else that can be photographed from the street, and searches a database of remote-sensing photos with corresponding location information to find the matching aerial image. While other systems can do the same, this one is pocket-sized and highly accurate compared with its rivals.
At its best, when given an image with a 180-degree field of view, it succeeds 97 percent of the time in the first step of narrowing down the search area, matching or beating every other model available for comparison by up to two percentage points. Even in less-than-ideal conditions, it outperforms many competitors. When pinpointing a precise location, it is correct 82 percent of the time, within three points of the other models.
What sets this model apart, though, is its speed and memory footprint. According to the researchers, it is at least twice as fast as comparable systems and uses less than a third of the memory they require. That combination makes it valuable for applications in navigation systems and the defense industry.
“We train the AI to ignore superficial differences in perspective and focus on extracting similar ‘key landmarks’ from both views, converting them into a simple, shared language,” explains Peng Ren, who develops machine learning and signal-processing algorithms at China University of Petroleum (East China).
The software relies on a method called deep cross-view hashing. Instead of trying to compare every pixel of a street-level image to every image in a huge bird’s-eye-view database, the method relies on hashing: transforming a collection of data, in this case street-level and aerial photos, into a short string of numbers unique to that data.
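The paper’s actual architecture isn’t reproduced here, but a toy sketch can show why hash codes make retrieval so cheap: comparing two binary codes is just counting differing bits (the Hamming distance). All names, code lengths, and database sizes below are illustrative assumptions.

```python
import numpy as np

def hamming_distance(a, b):
    # Count the bits where two binary hash codes differ.
    return int(np.sum(a != b))

def nearest_hashes(query_code, database_codes, k=5):
    # Rank every aerial image's code by Hamming distance to the
    # street-level query code and return the k closest indices.
    dists = [hamming_distance(query_code, c) for c in database_codes]
    return sorted(range(len(dists)), key=lambda i: dists[i])[:k]

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(1000, 64))   # 1,000 toy 64-bit codes
query = db[42].copy()
query[:3] ^= 1                             # flip 3 bits to mimic a view change
print(nearest_hashes(query, db)[0])        # the true match still ranks first
```

Because each comparison is a handful of bit operations rather than a pixel-by-pixel image comparison, scanning even a large database stays fast.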
To do this, the China University of Petroleum group employs a type of deep learning model called a vision transformer, which divides images into smaller pieces and finds patterns among them. The model might find something in a photo that it has been trained to identify, a tall building, a circular fountain, a roundabout, and then encode its findings into a string of numbers. ChatGPT is based on a similar architecture, but finds patterns in text rather than images. (The “T” in “GPT” stands for “transformer.”)
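The first step a vision transformer takes, splitting an image into small tiles it can attend over, can be sketched in a few lines. The sizes here are toy values, not the authors’ model:

```python
import numpy as np

def image_to_patches(img, patch=4):
    # Split an image into non-overlapping patch-by-patch tiles,
    # the input units a vision transformer finds patterns among.
    h, w = img.shape
    tiles = [img[r:r + patch, c:c + patch].ravel()
             for r in range(0, h, patch)
             for c in range(0, w, patch)]
    return np.stack(tiles)

img = np.arange(64).reshape(8, 8)   # toy 8x8 "image"
patches = image_to_patches(img)
print(patches.shape)                # (4, 16): four 4x4 tiles, flattened
```

In a real model, each flattened tile is then projected into an embedding before the transformer layers compare them; the final output is compressed into the hash code described above.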
The number representing each photo is like a fingerprint, says Hong Li, who studies computer vision at the Australian National University. The code captures distinctive characteristics of each image, allowing the geolocation process to quickly narrow down potential matches.
In the new system, the code for a given ground-level photo is compared with those of all aerial images in the database (for testing, the team used satellite images of the United States and Australia), yielding the five closest aerial candidates. The coordinates of those matches are then averaged using a technique that overweights locations close to one another, reducing the influence of outliers, and the result is returned as the approximate location of the street-level image.
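That averaging step can be sketched as follows. The inverse-distance weighting used here is an illustrative assumption, not the paper’s exact formula, but it shows how four agreeing candidates can outvote one outlier:

```python
import numpy as np

def fuse_locations(latlons):
    # Average candidate coordinates, overweighting points that lie
    # close to the others so a single outlier barely moves the result.
    pts = np.asarray(latlons, dtype=float)
    center = np.median(pts, axis=0)              # robust starting point
    dists = np.linalg.norm(pts - center, axis=1)
    weights = 1.0 / (dists + 1e-6)               # nearby points dominate
    weights /= weights.sum()
    return weights @ pts

# Four candidates cluster near one city; one is far off and
# receives a tiny weight.
candidates = [(38.62, -90.19), (38.63, -90.20), (38.61, -90.18),
              (38.62, -90.21), (45.00, -122.00)]
lat, lon = fuse_locations(candidates)
print(round(lat, 2), round(lon, 2))
```

The fused estimate stays near the cluster of agreeing candidates rather than drifting toward the outlier, which a plain mean would do.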
The new geolocation method was published last month in IEEE Transactions on Geoscience and Remote Sensing.
Fast and memory efficient
“Although not an entirely new paradigm,” says Li, the paper “represents a clear advancement within the field.” Because versions of this problem have been tackled before, some experts, such as Washington University in St. Louis computer scientist Nathan Jacobs, are less excited. “I don’t think it’s a particularly groundbreaking paper,” he says.
But Li disagrees with Jacobs: he believes the approach is innovative in its use of hashing, which makes image retrieval faster and more memory efficient than traditional techniques. The model uses only 35 megabytes, while the next smallest model tested by Ren’s team requires 104 megabytes, almost three times as much space.
The researchers claim the method is more than twice as fast as the next fastest alternative. When matching street-level images against a dataset of United States aerial photography, the runner-up took about 0.005 seconds per match; the Petroleum group’s model found a location in about 0.0013 seconds, almost four times faster.
“As a result, our method is more efficient than traditional image-geolocalization techniques,” Ren says, and Li finds those claims credible. “Hashing is a well-established route to speed and compactness, and the reported results are consistent with theoretical expectations,” Li says.
While these efficiencies seem promising, more work is needed to ensure the method will work at scale, Li says. The group did not fully study realistic challenges, such as seasonal variation or clouds blocking the image, that could affect the robustness of geolocation matching. Ren says this limitation could be addressed going forward by training on images from a wider distribution of locations.
Still, experts say long-term applications (beyond superhuman GeoGuessr play) are worth considering now.
There are some trivial uses for efficient image geolocation, says Jacobs, like automatically geotagging old family photos. On the more serious side, navigation systems could also take advantage of this kind of method. If GPS fails in a self-driving car, another way to quickly and accurately determine location could be useful, Jacobs says. Li also suggests it could play a role in emergency response within the next five years.
It may also have applications in defense systems. Finder, a 2011 project by the Office of the Director of National Intelligence, aimed to help intelligence analysts learn as much as possible about photographs lacking metadata, using context from sources including overhead imagery, a goal that could be accomplished with a model similar to this new geolocation method.
Jacobs puts the defense application into context: if a government agency were sent a photo of a terrorist training camp stripped of metadata, how could the site be geolocated quickly and efficiently? Deep cross-view hashing may be of some help.

