The underlying mechanism is this: by comparing images that overlap each other, an algorithm can derive a three-dimensional representation of features it finds in multiple images, and it can also determine the relative positions in space of the cameras (that is, where the pictures were taken from). The class of algorithms that does this from collections of photos is called Structure from Motion (SfM); its close real-time relative, used in robotics, is Simultaneous Localization and Mapping (SLAM).
When measured real-world GPS coordinates are added to this calculation, the 3D results can be anchored to positions in the real world.
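The core geometric step can be sketched in a few lines. The example below uses OpenCV to triangulate one 3D point from its pixel positions in two overlapping images; the camera intrinsics, poses, and pixel coordinates are made up for illustration, and a real SfM pipeline would estimate the camera poses itself rather than being handed them.

```python
# Minimal sketch: recover a 3D point from two overlapping views with
# known camera poses. Real pipelines also solve for the poses
# (bundle adjustment); here they are given for brevity.
import numpy as np
import cv2

# Intrinsics of a hypothetical drone camera: focal length 1000 px,
# principal point at the center of a 2000x1500 px image.
K = np.array([[1000.0,    0.0, 1000.0],
              [   0.0, 1000.0,  750.0],
              [   0.0,    0.0,    1.0]])

# Two camera poses: the second camera is shifted 5 m along X,
# simulating the drone moving between shots.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-5.0], [0.0], [0.0]])])

# Pixel coordinates of the same ground feature in both images
# (2xN arrays, one column per matched feature).
pts1 = np.array([[1200.0], [800.0]])
pts2 = np.array([[1100.0], [800.0]])

# Triangulate: returns homogeneous 4xN coordinates.
X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
X = (X_h[:3] / X_h[3]).ravel()
print("3D point relative to camera 1:", X)  # about [10, 2.5, 50] meters
```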
Using a drone is a cheap way of getting all the required images (compared to an airplane or satellite). The drone executes a mission that defines a flight path along which it takes pictures, storing GPS coordinates for each picture.
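To get a feel for how such a mission is planned, here is a rough sketch of the arithmetic behind it (this is not any vendor's mission-planning API, and the camera numbers are illustrative): the image footprint on the ground follows from altitude and field of view, and the desired overlap dictates how often the camera triggers and how far apart the flight lines are.

```python
# Back-of-the-envelope mission geometry for a grid ("lawnmower") flight.
import math

altitude_m = 100.0      # flight height above ground
fov_wide_deg = 84.0     # horizontal field of view (camera-specific)
aspect = 3.0 / 4.0      # image height / width
front_overlap = 0.80    # 80% overlap along the flight line
side_overlap = 0.70     # 70% overlap between adjacent lines

# Ground footprint of one image at the given altitude.
footprint_w = 2 * altitude_m * math.tan(math.radians(fov_wide_deg / 2))
footprint_h = footprint_w * aspect

# Distance between shots and between flight lines, assuming the
# image's short side points along the flight direction.
shot_spacing = footprint_h * (1 - front_overlap)
line_spacing = footprint_w * (1 - side_overlap)

print(f"footprint: {footprint_w:.0f} m x {footprint_h:.0f} m")
print(f"trigger every {shot_spacing:.0f} m, lines {line_spacing:.0f} m apart")
```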

The pictures and their GPS coordinates are fed into processing software. The software calculates an initial set of tie points (“sparse cloud”) while aligning the picture positions. Then other work products, as chosen by the user, can be calculated: a dense point cloud (more detailed), a mesh (3D model), an orthomosaic (flat view from above), or a digital elevation model.
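As one concrete example of this pipeline, here is a minimal sketch using the Python API of Agisoft Metashape (method names shown roughly as in the 2.x API; they differ slightly between versions, and other tools expose equivalent steps):

```python
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()

# GPS coordinates are read from the images' EXIF data on import.
chunk.addPhotos(["IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG"])

# Tie points ("sparse cloud") and camera alignment.
chunk.matchPhotos()
chunk.alignCameras()

# Optional work products, as chosen by the user.
chunk.buildDepthMaps()
chunk.buildPointCloud()    # dense point cloud (buildDenseCloud in 1.x)
chunk.buildModel()         # mesh (3D model)
chunk.buildDem()           # digital elevation model
chunk.buildOrthomosaic()   # flat view from above

doc.save("project.psx")
```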
Some tools provide editing options to clean up a point cloud manually and to add geometry. This can help if the data will be shown to an audience that might be distracted by “debris” in the scene.
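Automated filters can do part of this cleanup before any manual editing. Here is a sketch using the open-source Open3D library to drop isolated “stray” points that sit far from their neighbors (the file name is illustrative):

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("site_dense_cloud.ply")

# Drop points whose average distance to their 20 nearest neighbors is
# more than 2 standard deviations above the mean.
clean, kept_indices = pcd.remove_statistical_outlier(nb_neighbors=20,
                                                     std_ratio=2.0)
o3d.io.write_point_cloud("site_dense_cloud_clean.ply", clean)
```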
Some tools also provide measurement of distances, areas, and volumes. This is powerful functionality for many use cases. However, measurements are only meaningful if the scale of the dataset is accurate – which, by default, it usually is not.
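To illustrate what such a measurement computes, here is a sketch of a stockpile volume derived from a digital elevation model: each raster cell contributes its height above a base plane times its ground area. All numbers are made up, and the result is only as good as the dataset's scale.

```python
import numpy as np

cell_size_m = 0.05          # ground size of one DEM cell (5 cm)
base_elevation_m = 102.0    # elevation of the reference plane

# A toy 4x4 DEM patch over a stockpile (meters above sea level).
dem = np.array([[102.0, 102.4, 102.5, 102.1],
                [102.3, 103.1, 103.2, 102.4],
                [102.2, 103.0, 103.3, 102.3],
                [102.0, 102.2, 102.4, 102.1]])

# Height of each cell above the base plane; cells below it count as 0.
heights = np.clip(dem - base_elevation_m, 0.0, None)
volume_m3 = heights.sum() * cell_size_m ** 2
print(f"stockpile volume: {volume_m3:.3f} m^3")
```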
Most tools provide ways to add annotations, so that business-relevant information can be captured in the scene.
Some cloud-based processing tools simply accept the images from the drone and do their thing. Some locally installed tools offer the user detailed control over processing parameters, which can help in getting the best possible results.
Most drones use GPS systems that deliver only consumer-grade accuracy: within a few meters. Some drones use more sophisticated RTK (Real-Time Kinematic) GPS, which achieves centimeter-level accuracy for recorded locations by relying on additional correction data from terrestrial base stations.
Ground Control Points as Real-World References
One can take additional steps to achieve more accurate positioning. This is useful if the dataset is meant to be integrated with existing maps, or if it will be used to measure distances, areas, or volumes.
Ground control points provide accurate positioning. They are physical markers (sheets of vinyl, or marks painted on the ground) that are put in place before a mapping flight. Their GPS locations are determined on site by the drone operator or by a surveyor with precision equipment. (Or, sacrificing accuracy, by reading coordinates off Google Maps.)

When the drone takes its pictures, the ground control points are visible in them. The coordinates of the ground control points are imported into the processing software along with the images. The operator then marks the ground control points in a number of images so the software can tie their known coordinates to the matching pixels. Ground control points add labor and expense to mapping, but if real-world integration and real-world scale are needed, they do the job.
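File formats for this import differ between tools, but a typical ground control point file is a simple table of labels, coordinates, and elevations, along the lines of this made-up example:

```
label,latitude,longitude,elevation_m
GCP1,48.137433,11.575921,519.43
GCP2,48.137902,11.576488,518.87
GCP3,48.138365,11.575144,520.12
```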
Relative and Absolute Accuracy
The accuracy of drone mapping data is limited by GPS accuracy, which can vary by several meters.
The calculations that produce drone mapping products tend to average out the large GPS inaccuracies, so the reconstructed geometry is internally consistent even when every photo's position tag is meters off. This accuracy of positioning within the dataset is called relative accuracy. It is needed to make accurate size and volume measurements in the point cloud.
The other type of accuracy is absolute accuracy. It describes how well the data aligns with the real world around it. Absolute accuracy is called for when it matters exactly where something is in relation to the surroundings outside the dataset, or when the actual GPS position of a feature is important.
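Absolute accuracy is typically quantified by comparing surveyed checkpoints against the same points' positions in the finished dataset and reporting the root-mean-square error (RMSE). A minimal sketch with made-up local east/north/up coordinates in meters:

```python
import numpy as np

# Checkpoint coordinates measured by a surveyor.
surveyed = np.array([[1000.00, 2000.00, 100.00],
                     [1050.00, 2010.00, 101.20],
                     [1020.00, 2080.00,  99.50]])

# The same points, as located in the finished drone mapping dataset.
measured_in_model = np.array([[1000.03, 1999.96, 100.08],
                              [1050.05, 2010.02, 101.15],
                              [1019.98, 2080.04,  99.59]])

errors = measured_in_model - surveyed
rmse_per_axis = np.sqrt((errors ** 2).mean(axis=0))
rmse_3d = np.sqrt((errors ** 2).sum(axis=1).mean())
print("RMSE east/north/up [m]:", rmse_per_axis.round(3))
print(f"3D RMSE: {rmse_3d:.3f} m")
```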
