Ancestry Data Mapping
Ancestry analysis has recently become one of the most popular fields of study. Knowing our genetic origins is a way of discovering a personal history. Ancestry mapping can also have a deeper positive impact on society: helping us find estranged and distant relatives, identifying a family history of genetic disease before it takes effect, tracing our historical roots, and various other uses.
Global / United States
Client
Genealogy Mapping and Research
CLIENT
Our client is one of the world's foremost genealogical research and database corporations. They hold more than 10 billion documents pertaining to genetic, spatial, and historical data gathered from various sources. We were tasked with creating an automated system to analyze and extract facial and contextual data from school and college yearbooks from across the United States and two major European nations.
These yearbooks contain more than 1.2 billion records spanning four decades of raw data. The data collected from them can be vital for genealogical analysis and genome sequencing.
CHALLENGE
A dataset this large poses several challenges that need to be resolved:
- Data Quality: The data has to be extracted meticulously from sources that vary in quality, and the image analysis must be reliable enough that four decades of data are consistent when loaded into the model.
- Data Simplification: Data this vast and varied can feed a great deal of redundant and bad records into the model.
- Name Cataloguing: Most countries draw given names from variations on a common set of names, which can cause problems in mapping and recognition.
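The redundancy and name-variation challenges above can be illustrated with a minimal sketch. It assumes each record carries a given name, surname, school, and year; the `ALIASES` table, the field names, and the choice of key are all hypothetical and are not taken from the client's system.

```python
import unicodedata

# Hypothetical alias table: maps common variants to one canonical given name.
ALIASES = {
    "bill": "william",
    "will": "william",
    "liz": "elizabeth",
    "beth": "elizabeth",
}

def normalize_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalpha() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())

def canonical_key(record: dict) -> tuple:
    """Build a comparison key that ignores spelling noise and known aliases."""
    given = normalize_name(record["given"])
    given = ALIASES.get(given, given)
    return (given, normalize_name(record["surname"]),
            normalize_name(record["school"]), record["year"])

def deduplicate(records: list) -> list:
    """Keep the first record seen for each canonical key."""
    seen = {}
    for record in records:
        seen.setdefault(canonical_key(record), record)
    return list(seen.values())

records = [
    {"given": "Bill", "surname": "O'Neil", "school": "Lincoln High", "year": 1975},
    {"given": "William", "surname": "O Neil", "school": "Lincoln High", "year": 1975},
    {"given": "José", "surname": "García", "school": "Lincoln High", "year": 1980},
]
unique = deduplicate(records)
```

An exact key like this only collapses obvious duplicates; two different people who share a name, school, and year would still need the facial data to tell them apart.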
SOLUTION
Solution summary:
- Facial and contextual data are extracted and constructed from the scanned yearbooks provided to us.
- This data is then matched with records of names and aliases to create an associative model that can match people with similar facial structures and names.
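As a rough illustration of how facial and name evidence might be combined into one association score, the sketch below blends a cosine similarity over face embeddings with a string similarity over surnames. The embedding vectors, the `face_weight` value, and the record layout are assumptions made for the example; the actual model used in the project is not described here.

```python
import math
from difflib import SequenceMatcher

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a: dict, rec_b: dict, face_weight: float = 0.6) -> float:
    """Weighted blend of face similarity and surname similarity."""
    face = cosine_similarity(rec_a["embedding"], rec_b["embedding"])
    name = name_similarity(rec_a["surname"], rec_b["surname"])
    return face_weight * face + (1 - face_weight) * name

# Toy records with made-up three-dimensional embeddings.
parent = {"surname": "Andersen", "embedding": [0.9, 0.1, 0.3]}
child = {"surname": "Anderson", "embedding": [0.8, 0.2, 0.3]}
stranger = {"surname": "Kowalski", "embedding": [0.1, 0.9, 0.0]}

likely = match_score(parent, child)
unlikely = match_score(parent, stranger)
```

A linear blend is only one option; the weight would have to be tuned against confirmed matches before any score could be trusted.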
Several approaches were drafted to address these issues; the most efficient one is outlined below:
- Facial data is gathered and analyzed from scanned yearbook images and files; both individual and group images are analyzed.
- To support the extracted image data, the names associated with the records are also collected, grouped, and matched with each individual.
- The client then provides its own business rules, which are combined with our natural language processing and facial recognition data to find genealogical and generational matches.
- These matches are then stored securely and retrieved as the client needs them.
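The rule-driven step above can be pictured as a set of filters applied to candidate matches. The sketch below is illustrative only: the `Candidate` fields, the rule names, and the thresholds are hypothetical stand-ins, since the client's actual rules are not given.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Candidate:
    """A possible match between two yearbook records (hypothetical fields)."""
    surname_a: str
    surname_b: str
    year_a: int
    year_b: int
    score: float  # combined facial and name similarity, 0.0 to 1.0

Rule = Callable[[Candidate], bool]

def min_score(threshold: float) -> Rule:
    """Accept only candidates whose score reaches the threshold."""
    return lambda c: c.score >= threshold

def year_gap_between(low: int, high: int) -> Rule:
    """Accept only candidates whose yearbook years differ by low..high years."""
    return lambda c: low <= abs(c.year_a - c.year_b) <= high

def same_surname(c: Candidate) -> bool:
    """Accept only candidates with the same surname, ignoring case."""
    return c.surname_a.strip().lower() == c.surname_b.strip().lower()

def apply_rules(candidates: List[Candidate], rules: List[Rule]) -> List[Candidate]:
    """Keep the candidates that satisfy every rule."""
    return [c for c in candidates if all(rule(c) for rule in rules)]

# Example rule set for a generational (parent and child) match.
generational_rules = [min_score(0.8), year_gap_between(18, 45), same_surname]

candidates = [
    Candidate("Miller", "Miller", 1962, 1988, 0.91),  # passes every rule
    Candidate("Miller", "Miller", 1986, 1988, 0.93),  # years too close
    Candidate("Miller", "Muller", 1962, 1988, 0.85),  # surname differs
    Candidate("Miller", "Miller", 1962, 1988, 0.55),  # score too low
]
matches = apply_rules(candidates, generational_rules)
```

Keeping each rule as a small function means a new client rule can be added to the list without changing the matching code.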
RESULT
A model like this can simplify manual facial analysis and genealogical mapping. No expert is needed: with the click of a button and a set of predetermined rules from the client, we can immediately generate an entire database that can be matched and mapped against the company's existing records.
The older hard-copy and raw data can then be discarded, reducing storage requirements. Digital data is versatile and robust, and it supports the ease of access and retrieval that is important for a corporation like this.