Will Big Data change the future of pig genetics?

Big Data is the next big thing in the world of animal husbandry. The concepts have a lot of potential, although it is not always clear how exactly pig producers can benefit in the long run. This article sums up the opportunities in the field of breeding.

A conference for animal geneticists was recently organised in the Netherlands under the title ‘Saying Goodbye to the Genomics era and welcoming the new era of Big Data’. The conference focused on new technologies, from precision farming to Big Data analytics.

These are two different, fast-evolving areas that are going to revolutionise the way we improve the genetics of farm animal species. Is it just an interesting title to attract attention, or are things really going to change because of these new techniques and approaches? Is Big Data going to change the way genetic improvement programmes work and, consequently, the role of genetics?

Big Data analytics and machine learning

Big Data analytics is the discipline of exploring and analysing large data sets of various types and sources, structured and unstructured, to reveal patterns and correlations that help to translate these data into information and predictions. The patterns it unravels can be used as information for future management and business decisions.

Machine learning is a type of artificial intelligence in which algorithms are developed by exposing them to data. Data are used to train and test the algorithms. Machine learning can be used to develop smart algorithms for forecasting, detecting anomalies, etcetera. Well-known applications are fraud detection and e-mail spam filters. During the last decade, however, more and more applications in agriculture have been developed.
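As a minimal sketch of what 'detecting anomalies' can look like on a farm, the snippet below learns the normal range of a measurement from historical data and flags new values that fall far outside it. The feed intake figures, the function name and the threshold of three standard deviations are all assumptions chosen for illustration, not something taken from a real system.

```python
from statistics import mean, stdev


def flag_anomalies(train, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard
    deviations away from the mean of the training data."""
    mu = mean(train)
    sigma = stdev(train)
    return [v for v in new_values if abs(v - mu) > threshold * sigma]


# Hypothetical daily feed intake (kg) of one finisher pig.
history = [2.4, 2.5, 2.6, 2.5, 2.4, 2.6, 2.5, 2.5]
today = [2.5, 1.1, 2.6]

print(flag_anomalies(history, today))  # [1.1]
```

Real applications would use far richer models, but the principle is the same: the algorithm is trained on past data and then applied to new measurements.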

Precision farming and PLF

Precision farming is about to dramatically increase the amount of information. This can be done via smart sensors that measure climate and environmental conditions. In crop farming, it is about combining the Global Positioning System (GPS) with Geographical Information Systems (GIS) and adding information that can be used for precise disease treatments, fertilisation or predictions of harvest quantities and quality.

If applied widely, information can be shared within the region and it can create knowledge to increase the efficiency and sustainability of production. In animal production too, collecting daily measurements more systematically provides new ways to optimise production chains. Although many devices have the potential to collect measurements every minute or second, most of this information is not yet used to its full potential. Precision farming is about measuring, storing and utilising this information from a variety of sources.

In livestock production, precision farming (precision livestock farming, PLF) can develop, for example, into precision feeding. Additional information can be gathered by continuously monitoring animals’ positions and activity (GPS) or by applying biosensors that measure intrinsic body parameters. Sensors can also be used to monitor the conditions in the barn in detail, like temperature, humidity, etcetera. Together with more detailed knowledge about the animals (genotype) and the feed, this can be used to develop more accurate feeding strategies. Precision feeding can be aimed at groups as well as at individual animals. Other options, like precision treatments (for example veterinary) and precision management (for example slaughter management), are logical additions once detailed information has been analysed and is available to optimise farm efficiency. Combining all sources of information will lead to new insights. Once this process has started, it can be expanded to explore ways that lead to more added value or increased quality.

Hackathons

Big Data analytics has a multidisciplinary nature. Once it generates more data and insights, it brings together various disciplines. Real new insights and added value are expected from collaborating disciplines that explore multidisciplinary issues. Examples in livestock include specific behaviour in feeding that is linked to certain genotypes and disease treatments that seem to interact with feeding regimes. Disciplines have developed their own ways of tackling and optimising things, but in most cases, they know little or nothing about what is happening in other disciplines.

Many potential insights typically remain unexploited because it is hard to combine expertise from different disciplines. This is where techniques like Big Data analytics help: they quickly reveal patterns that multidisciplinary teams can turn into knowledge and insights at the crossroads of their fields. Nowadays ‘hackathons’, in which teams compete to solve a certain problem, are a new phenomenon driven by Big Data developments. A hackathon brings together data analysts and experts from different disciplines and provides them with different data sources to explore solutions to multidisciplinary problems.

Data storage capacity and Big Data analytics

Measuring is not always the most difficult part. What matters is that the measurements are stored properly and processed so that they are available for future analysis. Here Big Data science can bring solutions for storage and analysis. Nowadays structured and unstructured data can be easily stored in the cloud.

Cloud infrastructure offers storage and computing resources that can be easily scaled. It facilitates bringing various streams of information together and running analytics. The threshold to get started is low, but of course, costs have to be monitored. For precision farming it helps to store and analyse information from a variety of sources in a variety of formats and measuring frequencies.

Forecasting models and machine learning

An important pillar of Big Data analytics is forecasting: detecting patterns in historic data that can be used to predict future values. ‘Random Forest’ and ‘Support Vector Machines’ are examples of widely applied forecasting methodologies in Big Data.

Genetic programmes are also focused on making predictions, in this case of genetic values. They use different techniques, like Best Linear Unbiased Prediction (BLUP). Tools specific to animal genetics deal with the relationships between animals. Because of some similarities in objectives and models, ongoing cross-pollination is to be expected in the near future.
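For readers who have not met BLUP before, the standard textbook form of the model can be sketched as follows. The notation is not taken from this article and follows common usage: y holds the observed records, b the fixed effects (for example herd or sex), u the breeding values, X and Z are the design matrices linking records to effects, and A is the relationship matrix that captures how the animals are related.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma_u^2),
\quad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2) \\[4pt]
\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\
\mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \lambda\mathbf{A}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{u}} \end{bmatrix}
&=
\begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix},
\qquad \lambda = \frac{\sigma_e^2}{\sigma_u^2}
\end{aligned}
```

The second line shows the mixed model equations that are solved to obtain the predicted breeding values. The term with the inverse of A is what sets this apart from an ordinary regression: it lets information from relatives contribute to each animal's prediction.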

An example of Big Data in practice – applying ear tags with finisher pigs yields a fountain of new data. Photo: Bert Jansen

Big Data analytics vs genomic BLUP methods

Genomic selection has been widely applied over the last decade to predict genetic values in animal breeding. It is based on genotyping reference animals, whose breeding values are known very accurately, for a large set of genetic markers called Single Nucleotide Polymorphisms (SNPs), which are evenly spread across the chromosomes. In most species, SNP panels of 60,000 to 600,000 SNPs are commonly applied.

Finally, the link between phenotype and genome is estimated from these reference animals and used to predict, with high accuracy, the genomic breeding values of young animals that do not have a lot of information of their own yet.
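The two steps described here, estimating marker effects in reference animals and then applying them to young animals, can be sketched in a few lines. The sketch uses ridge regression on SNP genotypes (often called SNP-BLUP), which is closely related to genomic BLUP but is a simplified stand-in rather than the method of any particular breeding programme. The genotypes, phenotypes, function names and the `ridge` value are all invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_marker_effects(genotypes, phenotypes, ridge=1.0):
    """Estimate SNP effects by ridge regression on column-centred
    genotypes (allele counts coded 0, 1 or 2)."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    z = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    # Normal equations: (Z'Z + ridge * I) a = Z'(y - mean(y))
    lhs = [[sum(z[i][j] * z[i][k] for i in range(n)) + (ridge if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(z[i][j] * (phenotypes[i] - y_mean) for i in range(n))
           for j in range(p)]
    return solve(lhs, rhs), col_means, y_mean


def predict(genotype, effects, col_means, y_mean):
    """Predict the phenotype-scale value of an animal from its genotype."""
    return y_mean + sum((g - m) * a
                        for g, m, a in zip(genotype, col_means, effects))


# Hypothetical reference animals: three SNPs each, with a recorded phenotype.
reference_genotypes = [[0, 1, 2], [1, 0, 1], [2, 2, 0],
                       [1, 1, 0], [0, 2, 1], [2, 0, 2]]
reference_phenotypes = [100.0, 102.5, 102.0, 101.0, 98.5, 105.0]

effects, col_means, y_mean = fit_marker_effects(
    reference_genotypes, reference_phenotypes, ridge=1.0)

# A young animal with a genotype but no records of its own yet.
print(predict([2, 1, 1], effects, col_means, y_mean))
```

In practice there are tens of thousands of markers and far fewer animals, which is exactly why the `ridge` term is needed: it shrinks the marker effects so that the system remains solvable.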

In various simulation studies, conventional methods like genomic BLUP (GBLUP) have been compared with techniques like Random Forest and Support Vector Machines for predicting genetic values. Interestingly, the two approaches get very close in accuracy. The traditional GBLUP methods are in some cases much faster, but it seems that techniques developed in totally different environments can also be used to predict genetic values based on relations with genetic markers.

It is expected that in the coming years hybrid models will be developed using the best from both sides. This is expected to boost the use of various data sources and add to the accuracy of genetic breeding values. This will logically lead to higher genetic progress in breeding programmes and more detailed matching between breeding goals and real market requirements.

From predictive to prescriptive

The more detailed knowledge becomes available during the animal’s productive life, the easier it gets to use this information directly in predictive analysis. In fact, one can predict with high accuracy what happens tomorrow. When certain patterns of parameters are detected, they can be linked with a high likelihood to existing problems, diseases or peak productions. Once these things are known in advance, it is a logical step to anticipate them. When subclinical diseases can be detected early, treatments can ensure that the disease never breaks out. Most interesting of all would be to know the complete genetic background of an individual by sequencing its genome, and to use this information to detect patterns that in other cases were linked to specific situations.

Precision treatment and precision feeding could, in that case, be used to get the best out of an individual. It would prevent the animal from getting out of balance and developing deficiencies or diseases. The treatment and feeding would also be completely tailored to the animal’s genetic and environmental background.

Also, regularly genotyping a small group of animals for a limited number of genetic differences (SNPs) could reveal that the new generation of animals has a different genetic makeup that corresponds to a higher appetite or increased resistance to certain types of diarrhoea. These patterns of SNPs can be linked to improved performance, but could also be used to adapt treatments for certain diseases.

Whether it concerns more detailed knowledge of genetics, climate or environmental conditions, directly incorporating this knowledge into a forecast model to get alerts on performance drops or upcoming diseases will help farmers to act more adequately. Directly linking prescriptive information to those alerts and measuring its effectiveness will make room for continuous improvement. It will be the basis for more sustainable production and will help farmers to get the most out of their genetics, feed and farm situation.
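A forecast model that raises alerts on performance drops can be very simple in principle. The sketch below uses simple exponential smoothing to forecast the next value and raises an alert when the observation falls well below it. The daily gain figures, the smoothing factor `alpha` and the 10% tolerance are assumptions chosen for illustration only.

```python
def smoothed_forecast(values, alpha=0.3):
    """One-step-ahead forecast by simple exponential smoothing."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


def performance_alert(history, observed, tolerance=0.10, alpha=0.3):
    """Return True when the observed value falls more than `tolerance`
    (as a fraction) below the forecast based on `history`."""
    expected = smoothed_forecast(history, alpha)
    return observed < expected * (1 - tolerance)


# Hypothetical average daily gain (g/day) of a pen over five days.
daily_gain = [900, 910, 905, 915, 920]

print(performance_alert(daily_gain, 915))  # False: in line with the forecast
print(performance_alert(daily_gain, 780))  # True: a drop worth investigating
```

Knowledge of genetics, climate or barn conditions would enter such a model as extra inputs that adjust the expected value, and the outcome of each treatment triggered by an alert could be recorded to measure its effectiveness.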

Data storage space needed for genomics in the coming years

Genomics is Big Data. If one area really creates a lot of data and grows exponentially, it is genomics. Over the last decade the focus was on Single Nucleotide Polymorphisms. These are single base-pair differences in DNA that appear in a population. Sometimes they are associated with phenotypic differences and sometimes they are not. At a very fast pace, genomic labs moved from genotyping 60,000 to 1 billion SNPs to nowadays completely sequencing the genome. This, of course, goes fastest in human genomics.

By 2025, it is expected that between 200 million and 2 billion people could have their genome sequenced, according to E. Hayden in a publication in Nature in 2015. This would generate a need for data storage of between 2 and 40 exabytes (1 exabyte = 10^18 bytes) and would put human genomics above YouTube and Twitter for annual growth in storage capacity. Of course, developments in animal genomics are expected to follow those in human genomics, but at a slower pace.
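To get a feeling for these numbers, they can be turned into a rough storage figure per genome. This is a back-of-envelope calculation only: it assumes that the low storage estimate belongs with the low number of genomes and the high estimate with the high number, a pairing the article does not state.

```python
EXABYTE = 10**18   # bytes
GIGABYTE = 10**9   # bytes


def gigabytes_per_genome(total_exabytes, genomes):
    """Average storage per sequenced genome, in gigabytes."""
    return total_exabytes * EXABYTE / genomes / GIGABYTE


# Assumed pairing of the figures quoted above (low with low, high with high).
low = gigabytes_per_genome(2, 200_000_000)
high = gigabytes_per_genome(40, 2_000_000_000)

print(low, high)  # 10.0 20.0
```

Under that assumption, the estimates work out at roughly 10 to 20 gigabytes of stored data per sequenced genome.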

What does the future hold?

Both in genomics and Big Data analytics, we will see tremendous developments in the coming decade. The cost of genotyping will go down, so the SNP-panels will grow drastically. Even full genome sequencing seems to be within reach for low prices, so that will make very detailed genetic information available.

At the same time, the cost of analysis is going down drastically and fast computers are becoming available via the cloud to everybody who needs them. In the meantime, cloud technologies will make data storage capacity available quickly and tailored to all types of data streams. More data, faster analysis, improved forecasting techniques… Altogether this will boost accuracy and will bring genetic programmes closer to the market.

One can imagine that enormous developments will be possible in efficiency and quality. In the end, it will lead to more transparency in genetics and other disciplines. This opens the door for further detailed customer-oriented adjustments like precision feeding, precision veterinary treatments, and tailor-made genetics. More data and more detailed analytics will provide us with information for more sustainability and efficiency in future production.

Benny van Haandel Director/owner at E-barn Solutions; pig breeding expert