Procons Academy

Why Should Engineers and Construction Professionals Learn Data Science?

Background of Data Science

Data science can be seen as the practice of extracting the important features and main characteristics from a large pool of data in order to create something useful. A data set consists of many “data points”, the individual observations recorded in it. By analyzing data points taken at constant intervals or at different moments (depending on the study conducted), behavioral patterns can be detected and extracted. These patterns allow future trends and behaviors to be forecast. An advantage of having a large amount of data is that it often becomes possible to find “correlations” between different variables, not just “trends” within a single one. A wide range of data is therefore necessary to carry out an effective data science study with complex and varied approaches.

The key areas that a data scientist should be familiar with are as follows:
  1. Coding: coding is the core tool required to conduct intensive data science research, as computer models are capable of handling vast amounts of data.
  2. Statistics: statistics is the actual science behind data science, as data consists largely of numbers, and handling numbers requires sound knowledge of both mathematics and statistics.
  3. Business Knowledge: business knowledge can be considered a soft factor that affects the field of data science. Having sound knowledge of both coding and statistics but being unfamiliar with business concepts is a serious handicap for any data scientist, because business understanding is what makes a data analysis meaningful.

The data from a data science project is used in three major ways:

1.  Data analysis: this is the most conventional way of using data. It analyzes past events to understand the present condition. (Ex: a chart showing the sales of a certain company over 16 months.)

2.  Predictive analytics: here the past data collected is used to predict the future. (Ex: a red line on the same chart estimating how sales will perform over a span of 20 months.) This approach has many variations, so the most suitable option should be chosen for each scenario.

3.  Building a data-based product: finally, after considering all of these components, a product is introduced as the final outcome of the data science research.
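The step from data analysis to predictive analytics can be sketched with a least-squares trend line. This is a minimal illustration, not a recommended forecasting method: the 16 monthly sales figures are invented for the example, and `fit_trend` and `forecast` are hypothetical helper names.

```python
# Illustrative only: 16 months of made-up sales figures (not real data).
sales = [100, 104, 109, 111, 118, 121, 125, 131,
         134, 140, 143, 149, 152, 158, 161, 167]


def fit_trend(values):
    """Least-squares line through (month, value); months are 1, 2, 3, ..."""
    n = len(values)
    months = range(1, n + 1)
    mean_x = sum(months) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, horizon):
    """Extend the fitted line `horizon` months past the last observation."""
    slope, intercept = fit_trend(values)
    n = len(values)
    return [intercept + slope * m for m in range(n + 1, n + horizon + 1)]


slope, intercept = fit_trend(sales)
future = forecast(sales, 4)  # months 17 to 20
print(f"trend: {slope:.2f} units per month")
print("forecast:", [round(v, 1) for v in future])
```

A straight line is only one of the many variations mentioned above; seasonal or non-linear data would need a different model.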

Machine learning and deep learning are both widely used in the field of data science. Machine learning is the general name for all the methods by which a computer fine-tunes a statistical model to find the best fit for a given data set. Deep learning is one specific family of machine learning methods. It has become popular in recent times and is widely used for image and voice recognition projects. In this method, input values are passed through multiple layers, each of which automatically learns relationships in the data. The design is loosely inspired by the way the human brain functions.
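What "fine-tuning a model to find the best fit" means can be sketched with gradient descent on a straight line. This is only one of many fitting procedures; the data points, the learning rate and the function name `fit_line_gradient_descent` are assumptions made for the illustration.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly nudging w and b to reduce squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point with the current w and b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

With these points the loop should settle close to a slope of 2 and an intercept of 1. Deep learning applies the same idea to models with many layers and far more parameters.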

These skills map onto two common roles:
  1. When businesses employ a Data Analyst, they typically look for someone to work on analysis projects, optimization and reporting. This person will help the organization understand its client base and flag potential challenges and opportunities for the future.
  2. When an organization is looking for a Data Scientist, it generally needs someone who is strong in predictive analytics and has expertise in machine learning and related advanced methodologies. This knowledge can be useful for risk management, recommendation system development, resource utilization and much more.

Data Science in the field of Civil Engineering

Data science can be used widely in civil engineering practice, construction management, project management and related fields. As an example, compressive strength is a key measure of concrete quality. In general, it is determined by a standard crushing test on a concrete cylinder. Engineers cast small concrete cylinders with various raw material combinations and test them to see how a change in each raw material affects strength. The suggested waiting time before testing a cylinder is 28 days, to ensure that the results are accurate. This consumes considerable time and requires additional labour to prepare and test the different specimens. The method is also prone to human error, and one small mistake can drastically increase the waiting time.

One way to reduce the processing time and the number of combinations to try is to use digital simulations, where the user gives the computer the information they have about the mix, and the machine tries various combinations to estimate the compressive strength. This reduces the number of physical variations and the time needed for experimentation. However, to build such software the user must know the relationships between all the raw materials and how one material affects another.

An example process for developing such models is as follows:

1.  The first step is to understand the data and gain insights from it before doing any modelling. Checking the correlations between the input features gives an idea of how each variable affects the others.

2.  After preparing the data, we can fit different models to it and compare their performance to choose the algorithm that performs best.

3.  The first approach considered would be linear regression, as it is one of the simplest and most widely used models. Here the algorithm tries to form a linear relationship between the input features and the target variable.

In a scatter plot of predicted values against target values, the points lie on a straight line whenever the two are equal. In an example where linear regression is implemented, the model does not predict the compressive strength accurately.
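The correlation check from step 1 and the linear fit from step 3 can be sketched in a few lines. The mix data below is invented purely for illustration (a single input, the water-to-cement ratio, against strength in MPa), and `pearson`, `fit_line` and `r_squared` are hypothetical helper names. A real study would use many more samples and features.

```python
import math

# Hypothetical mix data, invented for illustration only:
# water-to-cement ratio and 28-day compressive strength in MPa.
water_cement = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
strength_mpa = [48.0, 43.5, 38.0, 34.5, 30.0, 27.5, 24.0]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


r = pearson(water_cement, strength_mpa)
slope, intercept = fit_line(water_cement, strength_mpa)
predicted = [slope * x + intercept for x in water_cement]
r2 = r_squared(strength_mpa, predicted)

print(f"correlation: {r:.3f}")
print(f"R^2 of the linear fit: {r2:.3f}")
# Pairs to plot: points on the diagonal are perfect predictions.
for actual, pred in zip(strength_mpa, predicted):
    print(f"actual {actual:5.1f}  predicted {pred:5.1f}")
```

The invented data here is nearly linear, so the fit looks good. Real concrete data with several interacting ingredients is usually less well behaved, which is where the tree-based models come in.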

4.  The second approach is a Decision Tree Algorithm which represents the data with a tree-like structure, where each node represents a decision taken on a feature.

For the same example, the Decision Tree Regressor improves performance significantly. This can also be seen in the plot, where more points lie closer to the line.
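To show what "a decision taken on a feature" looks like, here is a minimal regression tree written from scratch. It is a sketch of the general technique, not the implementation behind the example discussed above. The six mixes (cement and water content in kg/m³, strength in MPa) are invented, and `best_split`, `build_tree` and `predict` are hypothetical names.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets):
    """Return the (feature, threshold) that most reduces squared error, or None."""
    best, best_score = None, sse(targets)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, targets, max_depth=3):
    """Nodes are dicts; leaves are the mean target of the rows that reach them."""
    if max_depth == 0 or len(rows) < 2:
        return mean(targets)
    split = best_split(rows, targets)
    if split is None:
        return mean(targets)
    f, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [t for _, t in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [t for _, t in right], max_depth - 1),
    }


def predict(tree, row):
    """Follow the decisions from the root down to a leaf value."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical mixes: [cement kg/m3, water kg/m3] and strength in MPa.
mixes = [[300, 180], [300, 200], [350, 180], [350, 200], [400, 180], [400, 200]]
strengths = [30, 25, 38, 32, 46, 40]

tree = build_tree(mixes, strengths, max_depth=3)
print("first decision: feature", tree["feature"], "<=", tree["threshold"])
print("prediction for [350, 180]:", predict(tree, [350, 180]))
```

A tree this deep reproduces every training row exactly, which is why tree models should be judged on data they were not trained on.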

5.  The final approach implemented would be the Random Forest Regressor. It trains many randomized trees, each on a random subset of data sampled from the pool, and combines their predictions, which makes the model more robust.

According to the above illustration, this approach can be considered the better option for predicting concrete quality in civil engineering.
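The random forest idea can be sketched as follows: every tree is grown on a bootstrap sample of the rows and tests only a random subset of features at each split, and the forest averages the trees' predictions. This is a simplified illustration under those assumptions, not a production implementation. The data is the same kind of invented mix table, and all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, rng, max_depth=3, n_try=1):
    """Regression tree that tests only `n_try` randomly chosen features per split."""
    if max_depth == 0 or len(rows) < 2:
        return mean(targets)
    best, best_score = None, sse(targets)
    for f in rng.sample(range(len(rows[0])), n_try):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, thr), score
    if best is None:
        return mean(targets)
    f, thr = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [t for _, t in left],
                           rng, max_depth - 1, n_try),
        "right": build_tree([r for r, _ in right], [t for _, t in right],
                            rng, max_depth - 1, n_try),
    }


def predict_tree(tree, row):
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


def build_forest(rows, targets, n_trees=25, seed=0):
    """Grow each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """The forest's prediction is the average over all trees."""
    return mean([predict_tree(tree, row) for tree in forest])


# Hypothetical mixes: [cement kg/m3, water kg/m3] and strength in MPa.
mixes = [[300, 180], [300, 200], [350, 180], [350, 200], [400, 180], [400, 200]]
strengths = [30, 25, 38, 32, 46, 40]

forest = build_forest(mixes, strengths, n_trees=25, seed=0)
print("forest prediction for [375, 190]:", round(predict_forest(forest, [375, 190]), 1))
```

Averaging many slightly different trees smooths out the quirks of any single tree, which is the robustness the text refers to.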

Data science can also be successfully applied in mechanical, electrical and plumbing (MEP) engineering. Buildings use multiple devices for functions such as space heating, air conditioning, water heating, ventilation, delivery of electrical power and protection against fire. When equipped with sensors, all the equipment and components that make up these systems are potential sources of data. Data also has exciting research and development applications: it can be used to create building technologies with enhanced performance.

One successful application of data science is the creation of a “digital twin” of an actual building. A digital twin may be viewed as a more sophisticated variant of a BIM (building information modelling) model. Typically, a BIM model is finalized with “as-built” information when a project is completed. A digital twin, by contrast, continuously updates itself using measurements taken from the building. This provides a continuous data stream, which can be analyzed to help control the building.

The data obtained from a building can be used to predict the impact of improvements, including energy retrofits. In this way, the building owner can test several potential project options in a virtual model and analyze their effect before making an investment decision.


As the background discussed above shows, data science can benefit the field of civil engineering in numerous ways. Proper use of the data available within each project will help engineers with decision making and construction sequencing, helping to avoid cost overruns and time delays.


Compiled by Eng. Nirmala Rathnayaka of Pro Consultancy International (Pvt.) Ltd.

Photo Source: GlobeSt

