Background of Data Science
Data science can be seen as the practice of extracting the important features and main characteristics from a large pool of data in order to create something useful. A data set consists of many “data points,” the individual observations that make it up. By analyzing data points recorded at constant intervals or at different moments (depending on the study conducted), behavioral patterns can be detected and extracted. These patterns then allow future trends and behaviors to be forecast. An advantage of having a large amount of data is that it is often possible to identify “correlations” between different variables, rather than only the “trends” in each one. A wide range of data is therefore necessary to carry out an effective data science study with complex and varied approaches.
A data scientist should be familiar with the three major ways in which the data from a data science project is used:
1. Data analysis: this is the most conventional way of using data. Past events are analyzed in order to understand the present condition. (For example, consider a chart that shows the sales of a certain company over 16 months.)
2. Predictive analytics: here the past data that has been collected is used to predict the future. (For example, the red line in such a chart represents the estimate of how sales will perform over a span of 20 months.) There are many variations of this approach, so the most suitable one should be chosen for each scenario.
3. Building a data-based product: finally, once the analysis and predictions have been considered, a product is introduced as the final outcome of the data science research.
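The difference between the first two uses can be sketched in a few lines of Python. This is only an illustration: the sales figures are invented, the helper name fit_line is hypothetical, and a straight-line trend is assumed purely for simplicity (the article does not say which forecasting method was used).

```python
# Hypothetical monthly sales for 16 months (invented numbers, not from the article).
sales = [112, 118, 121, 130, 128, 137, 141, 150,
         149, 158, 163, 170, 168, 177, 183, 190]
months = list(range(1, len(sales) + 1))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data analysis: summarize what has already happened.
average_sales = sum(sales) / len(sales)

# Predictive analytics: extend the fitted trend to months 17 through 20.
slope, intercept = fit_line(months, sales)
forecast = {m: slope * m + intercept for m in range(17, 21)}

print(f"Average monthly sales so far: {average_sales:.1f}")
for month, value in forecast.items():
    print(f"Month {month}: forecast {value:.1f}")
```

In practice the forecasting model would be chosen to suit the data, for example to account for seasonality, rather than assumed to be linear.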
Machine learning and deep learning are both widely used in the field of data science. Machine learning is the general name for all the methods by which a computer fine-tunes a statistical model to find the best fit for a given data set. Deep learning is one specific machine learning method. It has become popular in recent times and is widely used in image and voice recognition projects. In this method, the input values are passed through multiple layers, each of which automatically learns correlations in the data. The process loosely resembles the way in which the human brain functions.
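The idea of passing input values through multiple layers can be illustrated with a very small sketch. The weights below are hypothetical and fixed by hand; in real deep learning they would be learned from data, and a library would normally be used instead of plain Python.

```python
def relu(values):
    """Activation function: negative values are replaced by zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical weights for a network with 3 inputs, 2 hidden units and 1 output.
W1 = [[0.5, -0.2, 0.1],
      [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.2]


def forward(inputs):
    """Pass the inputs through the hidden layer and then the output layer."""
    hidden = relu(dense(inputs, W1, b1))
    return dense(hidden, W2, b2)


print(forward([1.0, 2.0, 3.0]))
```

Training a network means adjusting values such as W1 and W2 until the outputs fit the data, which is the "fine-tuning" step described above.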
Data Science in the field of Civil Engineering
Data science can be used widely in civil engineering practice, construction management, project management and related areas. As an example, the compressive strength of concrete is a key measure of its quality. In general, it is determined by a standard crushing test on a concrete cylinder. Engineers cast small concrete cylinders with various combinations of raw materials and test them to see how a change in each raw material affects the strength. The suggested waiting time before testing a cylinder is 28 days, to ensure that the results are accurate. This consumes considerable time and requires additional labour to prepare and test the different samples. The method is also prone to human error, and one small mistake can drastically increase the waiting time.
One way to reduce the processing time and the number of combinations to try is to use digital simulations, in which the user gives the computer the information available and the machine tries various combinations to estimate the compressive strength. This reduces the number of physical variations and the time needed for experimentation. However, to build such software the user has to know the relationships between all the raw materials and how one material affects another.
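A minimal sketch of such a simulation is given below. It assumes that the relationship between the materials is already known and, purely for illustration, uses an Abrams-type relation in which strength depends only on the water/cement ratio. The constants A_MPA and B, the candidate quantities and the function names are all placeholders, not values from the article or calibrated to any real mix.

```python
from itertools import product

# Placeholder constants for an Abrams-type relation: strength = A / B ** (water / cement).
A_MPA = 96.5
B = 4.0


def estimated_strength(water_kg, cement_kg):
    """Estimate compressive strength (MPa) from the water/cement ratio."""
    return A_MPA / B ** (water_kg / cement_kg)


def search_mixes(cement_options, water_options, target_mpa):
    """Try every cement/water combination and keep those that reach the target strength."""
    candidates = []
    for cement, water in product(cement_options, water_options):
        strength = estimated_strength(water, cement)
        if strength >= target_mpa:
            candidates.append((cement, water, round(strength, 1)))
    # Sorting puts the mixes with the lowest cement content first.
    return sorted(candidates)


mixes = search_mixes([300, 350, 400], [150, 175, 200], target_mpa=40)
for cement, water, strength in mixes:
    print(f"cement {cement} kg, water {water} kg -> about {strength} MPa")
```

The sketch makes the limitation visible: the search is only as good as the formula inside estimated_strength, which is why a model learned from data becomes attractive when the relationships are not known in advance.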
An example process for developing such models is as follows:
1. The first step is to understand the data and gain insights from it before doing any modelling. Checking the correlations between the input features gives an idea of how each variable affects the others.
2. After the data has been prepared, different models can be fitted to it and their performance compared in order to choose the best-performing algorithm.
3. The first approach considered is simple linear regression, as it is one of the most widely used models. Here the algorithm tries to find a linear relationship between the input features and the target variable.
If the predicted values and the target values are equal, the points on a scatter plot of predicted against actual values lie on a straight line. In an example where linear regression was implemented, the model did not predict the compressive strength correctly.
4. The second approach is a decision tree algorithm, which represents the data with a tree-like structure in which each node represents a decision taken on a feature.
In the same example, the Decision Tree Regressor improved the performance significantly. This can also be observed in the plot, where more points lie close to the line.
5. The final approach implemented is the Random Forest Regressor. It trains many randomized trees, each on a random subset of the data sampled from the full pool, which makes the model more robust.
Based on this comparison, the Random Forest Regressor can be considered the better option for analyzing concrete strength data in civil engineering.
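The steps above can be sketched with scikit-learn as follows. This is not the original study: the data set is synthetic and generated inside the script, the three input features (cement, water, age) and the formula used to create the target are assumptions made only so that the example runs, and the scores it prints say nothing about real concrete.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 500 mixes with three input features.
rng = np.random.default_rng(0)
n = 500
cement = rng.uniform(150, 500, n)   # kg per cubic metre
water = rng.uniform(120, 240, n)    # kg per cubic metre
age = rng.uniform(1, 90, n)         # days
strength = (96.5 / 4.0 ** (water / cement)
            * np.log1p(age) / np.log1p(28)
            + rng.normal(0, 2, n))  # invented relationship plus noise

X = np.column_stack([cement, water, age])
y = strength

# Step 1: inspect correlations between the inputs and the target.
corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
print("Correlation of each input with strength:", np.round(corr[:3, 3], 2))

# Step 2: hold out part of the data so the models are compared on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Steps 3 to 5: fit the three models and compare their R^2 scores on the test set.
models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

An R^2 score close to 1 corresponds to points lying close to the straight line in a plot of predicted against actual strength.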
Data science can also be applied successfully in MEP (mechanical, electrical and plumbing) engineering. Buildings use multiple systems for functions such as space heating, air conditioning, water heating, electrical power delivery, ventilation and fire protection. When equipped with sensors, all the equipment and components that make up these systems are potential sources of data. Data also has exciting research and development applications: it can be used to create building technologies with enhanced performance.
One successful application of data science is the creation of a “digital twin” of an actual building. A digital twin may be viewed as a more sophisticated variant of a BIM model. Typically, when a project is completed, the BIM model is finalized with “as-built” information. A digital twin, in contrast, continuously updates itself with measurements taken from the building. This provides a continuous data stream, which can be analyzed to help manage the building.
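The difference between a static as-built record and a model that keeps updating itself can be shown with a toy sketch. The class name, the sensor name and the values are hypothetical; a real digital twin would be linked to the BIM model and to the building's actual sensor network.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DigitalTwin:
    """Toy stand-in for a building model that is refreshed by sensor data."""
    as_built: dict                                 # static information fixed at handover
    readings: dict = field(default_factory=dict)   # sensor name -> list of measured values

    def ingest(self, sensor, value):
        """Record a new measurement from the building."""
        self.readings.setdefault(sensor, []).append(value)

    def current_state(self):
        """Return the latest value reported by each sensor."""
        return {sensor: values[-1] for sensor, values in self.readings.items()}

    def average(self, sensor):
        """Return the average of all values recorded for one sensor."""
        return mean(self.readings[sensor])


twin = DigitalTwin(as_built={"floor_area_m2": 1200})
twin.ingest("supply_air_temp_c", 18.0)
twin.ingest("supply_air_temp_c", 19.0)

print(twin.current_state())
print(twin.average("supply_air_temp_c"))
```

The as_built dictionary never changes after handover, while readings grows with every measurement, which is what turns the model into a continuous data stream.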
The data obtained from a building can be used to predict the impact of improvements including energy retrofits. In this way, in a virtual model, the building owner can test several potential project options and analyze their effect before making an investment decision.
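A simple version of such a comparison is sketched below. Every number in it is invented for illustration: the baseline consumption, the energy price, the retrofit costs and the assumed savings fractions. In a real study the savings would come from a model calibrated on the building's measured data, and simple payback would be only one of several decision criteria.

```python
# Hypothetical baseline taken from the building's metered data.
BASELINE_KWH_PER_YEAR = 250_000
PRICE_PER_KWH = 0.15  # assumed energy price in currency units

# Hypothetical retrofit options with assumed cost and fraction of energy saved.
retrofit_options = {
    "LED lighting": {"cost": 20_000, "savings_fraction": 0.08},
    "HVAC upgrade": {"cost": 90_000, "savings_fraction": 0.20},
    "Roof insulation": {"cost": 30_000, "savings_fraction": 0.10},
}


def simple_payback_years(option):
    """Years needed for the annual energy savings to repay the investment."""
    annual_savings = BASELINE_KWH_PER_YEAR * option["savings_fraction"] * PRICE_PER_KWH
    return option["cost"] / annual_savings


# Rank the options from shortest to longest payback period.
ranked = sorted(
    ((name, simple_payback_years(option)) for name, option in retrofit_options.items()),
    key=lambda item: item[1],
)

for name, years in ranked:
    print(f"{name}: payback in about {years:.1f} years")
```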
As the background discussed above shows, data science can benefit the field of civil engineering in numerous ways. Proper use of the data available within each project will help engineers with decision making and construction sequencing, and so help to avoid cost overruns and time delays.
Compiled by Eng. Nirmala Rathnayaka of Pro Consultancy International (Pvt.) Ltd.
Photo Source: GlobeSt