Automated Vulnerability Prediction in Software Systems and Lightweight Identification of Design Patterns in Source Code

Author: Jeffy Jahfar Poozhithara
Total Pages: 91
Release: 2021

Software development companies invest heavily in fixing security vulnerabilities in their products after the code has been developed. This calls for an automated mechanism to identify security vulnerabilities during and after software development. One approach is to incorporate possible solutions, such as security design patterns, during design. This reduces the system-wide architectural changes required and enables efficient documentation and maintenance of software systems. Further, identifying which design patterns already exist in source code can help maintenance engineers determine whether new requirements can be satisfied. Current techniques for design pattern identification require either manually labeling training datasets or manually specifying rules or queries for each pattern. As part of this research, we took a two-pronged approach: 1. Pre-implementation: predict vulnerabilities before any source code is written, to increase awareness of possible risks while developing the system. 2. Post-implementation: check the source code to identify any missing security patterns, based on the identified vulnerabilities. For the first approach, we created a Keyword Extraction-based Vulnerability Identification System (KEVIS) that uses natural language processing techniques to extract keywords and n-grams from software documentation to predict security vulnerabilities in software systems. We analyzed the correlation of certain keywords and n-grams with the occurrence of various security vulnerabilities, as well as the correlation between different vulnerabilities. Additionally, we analyzed the performance of classification algorithms (Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Multilayer Perceptron, and Random Forest) in the prediction. To enable the analysis, we also created a dataset by mapping over 200,000 vulnerability reports on the CVE website to the technical/functional documentation of 3602 products.
The preliminary analysis shows that the performance of KEVIS is comparable to or better than prediction using source code as well as other static analysis methods. For the second approach, we introduced PatternScout, a technique for automatically generating SPARQL queries by parsing UML diagrams of design patterns, ensuring that pattern characteristics are matched. We discuss key concepts and the design of PatternScout. Our results indicate that PatternScout can automatically generate queries for the three types of design patterns (i.e., creational, behavioral, structural), with accuracy comparable to or better than that of existing techniques. Because the two approaches rely on different concepts, and for ease of explanation, the background, literature review, method, results, and discussion for each approach are presented separately in their own sections (Approach 1 - Automated Vulnerability Prediction in Software Systems, and Approach 2 - Lightweight Identification of Design Patterns in Source Code, respectively).
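
As a rough illustration of the keyword and n-gram extraction step described for KEVIS, the sketch below counts word n-grams in a piece of documentation text. It is not the author's implementation: the function name `extract_ngrams`, the tokenization rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def extract_ngrams(text, n_max=2):
    """Return a Counter of lowercase word n-grams (n = 1..n_max) found in text."""
    # Assumed tokenization: runs of letters and digits, lowercased.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical documentation snippet, invented for illustration only.
doc = "The server parses user input and builds an SQL query from user input."
features = extract_ngrams(doc)
print(features["user input"], features["sql query"])
```

Counts like these could serve as the feature vector passed to any of the classifiers named in the abstract; how KEVIS actually weights or selects features is not stated here.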

A Software Vulnerability Prediction Model Using Traceable Code Patterns and Software Metrics

Author: Kazi Zakia Sultana
Total Pages: 112
Release: 2018

Context: Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security at an early stage of development using traceable patterns and software metrics. Traceable patterns are similar in concept to design patterns, but they can be automatically recognized and extracted from source code. If these patterns predict vulnerable code better than traditional software metrics, they can be used to develop a vulnerability prediction model that classifies code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with that of metrics, we propose a vulnerability prediction model. Objective: This study explores the performance of code patterns in vulnerability prediction and compares them with traditional software metrics. We have used the findings to build an effective vulnerability prediction model. Method: We designed and conducted experiments on the security vulnerabilities reported for Apache Tomcat (Releases 6, 7, and 8), Apache CXF, and three stand-alone Java web applications from Stanford SecuriBench. We used machine learning and statistical techniques to predict vulnerabilities in these systems, with traceable patterns and metrics as features. Result: We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than traditional software metrics. We also found a set of patterns and metrics that shows higher recall in vulnerability prediction. Conclusion: Based on the results of the experiments, we proposed a prediction model that uses patterns and metrics to better predict vulnerable code with a higher recall rate. We evaluated the model on the systems under study. We also evaluated its performance in cross-dataset validation.
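
The two measures reported in the Result above are directly related: recall is TP / (TP + FN) and the false negative rate is FN / (TP + FN), so they sum to 1. The sketch below computes both for binary labels; the function name and the label vectors are made up for illustration and are not data from the study.

```python
def recall_and_fnr(actual, predicted):
    """Compute (recall, false negative rate) for binary labels, 1 = vulnerable."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    positives = tp + fn
    if positives == 0:
        raise ValueError("no vulnerable samples in 'actual'")
    return tp / positives, fn / positives


# Toy labels, invented for this example (not results from the thesis).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
recall, fnr = recall_and_fnr(actual, predicted)
print(recall, fnr)
```

Because the two values always sum to 1, a lower false negative rate and a higher recall are the same observation stated two ways.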

Towards the Automation of Vulnerability Detection in Source Code

Author: Hai Zhou Ling
Release: 2009

Software vulnerability detection, which involves security property specification and verification, is essential to assuring software security. However, vulnerability detection is labor-intensive, time-consuming, and error-prone when done manually. In this thesis, we present a hybrid approach that combines the power of static and dynamic analysis to perform vulnerability detection in a systematic way. The key contributions of this thesis are threefold. First, a vulnerability detection framework that supports security property specification, potential vulnerability detection, and dynamic verification is proposed. Second, an investigation of test data generation for dynamic verification is conducted. Third, the concept of reducing security property verification to reachability is introduced.
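
The idea of reducing property verification to reachability can be illustrated with a small graph search: if a state that represents a violation can be reached from the initial state, the property does not hold. This is a minimal sketch under assumed names; the state graph, the `VIOLATION` marker, and `is_reachable` are hypothetical and do not come from the thesis.

```python
from collections import deque


def is_reachable(transitions, start, target):
    """Breadth-first search: True if target can be reached from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical abstract state graph of a program that copies user input.
transitions = {
    "start": ["read_input"],
    "read_input": ["check_length", "copy_buffer"],
    "check_length": ["copy_buffer_checked"],
    "copy_buffer": ["VIOLATION"],  # unchecked copy leads to the violation state
    "copy_buffer_checked": ["done"],
}
violated = is_reachable(transitions, "start", "VIOLATION")
print(violated)
```

In a hybrid setting such as the one described, a reachable violation state found statically would still need dynamic confirmation, for example with generated test data.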

Predicting Attack-prone Components with Source Code Static Analyzers

Release: 2004

No single vulnerability detection technique can identify all vulnerabilities in a software system. However, the vulnerabilities identified by one detection technique may be predictive of the residual ones. We focus on creating and evaluating statistical models that predict the components containing the highest-risk residual vulnerabilities. The cost of finding and fixing faults grows with time in the software life cycle (SLC). A challenge for our statistical models is to make the predictions available early in the SLC to allow for cost-effective fortifications. Source code static analyzers (SCSA) are available during the coding phase and are capable of detecting code-level vulnerabilities. We use the code-level vulnerabilities identified by these tools to predict the presence of additional coding vulnerabilities and of vulnerabilities associated with the design and operation of the software. The goal of this research is to reduce the number of vulnerabilities that escape into the field by incorporating source code static analysis warnings into statistical models that predict which components are most susceptible to attack. The independent variable for our statistical model is the count of security-related SCSA warnings. To determine whether additional metrics are required to increase the accuracy of the model, we also include the following metrics as independent variables: non-security SCSA warnings, code churn and size, the count of faults found manually during development, and a measure of coupling between components. The dependent variable is the count of vulnerabilities reported by testing and those found in the field. We evaluated our model on three commercial telecommunications software systems. Two case studies were performed at an anonymous vendor, and the third was performed at Cisco Systems. Each system uses a different technology and consists of over one million source lines of C/C++ code. 
The results show positive and statistically signific.
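
To make the roles of the independent and dependent variables concrete, the sketch below fits an ordinary least-squares line that predicts a component's vulnerability count from its count of security-related SCSA warnings. The abstract does not say which statistical model was used, so simple linear regression is an assumption here, and the per-component numbers are toy values invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy per-component data (not from the study):
# independent variable: security-related SCSA warnings per component
scsa_warnings = [0, 2, 4, 6, 8]
# dependent variable: vulnerabilities reported by testing and from the field
vulnerabilities = [1, 2, 3, 4, 5]

slope, intercept = fit_line(scsa_warnings, vulnerabilities)
predicted_for_10 = slope * 10 + intercept
print(slope, intercept, predicted_for_10)
```

The additional metrics listed in the abstract (non-security warnings, churn, size, manual fault counts, coupling) would enter such a model as further independent variables.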

Automatic Detection of Safety and Security Vulnerabilities in Open Source Software

Author: Syrine Tlili
Release: 2009

Growing software quality requirements have raised the stakes on software safety and security. Building secure software relies on design and implementation techniques and methodologies that avoid exploitable vulnerabilities. Unfortunately, coding errors have become common with the inexorable growth in software size and complexity. According to the US National Institute of Standards and Technology (NIST), these coding errors lead to vulnerabilities that cost the US economy $60 billion each year. Therefore, tracking security and safety errors is considered a cornerstone of delivering software that is free from severe vulnerabilities. The main objective of this thesis is the elaboration of efficient, rigorous, and practical techniques for the safety and security evaluation of source code. To tackle safety errors related to the misuse of type and memory operations, we present a novel type and effect discipline that extends the standard C type system with safety annotations and static safety checks. We define an inter-procedural, flow-sensitive, and alias-sensitive inference algorithm that automatically propagates type annotations and applies safety checks to programs without programmer interaction. Moreover, we present a dynamic semantics of our C core language that is compliant with the ANSI C standard. We prove the consistency of the static semantics with respect to the dynamic semantics. We show the soundness of our static analysis in detecting our targeted set of safety errors. To tackle system-specific security properties, we present a security verification framework that combines static analysis and model-checking. We base our approach on the GCC compiler and its GIMPLE representation of source code to extract model-checkable abstractions of programs. For the verification process, we use an off-the-shelf pushdown system model-checker and turn it into a fully-fledged security verification framework. 
We also allow programmers to define a wide range of security properties using an automata-based specification approach. To demonstrate the efficiency and scalability of our approach, we conduct extensive experiments and case studies on large-scale open-source software to verify its compliance with a representative set of the CERT secure coding rules.
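
An automata-based security property can be pictured as a small state machine over program events, where any transition the property does not allow leads to an error state. The sketch below checks a single event trace against such an automaton. It only illustrates the specification side: the thesis verifies program abstractions with a pushdown model-checker rather than individual traces, and the property, state names, and `violates` function are assumptions made for this example.

```python
# Hypothetical property: a file may only be read between open and close.
# Keys are (current_state, event); any pair not listed is a violation.
TRANSITIONS = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}


def violates(trace, start="closed"):
    """Return True if the event trace drives the automaton into the error state."""
    state = start
    for event in trace:
        state = TRANSITIONS.get((state, event), "error")
        if state == "error":
            return True
    return False


ok_trace = ["open", "read", "read", "close"]
bad_trace = ["open", "close", "read"]  # read after close
print(violates(ok_trace), violates(bad_trace))
```

A model-checker generalizes this check from one trace to all paths of the program abstraction.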

A Deep Learning Approach to Predict Software Bugs Using Micro Patterns and Software Metrics

Author: Marcus Brumfield
Total Pages: 47
Release: 2020

Software bug prediction is one of the most active research areas in the software engineering community. Testing and debugging code is a costly part of the software development life cycle. Software metrics measure the quality of source code to identify software bugs and vulnerabilities. Traceable code patterns can describe code at a finer level of granularity to measure quality. In this research, micro patterns are used to mechanically describe Java code at the class level. Machine learning has also been introduced for bug prediction to localize the source code that needs testing and debugging. Deep learning is a relatively new branch of machine learning. This research aims to improve the prediction of software bugs by combining micro patterns with deep learning techniques. Software bug prediction at a finer level of granularity will enable developers to localize the code to test and debug during the development process.

Automatic Detection of Security Vulnerabilities in Source Code

Author: Xiaochun Yang
Total Pages: 252
Release: 2010

Growing security requirements for systems and applications have raised the stakes on software security verification techniques. Static analysis has been widely used to detect vulnerabilities at compile time. It takes advantage of the relevant information generated by the compiler and scales well to large code bases. However, it is limited to checking low-level security properties that syntactically match concrete program actions. Recently, model-checking has been gaining ground and showing great promise in the arena of software verification. Nevertheless, it suffers from abstraction issues when deriving a model of the program that can be model-checked. In this thesis, we present a security verification approach that brings static analysis and model-checking together. This synergy leverages the advantages of both techniques. We use static analysis to automatically generate a concise abstraction of the program. Model-checking, in turn, provides the capability and flexibility to specify and verify a wide range of properties, and we also benefit from the exhaustive program analysis it provides.

Security Vulnerabilities

Author: Sanaz Rahimi
Total Pages: 256
Release: 2013

Security vulnerabilities pose a real threat to computing systems ranging from personal computers to mobile devices and critical systems. Quantification and prediction of vulnerabilities allow us to compare systems, orient and plan to mitigate vulnerabilities, and design reliable and secure systems. In this dissertation, software Vulnerability Discovery Models (VDMs) are studied, and it is shown that they cannot provide accurate vulnerability prediction even with a large amount of historical vulnerability data. We then propose and study a scheme that incorporates software properties, such as compliance with secure coding rules and code complexity measures, to provide vulnerability prediction without reliance on historical data. The new scheme, which applies to C/C++ applications, is evaluated by testing it on real-world software applications and comparing it with existing VDMs. In addition, the study is extended by developing and evaluating a scheme to measure and quantify the impact of protocol vulnerabilities. In this framework, simulation is used to analyze various protocol configurations and provide recommendations for secure configurations of Virtual Private Networks (VPNs). The evaluation results show that the new schemes can accurately quantify software and protocol vulnerabilities.