What are the difficulties of conducting a digital forensic investigation in the cloud?

Cloud Forensics Challenges

The emergence of cloud computing in recent years has been a major technological advance in the way Information Technology (IT) services are deployed and provisioned. Cloud computing is radically changing how IT services are created, managed, and accessed. The underlying idea is not new: in 1966, Canadian engineer Douglas Parkhill published his book “The Challenge of the Computer Utility”, in which he described the idea of computing as a public utility (Mohammed & Mohamad, 2016).

Several independent studies by corporate organizations have predicted a sharp increase in the use of cloud computing. A study by Market Research Media found that the global cloud computing market was expected to increase by 30% by 2020. This growth is driven by cloud service providers (CSPs) offering on-demand services on a pay-per-use basis (R & Ghuli, 2015).

On the other hand, many organizations have reported vulnerabilities in cloud services that lead to cyber-attacks and threats. Research by the European Network and Information Security Agency (ENISA) identified thirty-five risk categories (Surya Negara & Andryani, 2014). The Cloud Security Alliance (CSA), a non-profit organization, surveyed 700 IT professionals in 2018 and found that around 33% of them consider public and hybrid clouds vulnerable to security risks (Alliance, 2019). The concern is heightened by how easily user accounts can be registered and by the virtually unlimited computing power available.

Attackers can use these services to carry out malicious tasks and then close the account permanently (Singh, 2014). It is not possible to stop every attack, but with the help of cloud forensics, attackers can be traced and brought to justice. With current methodologies and digital forensic processes, however, investigators find it difficult to collect evidence because of the distributed processing, multi-tenancy, heavy virtualization, and dynamic nature of the cloud. Cloud infrastructure was not designed with digital forensics or other forensic processes in mind, which makes it difficult for investigators to pursue cases in the cloud environment (Alqahtany, Clarke, Furnell, & Reich, 2015).

Researchers and organizations such as the CSA and the National Institute of Standards and Technology (NIST) have published research related to cloud forensics (CSA, 2013), (Bohn, Messina, Liu, Tong & Mao, 2011), including work on technical challenges and solutions in cloud forensics (Pichan, Lazarescu & Soh, 2015). That study discusses cloud computing, cloud forensics, digital forensics framework models, and the issues at each stage in depth, together with solutions, but all of the solutions are hypothetical. Some areas, such as virtualization, live forensics, and solutions based on new technology like blockchain, were not covered.

A study by (Ruan, Carthy, Kechadi & Crosbie, 2011) gives an overview of cloud forensics, highlighting the issues faced in cloud computing and how digital forensics in the cloud differs from the investigation of legacy on-premises servers. The authors research cloud-based crimes and the challenges faced in cloud forensics in depth, and use the examples of different CSPs to show the steps involved in investigating a case in the cloud. However, the research does not explain possible solutions to the challenges it discusses.

The technology wave is encouraging companies to adopt cloud computing, so the demand for cloud forensics is increasing, and there is an urgent need to define new methods and tools for carrying out investigations in the cloud environment. The motivation of this research is to understand the limitations and the opportunities available to overcome current challenges in cloud forensics, which will open new doors for forensic investigators, users, and organizations (Marturana, Me, & Tacconi, 2012).

Cloud Computing Overview

Researchers have proposed many definitions to explain the meaning of cloud computing and the nature of cloud services. One definition, proposed by L. Wang and G. von Laszewski, states that a cloud is a set of network-enabled services providing scalable, quality-of-service-guaranteed, usually personalized, low-cost computing platforms on demand, which can be accessed in a simple way (Shen, Li, Wu, Liu & Wen, 2015). In 1961, at the MIT centennial, John McCarthy said that “The computing utility could become the basis of a new industry”, which anticipated the concept of cloud computing. The term cloud computing was introduced to a wide audience by Eric Schmidt in his talk at the Search Engine Strategies Conference in 2006 (Kapil, Tyagi, Kumar & Tamta, 2017). Cloud services can be used without installation, and files can be accessed from any computer in the world with internet access.

The growing demand for high computing power, elasticity, manageability, and low cost led three major organizations, Amazon, Google, and Microsoft, to take a step further towards cloud computing, resulting in three major cloud computing styles (Sharma, 2017). All of this happened between 2006 and 2008.

Amazon’s cloud computing is based on server virtualization technology. During 2006 and 2007, Amazon introduced an object storage service (S3), the Xen-based Elastic Compute Cloud™ (EC2), and a structured data storage service (SimpleDB). Cheap and on-demand, AWS became the pioneer among Infrastructure as a Service (IaaS) providers ("Amazon Virtual Private Cloud", 2019).

Google’s platform is based on a technique-specific sandbox. Google published several research papers and introduced Platform as a Service (PaaS) cloud computing in the form of Google App Engine™ (GAE), which was released to the public as a service in 2008 (Missbach et al., 2016).

Microsoft Azure™ was released in October 2008 and uses the Windows Azure Hypervisor as its underlying cloud infrastructure. Azure also offers services including SQL services, Incognito search, application hosting, and so on (Kaur, 2016).

Types of Cloud Deployment Models

There are four cloud deployment models, and users and organizations can choose among them according to their requirements and budget. Currently, public and hybrid clouds are in greater demand because of their cost and the services they provide.

Figure: Cloud deployment models (Zhang & Huang, 2014)

Private: The cloud infrastructure is provisioned for use by a single organization. This deployment model can be owned and managed by the organization, a third party, or a combination of both. The servers can be on-premises or off-premises (Sharma, 2017).

Public: The cloud infrastructure is provisioned for open use by the general public. It can be owned, managed, and operated by an organization, an individual, or both. The servers are located on the cloud service provider's premises (Sehgal & Bhatt, 2018).

Hybrid: A hybrid cloud combines private and public clouds to provide more computing power, flexibility, and reliability. Its servers reside both on-premises and off-premises (Yousif, 2016).

Community: A community cloud platform is used by a specific set of users or a community with shared concerns regarding mission, security requirements, policy, and compliance. It can be owned by an individual, a community, an organization, a third party, or a combination of these. The servers can be on-premises or off-premises (Yousif, 2016).

Types of Cloud Service Models

Figure: Cloud service models (Zhang & Huang, 2014)

The figures above show the different cloud service models and the level of access that users or organizations get under each of them.

Infrastructure as a Service (IaaS): The IaaS model gives individuals and organizations the freedom to use storage, processing, networking, and other fundamental computing resources to deploy and run software such as operating systems and applications. The user has control over the hosted applications and the operating system, but not over the underlying computing resources (Zhang & Huang, 2014).

Platform as a Service (PaaS): PaaS gives organizations the flexibility to use the cloud service to deploy their own applications. Organizations control only the hosted applications, not the underlying infrastructure (Zhang & Huang, 2014).

Software as a Service (SaaS): SaaS gives users and organizations the flexibility to use software and applications, such as Gmail and Outlook, that are pre-installed on the cloud platform, from anywhere in the world over the internet. Users and organizations cannot manage the application or the underlying cloud platform (Kaur, 2016).
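The access levels described above also decide how much evidence a customer can collect without the CSP. The sketch below encodes a simplified, assumed split of control (the layer names are illustrative and follow the descriptions in this section, not any standard) and lists, per service model, the layers for which a customer would have to ask the CSP.

```python
# Simplified, assumed view of which layers the customer controls under each
# service model; layer names and boundaries are illustrative, not a standard.
LAYERS = ["application", "runtime", "operating_system", "virtualization", "hardware"]

CUSTOMER_CONTROLS = {
    "IaaS": {"application", "runtime", "operating_system"},
    "PaaS": {"application"},
    "SaaS": set(),  # at most user-level settings inside the application
}


def csp_dependent_layers(model: str) -> list[str]:
    """Layers whose evidence the customer must request from the CSP."""
    controlled = CUSTOMER_CONTROLS[model]
    return [layer for layer in LAYERS if layer not in controlled]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "->", csp_dependent_layers(model))
```

Under these assumptions an IaaS customer depends on the CSP only for virtualization and hardware evidence, while a SaaS customer depends on it for everything; real offerings blur these boundaries.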

Figure: Cloud framework based on NIST (Zhang, Cheng & Boutaba, 2010)

Cloud Computing Services

Numerous factors give organizations and individuals reasons to choose cloud services. A few of them are discussed below.

Elasticity: The ability to scale computing power up or down according to usage and demand, which saves the cost of hardware upgrades. For example, an online clothing business needs more computing resources during festivals or sales periods, and it would not be realistic to invest in hardware for such a short period of time (Zhang & Huang, 2014).

Pay-As-You-Grow: Public CSPs such as Amazon and Microsoft allow companies to avoid large upfront investments; companies can rent resources as needed and pay for what they use (Zhang & Huang, 2014).
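To make the elasticity and pay-per-use argument concrete, the sketch below compares owning enough servers for peak demand all year with renting capacity month by month. All prices and demand figures are hypothetical placeholders, not real CSP or hardware pricing.

```python
# Hypothetical figures for illustration only; not real CSP or hardware prices.
HOURS_PER_MONTH = 730


def owned_cost(peak_servers: int, monthly_cost_per_server: float, months: int) -> float:
    """Cost of owning enough servers to cover peak demand for every month."""
    return peak_servers * monthly_cost_per_server * months


def on_demand_cost(servers_per_month: list[int], hourly_rate: float) -> float:
    """Cost of renting only the servers each month actually needs."""
    return sum(n * hourly_rate * HOURS_PER_MONTH for n in servers_per_month)


# A clothing retailer needing 4 servers normally and 20 during a 2-month sale season.
demand = [4] * 10 + [20] * 2
print(owned_cost(max(demand), 150.0, len(demand)))  # 36000.0
print(on_demand_cost(demand, 0.25))                 # 14600.0
```

With these assumed numbers renting is cheaper because the 20-server capacity is paid for only during the two peak months; with constant demand of 20 servers all year the comparison reverses (43800.0 for renting).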

On-Demand Service: Cloud users can access cloud services such as Dropbox, PowerApps, and Skype on demand, and at any point the services can be customized or dropped according to user and organizational requirements (Freet, Agrawal, John & Walker, 2015).

Multi-Tenancy: In cloud infrastructure, resources are pooled so that multiple cloud users (tenants) share the same underlying hardware and software, while each user can access the required services from anywhere in the world over the internet (Freet, Agrawal, John & Walker, 2015).

Scalability: Cloud services require flexibility, scalability, and reliability. The computing power and resources provided by CSPs can be scaled across different dimensions, such as software configuration, service location, and hardware performance. CSPs also provide 24/7 support for customer issues (Abdulla & K, 2015).

Cloud Forensics Overview

Forensics is the use of scientific techniques and tests to detect crime and present evidence against the guilty in court. Forensics can be divided into subcategories such as digital forensics, network forensics, device forensics, and cloud forensics; this research focuses on cloud forensics issues and available solutions (Hale, 2013). Cloud forensics can be defined as the application of digital forensics to cloud infrastructure; it is a cross-discipline between cloud computing and digital forensics. Cloud forensics has three dimensions: technical, organizational, and legal (Ruan, Carthy, Kechadi & Crosbie, 2011).

Technical Dimensions: The technical dimension consists of the procedures and tools that help collect forensic data from the cloud more effectively. These include forensic data collection, data collection in a live-forensics environment, evidence segregation, and investigation of and data collection from virtual environments.

Organizational Dimensions: A cloud forensics investigation always involves at least two parties: the customer and the CSP. If the CSP outsources some of its services, the scope of the investigation widens. A similar study is carried out by (Ruan, Carthy, Kechadi & Crosbie, 2011).

Legal Dimensions: In any cloud forensics investigation, successful or not, organizations and individuals encounter legal issues such as multi-jurisdiction, multi-tenancy, and service level agreements (SLAs). Regulations and agreements must be put in place for the cloud environment to secure the forensics investigation. Similar studies are carried out by (Ruan, Carthy, Kechadi & Crosbie, 2011) and (Liles, Rogers & Hoebich, 2009).

Digital Forensics Process

The digital forensics process is initiated after an incident or attack occurs in the digital environment, so it can be termed a post-incident activity. In the cloud environment, the forensics process can be divided into three subcategories: digital, network, and client forensics. Figure 3 represents the generic digital forensics process (Yan, 2011).

Figure 3: Digital forensics process (Pichan, Lazarescu & Soh, 2015)

Digital Forensics: Digital forensics is a branch of forensic science that focuses on the investigation and recovery of evidence and artifacts from digital devices, such as computers, laptops, and mobile phones, that were involved in a digital crime. The digital evidence is collected, analysed, and presented in court (Quick & Choo, 2018).

Network Forensics: Network forensics is a sub-branch of digital forensics that monitors and analyses network devices such as routers, hubs, and switches. With network forensics, investigators can monitor and analyse computer network traffic using tools such as Fiddler and Wireshark and trace attackers back to their source. The collected logs help investigators prove the attacker's guilt (Yan, 2011).
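Tools such as Wireshark handle packet capture itself; once traffic or flow records have been exported, much of the tracing-back step amounts to aggregating records by source. The sketch below assumes a hypothetical, simplified space-separated flow-log format (timestamp, source IP, destination IP, destination port, action) and ranks the sources with the most rejected connections.

```python
from collections import Counter

# Hypothetical flow-log format: "<iso-timestamp> <src_ip> <dst_ip> <dst_port> <ACCEPT|REJECT>"
SAMPLE_LOG = """\
2019-03-01T10:00:01Z 203.0.113.7 10.0.0.5 22 REJECT
2019-03-01T10:00:02Z 203.0.113.7 10.0.0.5 22 REJECT
2019-03-01T10:00:03Z 198.51.100.4 10.0.0.5 443 ACCEPT
2019-03-01T10:00:04Z 203.0.113.7 10.0.0.6 22 REJECT
2019-03-01T10:00:05Z 192.0.2.9 10.0.0.5 22 REJECT
"""


def top_rejected_sources(log_text: str, n: int = 3) -> list[tuple[str, int]]:
    """Count REJECT records per source IP and return the n most frequent."""
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _ts, src, _dst, _port, action = fields
        if action == "REJECT":
            counts[src] += 1
    return counts.most_common(n)


print(top_rejected_sources(SAMPLE_LOG))  # [('203.0.113.7', 3), ('192.0.2.9', 1)]
```

Because cloud resources change location and IP address rapidly, counts like these are leads for further correlation rather than proof of who the attacker is.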

Client Forensics: Client forensics is also a sub-branch of digital forensics; it focuses on the acquisition and analysis of digital evidence from devices such as mobile phones and tablets. The forensics process is carried out on the client side (Quick & Choo, 2018).

Digital Forensics Models

Several forensic models exist today; the first, the computer forensic investigative process, was introduced by Pollitt in 1984 (Agarwal & Kothari, 2015).

Figure 4: Computer Forensic Investigative Process (Agarwal & Kothari, 2015)

The model consists of four stages: acquisition, identification, evaluation, and admission, as represented in Figure 4. Later, in 2001, the DFRWS investigation model was introduced, with some advances over the first model. The DFRWS model consists of six stages (Agarwal & Kothari, 2015).

Figure: DFRWS Investigation Model (Agarwal & Kothari, 2015)

In 2002, Mark Reith, Clint Carr, and Gregg Gunsch, inspired by the DFRWS model, proposed the Abstract Digital Forensic Model (ADFM) (Kyei, Zavarsky, Lindskog & Ruhl, 2013).

Figure: Abstract Digital Forensic Model (ADFM) (Kyei, Zavarsky, Lindskog & Ruhl, 2013)

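The ADFM comprises nine stages: identification, preparation, approach strategy, preservation, collection, examination, analysis, presentation, and returning evidence. Analysis and examination are interconnected in this model to provide better results, and the return of evidence was introduced as a stage for the first time.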
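One way to read a process model such as the ADFM is as an ordered sequence of stages that an investigation must pass through. The sketch below assumes the nine stage names commonly attributed to Reith, Carr and Gunsch (2002) and allows only a move to the next stage, plus a return from analysis to examination to reflect their interconnection; this checker is an illustration, not part of the published model.

```python
# Stage names as commonly listed for the ADFM (Reith, Carr & Gunsch, 2002).
ADFM_STAGES = [
    "identification", "preparation", "approach strategy", "preservation",
    "collection", "examination", "analysis", "presentation", "returning evidence",
]


def is_valid_transition(current: str, nxt: str) -> bool:
    """Allow moving to the next stage, or looping from analysis back to examination."""
    i, j = ADFM_STAGES.index(current), ADFM_STAGES.index(nxt)
    return j == i + 1 or (current == "analysis" and nxt == "examination")


def validate_path(path: list[str]) -> bool:
    """Check that a recorded investigation starts at identification and follows valid transitions."""
    if not path or path[0] != ADFM_STAGES[0]:
        return False
    return all(is_valid_transition(a, b) for a, b in zip(path, path[1:]))


print(validate_path(["identification", "preparation", "collection"]))  # False: skips two stages
```

A recorded path that re-examines evidence after a first analysis pass, such as the full stage list with "examination", "analysis" repeated once before "presentation", still validates.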

Issues in Cloud Forensics

Dependency on CSPs: Because of the distributed nature of the cloud, forensics experts and organizations depend on CSPs to perform even a small cloud forensics investigation (Alqahtany, Clarke, Furnell & Reich, 2015). Data collection, the chain of custody of data, and the collection of logs all rely on the CSP, and some investigations require SLA provisions to obtain the desired results (Alex & Kishore, 2017). Similar problems are mentioned by (Ruan, Carthy, Kechadi & Baggili, 2013) and (Martini & Choo, 2014) in their research papers.

In a SaaS environment especially, it is not possible to perform any kind of forensics investigation unless the CSP cooperates with investigators, consumers, and law enforcement authorities, as only the CSP has access to the logs and data required to perform the investigation (Freet, Agrawal, John & Walker, 2015). This dependency will remain a problem unless service providers start offering unified tools or collaborate with investigating authorities and cloud users (Alex & Kishore, 2016).

Multi-Tenancy: Cloud computing is known for hosting multiple VMs and multiple instances that access the same hardware and software resources from different locations and devices. This adds complexity to cloud forensics, as VMs run in their own sandboxes and are unaware of neighbouring resources. Physically seizing a device is not possible, as it might hold other customers' data (Pichan, Lazarescu & Soh, 2018). When cloud logs are collected, they may be co-mingled with those of other cloud users sharing the same resources. (Brown, Glisson, Andel & Choo, 2018) reported that separating data is also a challenge in the cloud environment because of its distributed nature and the resources being used by multiple people.

A suspect can claim in court that the logs collected from the cloud environment belong to a different user, because of the multi-tenant nature of the cloud. This makes it hard for investigators to provide enough documentation and supporting logs for the case. Similar discussions appear in (Zawoad, Dutta & Hasan, 2016), (Freet, Agrawal, John & Walker, 2015), and (Alex & Kishore, 2016). When collecting data in a multi-tenant environment, investigators must ensure that the data collected belongs to the specific user and must maintain the integrity of other users' data throughout the investigation (Ruan, Carthy, Kechadi & Baggili, 2013).
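The segregation requirement can be illustrated with a small filter: keep only the records that belong to the tenant under investigation, and record a hash of the extracted subset so it can later be shown to be unchanged. The JSON-lines record layout and the `tenant_id` field are assumptions; real CSP logs differ, and attributing a record to a tenant is often the hard part rather than a simple field lookup.

```python
import hashlib
import json

# Hypothetical shared log: one JSON object per line, each tagged with a tenant_id.
SHARED_LOG = [
    '{"tenant_id": "t-100", "event": "login", "ts": "2019-03-01T10:00:00Z"}',
    '{"tenant_id": "t-200", "event": "upload", "ts": "2019-03-01T10:00:05Z"}',
    '{"tenant_id": "t-100", "event": "delete", "ts": "2019-03-01T10:01:00Z"}',
]


def segregate(lines: list[str], tenant_id: str) -> tuple[list[dict], str]:
    """Return the target tenant's records and a SHA-256 digest of the extracted set.

    Records of other tenants are parsed but never included in the output.
    """
    selected = []
    for line in lines:
        record = json.loads(line)
        if record.get("tenant_id") == tenant_id:
            selected.append(record)
    # Canonical serialisation (sorted keys) so the digest is reproducible.
    canonical = json.dumps(selected, sort_keys=True).encode("utf-8")
    return selected, hashlib.sha256(canonical).hexdigest()


records, digest = segregate(SHARED_LOG, "t-100")
print(len(records), digest[:16])
```

Re-running the extraction on the same source later should reproduce the same digest, which helps counter the claim that the logs belong to someone else.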

Volatile Data: In cloud infrastructure, data is highly volatile. If a VM is stopped, suspended, or deleted, volatile data such as system logs, network traffic, kernel objects, and memory dumps may be lost. Several researchers, (Qi et al., 2017), (Alqahtany, Clarke, Furnell & Reich, 2015), (Pichan, Lazarescu & Soh, 2018), (Alex & Kishore, 2017), (Brown, Glisson, Andel & Choo, 2018), have highlighted similar concerns in their studies.

Volatile memory cannot be recovered through static analysis such as offline hard disk analysis, because in cloud infrastructure it is not possible to confiscate the physical memory. Some researchers have highlighted that volatile data could be faked using a rootkit, which adds further complexity (Zawoad, Dutta & Hasan, 2016). Another challenge is preserving the acquired volatile data: currently, there are not enough tools and techniques available to preserve the collected data (Hay & Nance, 2008).
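A basic preservation step, whichever tool performed the acquisition, is to hash the acquired image as soon as it is obtained and store the digest with acquisition metadata. The sketch below hashes a file in fixed-size chunks, so large memory images need not fit in RAM, and writes a small JSON manifest; the manifest fields and file names are hypothetical choices for illustration.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(image_path: str, examiner: str, source: str) -> dict:
    """Record digest, size and acquisition time next to the image (hypothetical fields)."""
    manifest = {
        "image": os.path.basename(image_path),
        "sha256": sha256_file(image_path),
        "size_bytes": os.path.getsize(image_path),
        "acquired_utc": datetime.now(timezone.utc).isoformat(),
        "examiner": examiner,
        "source": source,
    }
    with open(image_path + ".manifest.json", "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest


# Demonstration with a stand-in "memory image" file.
with tempfile.TemporaryDirectory() as d:
    img = os.path.join(d, "vm-memory.raw")
    with open(img, "wb") as f:
        f.write(b"\x00" * 4096)
    m = write_manifest(img, examiner="analyst-1", source="vm-0042")
    print(m["sha256"] == sha256_file(img))  # True
```

Hashing shows that the image has not changed since acquisition; it cannot show that the acquired data was genuine in the first place, which is the rootkit concern raised above.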

Trust with CSP: Several research papers note that most of the proposed solutions to cloud forensic issues, such as data acquisition, dependency on CSPs, and data volatility, are hypothetical, and consumers and investigators doubt their credibility because they rely on the CSP (Alex & Kishore, 2016). Even when logs are collected and presented, organizations, law enforcement authorities, and investigators lack trust in the CSPs. Similar studies by (Manoj & Bhaskari, 2016), (Pichan, Lazarescu & Soh, 2018), (Choo, Esposito & Castiglione, 2017), (Zawoad, Dutta & Hasan, 2016), (Abdulla & K, 2015), (Hegarty, Merabti, Shi & Askwith, 2013), (Birk & Wegener, 2011) discuss the issue of trust with CSPs, how it affects cloud forensics investigations, and the available solutions.
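One general technique for reducing reliance on the CSP's word is to make logs tamper-evident. The sketch below is a plain hash chain, not the scheme of any particular paper cited here: each entry's digest covers the previous digest, so altering, deleting, or reordering an earlier entry breaks verification. It assumes the latest digest is published or escrowed somewhere the CSP cannot rewrite; without that, a CSP could simply rebuild the whole chain.

```python
import hashlib

GENESIS = "0" * 64  # fixed starting value for the chain


def chain_digest(prev_digest: str, entry: str) -> str:
    """Digest of an entry bound to the digest of the entry before it."""
    return hashlib.sha256((prev_digest + entry).encode("utf-8")).hexdigest()


def build_chain(entries: list[str]) -> list[str]:
    """Return the running digest after each log entry."""
    digests, prev = [], GENESIS
    for entry in entries:
        prev = chain_digest(prev, entry)
        digests.append(prev)
    return digests


def verify_chain(entries: list[str], anchored_head: str) -> bool:
    """Recompute the chain and compare with an independently stored head digest."""
    digests = build_chain(entries)
    return bool(digests) and digests[-1] == anchored_head


logs = ["10:00 login user=alice", "10:05 read file=a.txt", "10:07 logout user=alice"]
head = build_chain(logs)[-1]  # assumed to be published outside the CSP's control
print(verify_chain(logs, head))      # True

tampered = logs.copy()
tampered[1] = "10:05 read file=b.txt"
print(verify_chain(tampered, head))  # False
```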

Chain of Custody: Maintaining the chain of custody is difficult in the cloud because of multiple factors, such as how the evidence was collected, analyzed, organized, stored, and presented, and who had access to the logs (Yan, 2011). Investigators need to document the entire data acquisition process (Qi et al., 2017). If forensics experts miss any jurisdictional guideline, the collected data will not be fit to present in court (Hale, 2013). Data collection in the cloud also depends on the service model: IaaS customers can access the data relatively easily, but this is not the case with SaaS and PaaS, where consumers and forensics experts depend on CSPs to acquire the data. Similar studies are done by (Alqahtany, Clarke, Furnell & Reich, 2015), (Roussev, Ahmed, Barreto, McCulley & Shanmughan, 2016), (Alex & Kishore, 2017), (Brown, Glisson, Andel & Choo, 2018), (Choo, Esposito & Castiglione, 2017), (Chauhan & Bansal, 2017).
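The documentation requirement can be pictured as an append-only custody record: every handling step notes who acted, what they did, when, and the evidence digest observed at that moment, and any mismatch with the digest taken at acquisition is flagged. The class and field names below are hypothetical; a record like this supports, but does not replace, the procedural and legal requirements mentioned above.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    handler: str
    action: str      # e.g. "acquired", "transferred", "analysed"
    sha256: str      # digest of the evidence as observed at this step
    timestamp: str   # UTC, ISO 8601


@dataclass
class CustodyRecord:
    evidence_id: str
    events: list[CustodyEvent] = field(default_factory=list)

    def log(self, handler: str, action: str, data: bytes) -> CustodyEvent:
        """Append a handling event with the digest of the data as seen now."""
        event = CustodyEvent(handler, action, hashlib.sha256(data).hexdigest(),
                             datetime.now(timezone.utc).isoformat())
        self.events.append(event)
        return event

    def integrity_breaks(self) -> list[CustodyEvent]:
        """Return events whose digest differs from the one recorded at acquisition."""
        if not self.events:
            return []
        original = self.events[0].sha256
        return [e for e in self.events[1:] if e.sha256 != original]


record = CustodyRecord("case-17/log-export")
evidence = b"exported application logs"
record.log("investigator-a", "acquired", evidence)
record.log("lab-b", "transferred", evidence)
record.log("analyst-c", "analysed", evidence + b" (edited)")
print([e.handler for e in record.integrity_breaks()])  # ['analyst-c']
```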

Data Acquisition: Data acquisition is harder in cloud computing than on legacy on-premises servers for multiple reasons, such as the location of the data, dependency on CSPs, jurisdiction issues, multi-tenancy, and shared resources (Battistoni, Di Pietro & Lombardi, 2016). Forensic investigators find it difficult to identify the correct logs, as one VM could be shared by multiple users from different locations. Hybrid and public clouds operate across jurisdictions, which makes collecting evidence even more challenging. The same is mentioned by (Alqahtany, Clarke, Furnell & Reich, 2015), (Marturana, Me & Tacconi, 2012), (Alex & Kishore, 2017), (Yan, 2011) in their research papers.

Logging Issue: Log analysis is one of the processes in cloud forensics. The logs could be application, system, or network logs. Collecting them is challenging because of factors such as accessibility, volatility, dependency on CSPs, multi-tenancy, and multi-jurisdiction. Each CSP uses its own logging framework, which makes it challenging to build a timeline of events (Zhong, Xiang, Yu, Qi & Guan, 2013). Logs collected in the cloud are scattered throughout the network, which poses a further challenge for cloud investigators. The current cloud forensics process lacks proper logging architectures, tools, techniques, and models for carrying out investigations. Similar studies are done by (Pichan, Lazarescu & Soh, 2018), (Alex & Kishore, 2017), (Yan, 2011), (Martini & Choo, 2014), (Zawoad, Dutta & Hasan, 2016), (Alex & Kishore, 2016).
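Because each CSP, and often each service, uses its own log format, building a timeline usually starts by normalising timestamps to one reference (UTC) and merging the sources. The two sources below are hypothetical stand-ins, one using ISO 8601 with a UTC offset and one using Unix epoch seconds; real formats would each need their own parser.

```python
from datetime import datetime, timezone

# Hypothetical sources with different timestamp conventions.
APP_LOG = [
    ("2019-03-01T12:00:05+02:00", "app: password reset requested"),
    ("2019-03-01T12:03:00+02:00", "app: bulk export started"),
]
NET_LOG = [
    (1551434410, "net: connection from 203.0.113.7"),  # Unix epoch seconds
    (1551434590, "net: 2 GB outbound transfer"),
]


def from_iso(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)


def from_epoch(ts: int) -> datetime:
    """Interpret epoch seconds as a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def build_timeline() -> list[tuple[datetime, str]]:
    """Normalise both sources to UTC and merge them in chronological order."""
    events = [(from_iso(ts), msg) for ts, msg in APP_LOG]
    events += [(from_epoch(ts), msg) for ts, msg in NET_LOG]
    return sorted(events, key=lambda e: e[0])


for when, msg in build_timeline():
    print(when.isoformat(), msg)
```

Once normalised, the application and network events interleave (10:00:05, 10:00:10, 10:03:00, 10:03:10 UTC), which neither source shows on its own. The merge assumes the sources' clocks were synchronised; clock skew between providers is a further source of error.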

Evidence Reconstruction: Reconstructing the evidence or the crime is quite difficult in cloud infrastructure because of a lack of evidence and shared resources. Currently, there are no specific tools and techniques available to perform evidence reconstruction. A similar study is mentioned by (Qi et al., 2017), (Alqahtany, Clarke, Furnell & Reich, 2015), (Alex & Kishore, 2017), (Alex & Kishore, 2016), (Marty, 2011) in their research papers.

Jurisdiction: In the cloud environment, data is commonly stored outside the customer's jurisdiction, and CSPs are not required to inform clients where their data is stored. This leads to jurisdictional restrictions during a cloud forensics investigation: depending on the location, different laws apply, which could affect the results of the investigation. Authors have noted that keeping data onshore helps forensic investigations because the data is bound by the country's legal system, whereas forensics experts and clients have little control over offshore data stores (Manoj & Bhaskari, 2016). The study by (Martini & Choo, 2013) explains the issues of working with another jurisdiction and collecting evidence there: one cannot be sure that the other jurisdiction has followed the procedures required to collect the data in accordance with the law, legal principles, and rules of evidence. According to (Ruan, Carthy, Kechadi & Baggili, 2013), 75% of researchers and IT experts think jurisdiction is one of the major issues in cloud forensics.

In some situations, forensics experts could use a username and password to access the cloud server and determine whether any other user data stored there could be used for the investigation. This step requires permission from the relevant jurisdiction, and in cross-jurisdiction cases one jurisdiction might grant permission while another might not accept such an act (Alex & Kishore, 2017), (Brown, Glisson, Andel & Choo, 2018). Even network devices, which could be used to trace the source of an attack, are of limited help in the cloud environment because of rapid changes in resources, such as location, port numbers, IP addresses, geographical location, and applicable law. Presenting data is also easier in traditional digital forensics than in cloud forensics, where the distributed resources are accessed simultaneously by hundreds of users, which poses a serious challenge (Alex & Kishore, 2016).

Unknown Physical Location: The location of virtual machines, servers, and other devices is unknown to consumers in the cloud environment, which makes it difficult to locate and identify the evidence required for a forensics investigation. Cloud data could be stored in a different location under a different jurisdiction, or split across different drives at different locations and data centers. The same has been discussed by (Marturana, Me & Tacconi, 2012), (Alex & Kishore, 2017), (Ruan, Carthy, Kechadi & Baggili, 2013), (Aminnezhad, Dehghantanha, Taufik Abdullah & Damshenas, 2013), (Choo, Esposito & Castiglione, 2017), (Chauhan & Bansal, 2017), (Battistoni, Di Pietro & Lombardi, 2016). Solutions to the issue of unknown location are discussed in the solutions section.
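When the CSP can state where data resides, a first practical step is to group evidence locations by jurisdiction so that legal requests can be planned per country. The sketch below uses a few AWS region identifiers and their host countries as an illustrative lookup; the evidence items are hypothetical, and anything in an unmapped or unknown location is flagged rather than guessed.

```python
from collections import defaultdict

# Illustrative region-to-country lookup (a small subset of AWS regions).
REGION_COUNTRY = {
    "us-east-1": "United States",
    "eu-west-1": "Ireland",
    "eu-central-1": "Germany",
    "ap-southeast-2": "Australia",
}


def group_by_jurisdiction(evidence: dict[str, str]) -> dict[str, list[str]]:
    """Map each evidence item (name -> region) to the country that hosts it."""
    groups = defaultdict(list)
    for item, region in evidence.items():
        groups[REGION_COUNTRY.get(region, "UNKNOWN")].append(item)
    return dict(groups)


# Hypothetical evidence items and the regions a CSP reported for them.
evidence_locations = {
    "vm-disk-snapshot": "eu-west-1",
    "object-store-bucket": "us-east-1",
    "db-backup": "eu-central-1",
    "cdn-logs": "unknown",
}
print(group_by_jurisdiction(evidence_locations))
```

A region identifier only approximates physical location; data may still be split or replicated across data centers, so the "UNKNOWN" bucket is often where the hardest legal questions remain.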

How is cloud forensics a challenge for digital forensics?

Big data digital forensics in the cloud is difficult. One challenge is the need to identify which physical devices have been compromised. Because data are distributed across the cloud, the customer or digital forensics practitioner cannot have the full access control that a traditional investigation has.

What are the biggest challenges to conducting digital forensic investigations?

The volume challenge: as the number of devices and the volume of data grow, digital forensics faces a volume problem. Now, more than ever, investigators can accumulate unprecedented volumes of data, but automation tools to store and analyze such data are lagging.
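One common way to cut the volume investigators must review is known-file filtering: hash every file and set aside those whose digests appear in a reference set of known, benign files (NIST's National Software Reference Library publishes such sets). In the sketch below the reference set is a hypothetical in-memory set rather than an actual NSRL download, and a real reference set's hash algorithm would have to match the one used here.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(root: str, known_good: set[str]) -> list[str]:
    """Return paths (relative to root) of files NOT in the known-good hash set."""
    remaining = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if sha256_of(path) not in known_good:
                remaining.append(os.path.relpath(path, root))
    return sorted(remaining)


# Demonstration: one known "system" file and one unknown file.
with tempfile.TemporaryDirectory() as root:
    for name, data in [("libc.so", b"standard library"), ("notes.txt", b"suspicious")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    known = {hashlib.sha256(b"standard library").hexdigest()}  # hypothetical reference set
    to_review = triage(root, known)
print(to_review)  # ['notes.txt']
```

Filtering only reduces what has to be examined; the remaining files still need analysis, so it eases rather than solves the volume problem.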

What are some issues that should be considered in acquiring digital evidence from the cloud?

Concerns regarding cloud computing include digital forensics, information security, data jurisdiction, privacy, and national law.