Security Insights

Is my personal information used to train AI?


Generative AI has recently become a hot topic, and many people are using tools like ChatGPT for assistance. But how can AI respond to our questions so effectively? The key lies in continuous learning from data: AI analyzes and processes vast amounts of it to become smarter. Interestingly, the data used to train AI can include our personal information.


However, training AI often requires collecting large amounts of personal data, which can conflict with the principle of data minimization, a core tenet of privacy protection.


Moreover, personal data should be collected for a clear and specific purpose. Given the nature of AI development, however, it is hard to predict the exact purpose in advance, making it difficult to adhere to the principle of purpose limitation.


Lastly, personal data should be destroyed once the purpose for which it was collected has been fulfilled. AI systems, however, tend to retain data for as long as they provide services, making it difficult to comply with the principle of retention limitation.


Thus, the conflict between privacy protection principles and AI has highlighted the need for mutual understanding and strengthened privacy safeguards.

In response to the widespread adoption of AI, new regulations were introduced last March to establish citizens' rights in automated decision-making and to better protect personal data. 

These regulations include stricter qualification requirements for Chief Privacy Officers (CPOs) in organizations and public institutions that handle large amounts of sensitive personal information.

Let's take a closer look at the scope of personal information, how AI might use your data, and the recent amendments to the Personal Information Protection Act!




Definition of Personal Information

'Personal information' refers to any information about a living individual that falls into one of the following categories:

  1. Information that can identify an individual on its own, such as a name, resident registration number, or video footage.
  2. Information that may not identify an individual on its own but can be easily combined with other data to identify a person. (In this case, factors such as the availability of other information and the time, cost, and technology required to identify the individual must be reasonably considered.)
  3. Information that has been pseudonymized, meaning it has been processed so that it can no longer identify a specific individual without the additional information needed to restore it to its original state.
  • "Pseudonymization" refers to processing personal data, for example by deleting parts of it or replacing them entirely, so that a specific individual cannot be identified without additional information (see the sketch below).



How Does AI Use Personal Information?

You might have noticed that apps seem to be collecting your search history and usage patterns. Amazon, for example, analyzes data to identify products you are more likely to purchase and presents them to you. Similarly, Netflix recommends content based on analysis of user behavior, and Google uses data to track trends and deliver targeted ads. For IT companies, data has become a crucial resource for maintaining a competitive edge.
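As a toy illustration of why usage data is so valuable, here is a minimal co-occurrence recommender in Python. The users, items, and ranking rule are invented for this sketch; production systems at companies like Amazon or Netflix are vastly more sophisticated.

```python
from collections import Counter

# Toy purchase histories; in reality these come from collected usage data.
histories = {
    "user_a": {"headphones", "laptop", "mouse"},
    "user_b": {"laptop", "mouse", "keyboard"},
    "user_c": {"headphones", "keyboard"},
}

def recommend(user: str, k: int = 2) -> list[str]:
    """Suggest items bought by users with overlapping histories,
    ranked by how often they co-occur."""
    mine = histories[user]
    counts = Counter()
    for other, items in histories.items():
        if other != user and mine & items:  # shares at least one item with me
            counts.update(items - mine)     # candidate items I don't yet own
    return [item for item, _ in counts.most_common(k)]

print(recommend("user_a"))  # ['keyboard']
```

Even this crude rule shows the mechanism: the more behavioral data a service collects, the sharper its predictions about each user become.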

In addition, GPT is OpenAI's large language model, which generates text in response to user prompts. It was trained on publicly available data such as news articles, blog posts, and Wikipedia entries, and draws on that knowledge to generate responses. User prompts themselves can also be fed back into training: OpenAI's privacy policy states that it collects user content such as prompts, uploaded files, and feedback, and may use it to improve its models.

This aspect of OpenAI's privacy policy has sparked criticism. The major concerns are that OpenAI collects information without explicit user consent and shares it with other companies, and that users are often unaware of what information is being collected and which companies have access to it.



Cases of AI Leaking Corporate Information

In March, an engineer in Samsung's Device Solutions (DS) division, which runs its semiconductor business, entered sensitive source code into a ChatGPT prompt, and the code was reportedly leaked externally. The incident deepened distrust of GPT over information leaks, prompting companies sensitive to information security to restrict its use by their employees.

Several U.S. companies, including Amazon, JPMorgan Chase, and Bank of America, have banned their employees from using ChatGPT for work purposes. In South Korea, companies like Samsung and SK Hynix have also started to limit the use of ChatGPT by their employees.
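One common mitigation, sketched below as an illustrative example rather than any specific company's practice, is to redact obvious identifiers and secrets from prompts before they ever leave the company for an external AI service. The regex patterns and the `send_to_llm` stub are assumptions for demonstration; real deployments would rely on a vetted data loss prevention (DLP) tool with far broader coverage.

```python
import re

# Illustrative redaction rules: identifiers and credentials to mask.
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),             # email addresses
    (re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"), "[PHONE]"),           # Korean-style phone numbers
    (re.compile(r"\b\d{6}-\d{7}\b"), "[RRN]"),                       # resident registration numbers
    (re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*\S+"), r"\1=[SECRET]"),
]

def redact(prompt: str) -> str:
    """Mask identifiers and credentials before a prompt leaves the company."""
    for pattern, replacement in REDACTION_PATTERNS:
        prompt = pattern.sub(replacement, prompt)
    return prompt

def send_to_llm(prompt: str) -> None:
    # Hypothetical stand-in for a call to an external AI service.
    print("Outbound prompt:", redact(prompt))

send_to_llm("Contact kim@example.com, phone 010-1234-5678, api_key=sk-abc123")
# Outbound prompt: Contact [EMAIL], phone [PHONE], api_key=[SECRET]
```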



If you need a solution to securely protect your company's data, partner with MarkAny!


