Mark Zuckerberg’s recent announcement about Meta’s plans to develop its own artificial intelligence systems using data from Facebook and Instagram has sparked significant outcry.
Following the release of Meta Platforms Inc.’s latest financial results, Zuckerberg highlighted the company’s intention to leverage the vast amounts of user-generated content on these platforms to train a new chatbot.
Meta wants to rival ChatGPT by tapping into a dataset that surpasses the Common Crawl’s 250 billion webpages, a primary resource for training conversational AI.
Zuckerberg’s ambition to harness hundreds of billions of publicly shared images, tens of billions of videos, and the extensive volume of public text and comments presents a potential edge in developing more sophisticated AI.
The value of this data is amplified by its interactive nature, particularly the comment threads that mirror human dialogue, crucial for creating conversational agents.
However, the move to utilize such personal and interactive content raises significant privacy concerns, as it involves training AI on potentially sensitive conversations and posts shared among friends.
Moreover, the challenge of filtering toxic content from the training data cannot be overstated.
The online environment, including Facebook’s comments sections, is rife with harmful content such as personal attacks and discriminatory speech.
As such, training AI on unfiltered datasets could perpetuate these issues, making the development of a non-toxic, conversational AI a daunting task.
Meta’s historical content, which includes politically charged and misleading information, further complicates the dataset’s quality.
The decision to train AI on this vast, uncurated dataset contrasts sharply with more cautious approaches in AI development, such as Apple’s careful progression towards relaunching Siri.
Meta’s recent decision to allow a manipulated video of President Biden to remain on its platform is indicative of the company’s lenient content standards.
This is likely to intensify concerns over its upcoming AI’s potential to inherit and amplify the platform’s existing issues with privacy and toxicity.
Meta intends to create a new chatbot by using vast amounts of user-generated content from Facebook and Instagram, aiming to outperform existing AI like ChatGPT.
The use of personal and sensitive data from social interactions for AI training has sparked fears over the potential misuse of private information.
While the specifics of filtering strategies remain unclear, the inherent challenge of cleansing the dataset of toxicity is a significant hurdle for Meta.
Unlike Meta’s ambitious and broad data utilization, companies like Apple take a more cautious route, focusing on privacy and careful curation of training data for AI.
Meta CEO Mark Zuckerberg’s AI announcement has raised major concerns, after he said that the company had more user data than was used to train ChatGPT – and would soon be using it to train its own AI systems.
The company’s plan to use Facebook and Instagram posts and comments to train a competing chatbot raises concerns about both privacy and toxicity …
Zuckerberg announced the plan following the release of the company’s latest earnings report, according to Bloomberg.
For many people, Facebook is the internet, and the number of its users is still growing, according to Meta Platforms Inc.’s latest financial results. But Mark Zuckerberg isn’t just celebrating that continuing growth.
He wants to take advantage of it by using data from Facebook and Instagram to create powerful, general-purpose artificial intelligence.
[Zuckerberg said] “The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”
Common Crawl refers to a huge archive of 250 billion webpages, representing the bulk of the text used to train ChatGPT. By calling on an even larger dataset, Meta could be in a position to build a smarter chatbot.
As Bloomberg notes, it’s not just the sheer volume of data that might give Meta an advantage – it’s the fact that so much of it is interactive.
The pile of data he’s sitting on is especially valuable because so much of it comes from comment threads.
Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI heavily mined the internet forum Reddit Inc. to build its own popular chatbot.
First, Meta would effectively be training its AI on what may be quite personal posts and on conversations between friends in Facebook comments.
That raises major privacy alarms. Second, anyone who has ever read the comments section anywhere on the Internet knows that the percentage of toxic content is high.