
Tumblr and WordPress will start selling AI training data to OpenAI and Midjourney

According to online publication 404 Media, Automattic, which owns the Tumblr and WordPress platforms, is in the final stages of negotiations to sell user data to artificial intelligence companies OpenAI and Midjourney.



It is not yet known exactly what data will be transferred. Automattic apparently first planned to provide far more information than its user agreements allow. An internal post by Tumblr product manager Cyle Gage says the initial data set for the deal included private or partner content that should not have been transferred: private posts on public blogs, posts from deleted or blocked blogs, unpublished questions and answers, private answers, posts marked 18+, and content from partner blogs (for example, Apple's former music site).


An internal post states that Automattic engineers are preparing a list of post IDs that should be excluded from the deal. It is unclear whether this data has already been shared with OpenAI and Midjourney.



Automattic provided an official statement that states:


  • We will only share public content from WordPress.com and Tumblr, from sites that have not opted out of such sharing.

The statement notes that current legislation does not oblige AI companies to take into account users’ preferences not to transfer their data.


The last line of Automattic’s statement echoes the terms of the deals mentioned:


  • We also work directly with several AI companies whose plans align with our community’s interests in terms of content attribution, opt-out rights, and data control.

The company is reportedly planning to launch a new opt-out tool on Wednesday that will purportedly allow users to stop their data from being used for AI training, including by companies like OpenAI and Midjourney. According to excerpts from purported internal Automattic documentation published by 404 Media, the new tool will add the sites of users who opt out to a block list, preventing crawlers from accessing their content.


In one of Automattic’s purported internal documents, a question addressed to AI chief Andrew Spittle asks what guarantees there are that data will be deleted when a user turns on the new opt-out tool. Spittle explains in his response:


We will regularly inform existing partners of any new opt-out users. I want this to be an ongoing process where we regularly advocate for the exclusion of past content from any future training sets based on current user preferences. We will request that this content be removed and excluded from any further training. I trust that partners will follow these requests based on our preliminary conversations with them. I don’t think it makes much sense for them to keep it.


So, if a Tumblr or WordPress user opts out of having their data used for AI training, Automattic promises to request and advocate for the removal of that data. However, the head of the company’s AI division only “trusts” that AI companies will honor those requests.


Deals to sell AI training data have become a way for many online platforms to generate additional income during challenging times. Last week, Google struck a deal with Reddit to use data from the platform in its AI development, which should make Reddit more attractive ahead of its upcoming IPO.


In turn, OpenAI last year launched a partnership program to obtain datasets from third-party resources to train its AI models.

