@tehwatever
I think a lot of people, including some of those ‘reporting’ on the field, are making the mistake of conflating AI art with the Midjourney model, i.e. a closed-off proprietary model you pay for access to, where you just enter text and get an output. That is only a small slice of what AI art can do, because all of the things you gave as examples are possible today. Not only are they possible, I frankly believe they are the only real future for AI art as anything more than a curio or hobby.
The fact is, if you leave everything up to the AI you can get some diamonds in the rough, but only if your goal is… not particularly specific. If you want a picture of Twilight Sparkle reading a book, it can do that, but the specifics can’t be remembered. If you generate the perfect OC in a barbarian costume, the next generation of that same prompt will always be subtly different, because the AI doesn’t know what’s important; it only knows generalities. Theoretically a sufficiently detailed prompt might get around this, but I am frankly skeptical of the average person’s ability to describe, in precise enough language, exactly what they want in a way the AI can’t easily misinterpret.
I think what actually happens when the average person generates a piece is this: they have a general idea of what they want, and since the AI excels at filling in general, nonspecific detail, they say ‘oh wow, good enough’. This is great for the hobbyist who wants a picture of a generally well-understood character doing some task (‘Twiggles reading’). It is not great for someone wanting a poorly understood character [think: an OC] with or doing something specific. For example, if you have a knight OC you probably have a related ‘canon’ suit of armor for that character, set up in a way that makes the character more specific than ‘a blue batpony in armor’. But unless you can painstakingly describe in full detail every greave, leather strap, pauldron and piece of chainmail [and god help you, the AI won’t know where half of them go], you won’t get your OC every time by prompting ‘a blue batpony in armor’; you’ll get a bunch of subtly similar but unrelated characters.
As someone who has been using AI for a few months now, I don’t think this is realistically a surmountable problem for the text prompters. Even if their work is visually stunning, the lack of specificity will damn them when it becomes important.
The only way you can consistently reproduce a character is with something like a dreamtheater model, and the only way to get a halfway decent dreamtheater model is to pay an artist (or several) to draw your character from 20 angles, 20 times. This is not a practical replacement for OC creation; it just isn’t. What it is practical for is an artist who, for the sake of a comic or what have you, already has or will draw many different versions of the same specific character. I have tested this with the consent of an artist friend and was able to reproduce detailed outfits and colour schemes only after about 13 images. I have to imagine every commission artist would be weeping with joy if the average commissioner had to buy in 13 times before accessing the AI market, because that’s a hell of a lot more than the current median.
Moving on, you talk about AI replacing the creative ‘fun’ side of art. I think to some extent this is true for things like NovelAI or Midjourney, but on home models I just don’t think that is the case. Some of the best and most impressive pieces of AI art I have seen are heavily modified by a creator who saw an original image that only had potential and needed a guiding human hand to realize it. Some cases of inpainting are shocking in how different the original AI image is from the final human-guided product.
Here is an older example of inpainting. It’s far from the most impressive image, but with over 1400 layers, each guided by a human-made brushstroke, it demonstrates how images change with human guidance. Now here I’m open to saying that your mileage may vary; everyone’s creative process is different, and I would not begrudge someone who had absolutely no interest in sifting through 3000 directionless AI-generated images to find one that can have meaning added to it over the span of 3-5 hours of relentless inpainting.
Now, as for more specific examples of grunt work being taken over by the AI: I cannot give very many good examples here, but it is still very much a real thing. If I had more mechanical artistic skill I could demonstrate better, and I have some AI pieces whose PSDs I could open up to show you parts that were originally hand drawn, but here is a very quick example of what is possible. I can draw the left piece in 20 minutes; I can draw the right in 3 years. I can generate the right by drawing the left in 20 minutes and running it through the AI for another 10. Is this not colouring and inking for someone who otherwise lacks the skill? Obviously it’s not perfect [the current iteration of the pony model struggles with bandanas and slit pupils for now], but I had the creative idea of a bat pony mare singing into a microphone, and via AI I was able to execute that creative vision on a level I’d otherwise be incapable of. You can’t say this particular image lacks agency, because I was there driving it from minute one; in the Gaussian noise, my sketch was there seeding the way.
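If you want to see the sketch-seeding idea in numbers, here is a toy numpy sketch of the img2img noising step. This is a caricature, not the real Stable Diffusion code: the "toy schedule" mapping strength to a noise level is my own simplification, and the arrays stand in for pixels/latents. But it shows the key knob: at low strength most of your sketch survives under the noise; at high strength it barely matters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a rough black-and-white sketch (toy data, not a real image).
sketch = np.sign(rng.standard_normal(256))

def img2img_start(sketch, strength, rng):
    """Noise the sketch the way img2img does before denoising begins.

    strength ~ 0: almost no noise, output stays close to the sketch.
    strength ~ 1: nearly pure noise, the sketch barely matters.
    (Toy schedule: a_bar = 1 - strength; real schedules are more involved.)
    """
    a_bar = 1.0 - strength
    eps = rng.standard_normal(sketch.shape)
    return np.sqrt(a_bar) * sketch + np.sqrt(1.0 - a_bar) * eps

# Same noise draw for both, so only the strength differs.
low = img2img_start(sketch, 0.3, np.random.default_rng(1))
high = img2img_start(sketch, 0.9, np.random.default_rng(1))

def similarity(a, b):
    """Correlation between two arrays: how much of `a` survives in `b`."""
    return float(np.corrcoef(a, b)[0, 1])

print(similarity(sketch, low), similarity(sketch, high))
```

Run it and the low-strength start correlates strongly with the sketch while the high-strength start mostly doesn’t; the denoiser then "repairs" whichever starting point you hand it, which is why the 20-minute sketch steers the final render.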
Now, as for lacking an understanding of what exactly is in these models: again, I would call this a Midjourney problem. We don’t know what was involved there; they keep it all close to the chest. For public models, though, learn some Python and you can open them up. See the tokens for yourself; fiddle with the weights. We also know they’re all iterated off of the default Stable Diffusion model [usually 1.4], and we all have direct, searchable access to those datasets.
In the end we are very early on in this change, it’s hard to say exactly what the future will bring. I for one, having used these generators for months now, think the ‘death of the artist’ is massively overstated. I have hope for a future where artists are less constrained on time, where someone working a 9 to 5 can express themselves without bleeding on the canvas, where cheap consumable art that enriches experiences isn’t completely inaccessible to 99% of the planet. Where access to these tools is not the exclusive domain of megacorps who own enough media to train AI while everyone else doesn’t have a prayer.
I don’t even want people to change their ways. If you don’t like AI art, then keep it filtered. If the suffering of the process is the spice of life for you, then okay, throw out the last 100 years of art theory and go back to counting paint strokes if you want; that is your right. I don’t think you should be forced to look at it if you don’t want to, and as long as the hostility dies down I’m cool with whatever.
Now, as for the whole ‘stealing’ thing: it just does not make sense once you understand how the system works. Training means adding noise to images with known subjects and grading the machine’s ability to correct that noise; this process lets it associate concepts with shapes, colors, etc. Do you remember that famous painting of Jesus that was partially destroyed, which a woman attempted to ‘restore’ despite being very unskilled as a painter? That’s basically what the AI is doing: it sees damage and tries to fix it without the context of the original whole piece. Why do we do it like this? Because generating is lying to the AI that you have a picture of Jesus that needs repairing, and then presenting it with a blank canvas [random Gaussian noise, actually]. The AI then desperately makes changes to the noise until some part of it vaguely resembles the target concepts, and iterates further from there.
You can actually generate images along every step of the process to see how the AI gradually transforms nothing into fuzzy shapes into detail. It doesn’t ‘know’ about the detail in advance, and will frequently and obviously ‘change its mind’ as the process continues. It is just trusting the user who says there is supposed to be a picture of Jesus, so it keeps iterating and iterating until it makes the thing it’s trying to find. There are people who imagine this tech is some great kitbasher that makes composite monstrosities, but that illusion will not last long if you actually use the program. The images are an ever-narrowing cloud of possibilities within the latent space. This is also why it can’t really do specifics: the possible interpretations of a concept like ‘Jesus’ are so widely varied that reducing chaos to a picture of Jesus will never result in the same picture twice.
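The noise-and-repair loop can be caricatured in a few lines of numpy. To be clear about what’s mine here: the "denoiser" below is a stand-in that just nudges the current guess toward a fixed target array, where the real thing is a trained U-Net; the shapes, step counts and constants are all invented for illustration. But the skeleton is the same: forward noising for training, then iterative refinement starting from a blank canvas of pure Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "image": in a real model this would be pixels or latents.
target = rng.uniform(-1.0, 1.0, size=64)

# --- Training-side forward noising ----------------------------------
# x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps; the network is then graded
# on how well it predicts eps from x_t. Here we only show the corruption.
def noise_image(x0, a_bar, rng):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

# --- Sampling: start from pure noise, iterate toward the concept -----
# Toy "denoiser": pulls the guess a little toward the target each step,
# with a dash of re-noising early on (loosely DDPM-flavoured).
def toy_denoise_step(x, target, strength=0.2, rng=None):
    x = x + strength * (target - x)
    if rng is not None:
        x = x + 0.05 * rng.standard_normal(x.shape)
    return x

x = rng.standard_normal(target.shape)  # the "blank canvas" of noise
errors = []
for step in range(50):
    x = toy_denoise_step(x, target, rng=rng if step < 40 else None)
    errors.append(float(np.mean((x - target) ** 2)))

print(f"start error {errors[0]:.3f} -> final error {errors[-1]:.3f}")
```

The error drops steadily but the intermediate states are noisy and wandering, which is the toy version of watching the model ‘change its mind’ mid-generation; a different starting noise converges to a different final image, which is why the same prompt never lands on the same picture twice.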
TLDR: Too many words, Cheerilee is best pony.