I have also used some AI models to help write code and do a bit of dev work and my general experience is that they are pretty good at syntax, identifying small bugs and concise problems - but if you ask it to do anything that's part of some greater schema or something that exceeds maybe 80 lines of code then it is incredibly frustrating. I've been building an app recently and have used GPT-o1, Claude and Google Gemini, and they all tend to suffer from this kind of tunnel vision. I've lost count of the times it has completely forgotten what I told it a few prompts earlier about always using certain naming conventions or structuring things in a certain way. It will recommend an approach but then completely omit a really important limitation that makes the approach totally redundant.
I guess a lot of it depends on how much source code there is out there to have been trained on. I am currently writing a .NET app in MAUI. asking AI ( quite a few different types, github copiot, jetbrains AI, ChatGPT, Gemini etc ) how you do "x" in Maui usually results made up answers from other .NET systems that are not close on working for me.
Me: I am writing a Maui app, I want to do X, how do you suggest doing that.
AI: Do this <Gives code>
Me: your response contains this line that isn't part of Maui
AI: You are correct, My mistake, that isn't in Maui. do this
Me: That now crashes
AI: The crash is because of this line, do this <Gives Code>
Me: that was the same line you just gave me that isn't part of Maui.
.... Rinse and repeat....
the losing track of what was said before will get fixed eventually, for now I think they would need to pass in the entire chat each time to maintain its train of thought but that extra traffic costs more.
some of these newer models are really expensive too. I think its o3 is $20 per request. some of the others are up to $2000 per request and from what I have read give really long verbose ( and very accurate ) responses but you then need to run then through a cheaper model to summerise the response into a useful LLM response.