MCFC Analytics

Damocles said:
I'm also looking at doing a blog series called "Beginner's Guide to Statistical Analysis in Football". The idea is that, as I'm pretty much a novice at this type of statistical analysis with this many data points, I'd blog in detail about what I've tried day by day, hopefully helping others learn from (and avoid) my mistakes.

I like this idea.
Has anyone with access to the data considered doing predictive/explanatory analyses rather than purely descriptive ones?

I've seen it mentioned that the data is being released with an eye toward developing player performance metrics (i.e. things that clubs can use above and beyond pure description).

I'm a research scientist (in NZ) with some football knowledge (above average but not spectacularly knowledgeable) and was considering starting a project on developing non-observable, underlying performance measures. In lay-speak: taking a bunch of variables that are all related and seeing whether any non-observable (latent) factors underpin performance across the variables/time.

I was considering developing metrics via factor analysis or PCA (principal components analysis), but I don't have the breadth of football knowledge to know intuitively the manifest (observable) groupings of skills, abilities, attributes etc. (where these would exist as underlying assumptions or be used to underpin hypotheses theoretically).
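For anyone wanting to try the PCA route, here is a minimal sketch in Python with NumPy. It assumes the data have already been aggregated into a players-by-variables matrix (e.g. per-90 counts); the five variables and the two latent factors used to simulate them are entirely hypothetical, not taken from the MCFC data. Note this is PCA, not factor analysis: PCA summarises total variance, whereas factor analysis models shared variance plus a unique variance per variable (scikit-learn's `FactorAnalysis` is one implementation of the latter).

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of an (n_players x n_variables) matrix.

    Columns are standardised first so variables on different scales
    (e.g. passes vs. tackles) contribute equally, i.e. this is PCA on
    the correlation matrix. Returns (scores, loadings, explained_ratio).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)          # eigenvalues of the correlation matrix
    explained_ratio = eigvals / eigvals.sum()
    axes = Vt[:n_components].T                 # (n_variables x n_components)
    loadings = axes * np.sqrt(eigvals[:n_components])  # variable-component correlations
    scores = Z @ axes                          # each player's position on each component
    return scores, loadings, explained_ratio


# Hypothetical data: 200 players, 5 per-90 variables driven by two latent factors.
rng = np.random.default_rng(0)
n_players = 200
attacking = rng.normal(size=n_players)
defending = rng.normal(size=n_players)


def noisy(factor):
    """Observed variable = latent factor + measurement noise."""
    return factor + rng.normal(scale=0.3, size=n_players)


X = np.column_stack([
    noisy(attacking),   # passes (hypothetical)
    noisy(attacking),   # touches (hypothetical)
    noisy(attacking),   # key passes (hypothetical)
    noisy(defending),   # tackles (hypothetical)
    noisy(defending),   # interceptions (hypothetical)
])

scores, loadings, explained_ratio = pca(X, n_components=2)
print("explained variance ratio:", np.round(explained_ratio, 2))
print("loadings (variables x components):\n", np.round(loadings, 2))
```

With simulated data like this, the first two components should carry most of the variance and the loadings should separate the "attacking" variables from the "defending" ones. On real data, naming those groupings is exactly where football knowledge comes in.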

This is purely speculative at the mo, but as I work at a high-profile sports science uni here in NZ, I would be interested in collaborating with (or just sharing info with and learning from) knowledgeable football types with an interest in analysis, with the aim of developing publications from it (preferably theoretical and context-free, so as not to violate the terms and conditions under which the data were released).

Has anyone on here been thinking along similar lines, or got ideas around this sort of analysis and wants to take it further?

(and no, I wasn't at York away)

Damo, I will help. I was thinking we could create an app that does the whole database setup for people and inserts the .XML data into it. I was also thinking I might let you choose which DB you want (SQL Server, Oracle or MySQL), as different people will have different preferences.
Then it can give them some basic stats about the game and allow them to run more advanced queries themselves.

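A minimal sketch of the "set up the database, load the XML, give some basic stats" idea, in Python. Two assumptions to flag: the `<Game>`/`<Event>` layout and its attribute names are hypothetical (check them against the actual feed), and SQLite from the standard library stands in for SQL Server, Oracle or MySQL so the example runs anywhere.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical event-feed layout; the real MCFC/Opta XML schema may differ.
SAMPLE_XML = """<Game id="g1">
  <Event id="e1" type_id="1" team_id="t1" player_id="p1" min="0" sec="3" x="50.1" y="49.8" outcome="1"/>
  <Event id="e2" type_id="1" team_id="t1" player_id="p2" min="0" sec="6" x="61.0" y="32.5" outcome="0"/>
  <Event id="e3" type_id="7" team_id="t2" player_id="p9" min="0" sec="7" x="38.2" y="66.0" outcome="1"/>
</Game>"""


def create_schema(conn):
    """Database setup step: one row per match event."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            game_id TEXT, event_id TEXT, type_id INTEGER, team_id TEXT,
            player_id TEXT, minute INTEGER, second INTEGER,
            x REAL, y REAL, outcome INTEGER,
            PRIMARY KEY (game_id, event_id))""")


def load_events(conn, xml_text):
    """Parse one match file and insert its events; returns the row count."""
    root = ET.fromstring(xml_text)
    game_id = root.get("id")
    rows = [
        (game_id, ev.get("id"), int(ev.get("type_id")), ev.get("team_id"),
         ev.get("player_id"), int(ev.get("min")), int(ev.get("sec")),
         float(ev.get("x")), float(ev.get("y")), int(ev.get("outcome")))
        for ev in root.iter("Event")
    ]
    # INSERT OR REPLACE makes re-loading the same file idempotent (SQLite syntax).
    conn.executemany(
        "INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def basic_stats(conn):
    """Starter query: number of events and success rate (mean outcome) per team."""
    return conn.execute("""
        SELECT team_id, COUNT(*), AVG(outcome)
        FROM events GROUP BY team_id ORDER BY team_id""").fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
n_loaded = load_events(conn, SAMPLE_XML)
print(n_loaded, basic_stats(conn))
```

Offering the other databases would mean swapping `sqlite3` for that database's Python DB-API driver. The SQL is kept plain, but the `?` placeholders and `INSERT OR REPLACE` are SQLite conventions and would need adjusting per backend.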
I too offer my skills and time if any help is needed (statistician with MySQL and SAS experience; not so hot on XML, soz).

As data analytics is my day job, I can't see any project producing meaningful results until MCFC/Opta can define space on the pitch when players actually tackle, make a pass or shoot (i.e. plot distances and angles to other players at the point of the event). That said, it will eventually happen, so you can get ready for that day...
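Once player positions at the moment of each event become available (which, as the post says, they aren't yet), "distances and angles to other players" is straightforward geometry. A sketch in Python, assuming a hypothetical `PlayerPos` record with x/y in metres and the acting team attacking in the +x direction:

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerPos:
    player_id: str
    team_id: str
    x: float  # metres along the pitch; acting team attacks towards +x (assumed)
    y: float  # metres across the pitch (assumed)


def spatial_context(event_x, event_y, event_team, others):
    """Distance (m) and angle (degrees, anticlockwise from the +x attacking
    direction) from the event location to every other player, nearest first."""
    rows = []
    for p in others:
        dx, dy = p.x - event_x, p.y - event_y
        rows.append({
            "player_id": p.player_id,
            "teammate": p.team_id == event_team,
            "distance": math.hypot(dx, dy),
            "angle": math.degrees(math.atan2(dy, dx)),
        })
    return sorted(rows, key=lambda r: r["distance"])


def nearest_opponent(context):
    """Closest opposition player to the event, or None if none were supplied."""
    return next((r for r in context if not r["teammate"]), None)


# Hypothetical snapshot: a team-A pass played from (50, 34).
others = [
    PlayerPos("A7", "A", 60.0, 34.0),
    PlayerPos("B4", "B", 53.0, 38.0),
    PlayerPos("B2", "B", 50.0, 24.0),
]
ctx = spatial_context(50.0, 34.0, "A", others)
for row in ctx:
    print(row["player_id"], round(row["distance"], 1), round(row["angle"], 1))
```

In this made-up snapshot B4 is the nearest opponent at 5 m (a 3-4-5 triangle). Features such as "distance to nearest opponent" or "teammates within 10 m" could then be attached to each event as extra variables.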

The following needs to be achieved with data saved as XML data chunks in a database:

- [1] Store Full (Advanced) Data Set
- [2a] Decompose Full XML Data Set into XML Input Data Chunks (XQuery 3.0)
- [2b] Save XML Input Data Chunks to database.
- [3a] Build Input XML Data Views from XML Input Data Chunks (XQuery 3.0)
- [3b] Save Input XML Data View results to database.
- [4a] Run Analytic Calculations defined by punter (ETL Tool using XQuery 3.0 sourced data)
- [4b] Save Input Analytic Calculation Results as XML Output chunks.
- [5a] Run Input Analytic Rules defined by punter (Rules Engine using XQuery 3.0 defined events)
- [5b] Save Analytic Rule results as XML Output chunks.
- [6a] Save XML Input Data Chunks to database.
- [7a] Build Input XML Output Views from XML Input & Output Data Chunks (XQuery 3.0)
- [7b] Save Output XML View results to database.
- [8a] Run Output Analytic Calculations defined by punter (ETL Tool using XQuery 3.0 sourced data)
- [8b] Save Output Analytic Calculation Results as XML Output chunks.
- [9] XML Chunk visualisation (displaying data to user)
- [10] XML Chunk reporting (reports on data using XML reporting tool)
Note: Model scenarios (end-to-end combinations of 1-10 above) using a BPMN modelling tool.
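The post proposes XQuery 3.0 (via Saxon) and Talend for these steps. Purely to illustrate the shape of steps 1 to 4b, here is a sketch that swaps in Python's `xml.etree.ElementTree` and an in-memory SQLite database. The `<Season>`/`<Game>`/`<Event>` layout, the single `chunks` table, and treating `type_id="1"` as a pass are all hypothetical assumptions, not the real feed or the poster's design.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Step 1: hypothetical full data set; the real feed structure may differ.
FULL_XML = """<Season>
  <Game id="g1">
    <Event id="e1" type_id="1" player_id="p1" outcome="1"/>
    <Event id="e2" type_id="1" player_id="p1" outcome="0"/>
    <Event id="e3" type_id="1" player_id="p2" outcome="1"/>
  </Game>
  <Game id="g2">
    <Event id="e4" type_id="1" player_id="p1" outcome="1"/>
  </Game>
</Season>"""


def init_db(conn):
    # One table holds every chunk; 'stage' records which pipeline step made it.
    conn.execute("CREATE TABLE chunks (stage TEXT, chunk_key TEXT, xml TEXT)")


def save_chunk(conn, stage, key, element):
    conn.execute("INSERT INTO chunks VALUES (?, ?, ?)",
                 (stage, key, ET.tostring(element, encoding="unicode")))


def decompose(conn, full_xml):
    """Steps 2a/2b: split the full data set into one input chunk per game."""
    for game in ET.fromstring(full_xml).iter("Game"):
        save_chunk(conn, "input", game.get("id"), game)


def build_player_view(conn, player_id):
    """Steps 3a/3b: gather one player's events from all input chunks."""
    view = ET.Element("PlayerView", player_id=player_id)
    for (xml_text,) in conn.execute("SELECT xml FROM chunks WHERE stage = 'input'"):
        for ev in ET.fromstring(xml_text).iter("Event"):
            if ev.get("player_id") == player_id:
                view.append(ev)
    save_chunk(conn, "view", player_id, view)
    return view


def pass_completion(conn, view):
    """Steps 4a/4b: a user-defined calculation, saved as an output chunk."""
    passes = [ev for ev in view if ev.get("type_id") == "1"]  # type 1 = pass (assumed)
    completed = sum(ev.get("outcome") == "1" for ev in passes)
    result = ET.Element("PassCompletion", player_id=view.get("player_id"),
                        passes=str(len(passes)), completed=str(completed))
    save_chunk(conn, "output", view.get("player_id"), result)
    return completed / len(passes) if passes else None


conn = sqlite3.connect(":memory:")
init_db(conn)
decompose(conn, FULL_XML)
view = build_player_view(conn, "p1")
rate = pass_completion(conn, view)
print(rate)
```

Steps 5 to 8 would follow the same pattern: read chunks by `stage`, apply a rule or calculation, and write the result back as another chunk.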

MySQL is probably the best free(ish) database to use for storing XML.
Saxon is the best free(ish) XQuery software about (XQuery being, roughly, SQL for XML), and the Talend suite gives you ETL, Rules Engine and BPMN modelling capability.
Saxon can be found here:
An XQuery editor is provided by Eclipse with the WTP Tools Project:
Talend ETL, Rules Engine & BPMN can be found here:

Can somebody please email me the data set?

I am doing my dissertation on the perfect valuation of a footballer, and it will be a few days before I receive the data.

If somebody could email it to me at, that would be much appreciated.