<p><strong>Dr Lukas F Lang</strong>: Passionate about maths, science, and data (feed generated by Jekyll, 2022-01-25T09:48:49+00:00, <a href="https://www.lukaslang.eu/feed.xml">https://www.lukaslang.eu/feed.xml</a>)</p>
<p><strong>It’s a wrap: end of year retrospective</strong> (2021-12-31, <a href="https://www.lukaslang.eu/blog/2021/12/31/end-of-the-year-review">https://www.lukaslang.eu/blog/2021/12/31/end-of-the-year-review</a>)</p>
<p>I am sure most of you have noticed: apps have released their “2021 in review” stories showing your favourite songs, your most-purchased items, or your most-read articles.</p>
<p><img src="/assets/images/tate-modern.jpg" alt="Visitors at Tate Modern, London, UK" />
<em>Visitors at Tate Modern, London, UK.</em></p>
<p>In this post, I will run a personal <strong>retrospective</strong> to reflect upon 2021 and to formulate a personal vision for 2022.</p>
<p>Despite the pandemic, it has been a great and exciting year for me.
So, let’s go!</p>
<h3 id="what-went-well">What went well?</h3>
<p><strong>Parenthood.</strong> In early 2021, my partner and I became parents.
I was given the opportunity to take a month of parental leave right after the birth of our son.
I am lucky enough to be able to add two more months in 2022.
Taking care of our son and watching him grow has been incredibly rewarding.
Each day holds new surprises, raises new challenges, and sparks new joy.</p>
<p>I have also come to realise that, in expectation, my son will (hopefully) know me for approximately half of my lifetime (according to the United Nations, the global average life expectancy in 2019 was 72.6 years).</p>
<p>I promise to do my absolute best to make every single one of those roughly 2000 weeks count, especially during my parental leave!</p>
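<p>The figure of roughly 2000 weeks can be checked with a quick back-of-the-envelope calculation. The Python sketch below is illustrative only; it assumes that “half of my lifetime” means half of the 2019 global average life expectancy of 72.6 years, and that a year has 365.25 days on average.</p>

```python
# Back-of-the-envelope estimate: weeks a child gets to spend with a parent.
# Assumptions (illustrative only): the child knows the parent for half of the
# parent's expected lifetime, and a year has 365.25 days on average.
LIFE_EXPECTANCY_YEARS = 72.6  # UN global average for 2019, as quoted above
DAYS_PER_YEAR = 365.25
DAYS_PER_WEEK = 7


def shared_weeks(life_expectancy_years: float, shared_fraction: float = 0.5) -> float:
    """Convert a fraction of a lifetime (given in years) into weeks."""
    return life_expectancy_years * shared_fraction * DAYS_PER_YEAR / DAYS_PER_WEEK


weeks = shared_weeks(LIFE_EXPECTANCY_YEARS)
print(f"{weeks:.0f} weeks")  # prints "1894 weeks"
```

<p>The result, just under 1,900 weeks, is of the order of 2000; changing <code>shared_fraction</code> shows how sensitive the estimate is to the “half a lifetime” assumption.</p>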
<p><strong>Work.</strong> In April 2021, our digitalisation team at voestalpine High Performance Metals was spun off as a new company: <a href="https://www.voestalpine.com/highperformancemetals/en/digitalsolutions/">DIGITAL SOLUTIONS</a>.
As the name suggests, our mission is to develop scalable digital solutions and services.</p>
<p>This exciting step allowed my team to grow and to hire more specialists in Data Engineering, DevOps, and Full Stack Development.
Together with our existing Data Scientists, we now form a fully functional data team.
We have developed a strong focus on data products, with three primary areas: our divisional big data platform, use case implementations in data science and machine learning, and data-intensive web applications.</p>
<p>In 2022, we will grow substantially again and open a new branch that focuses on developing vision and image analysis products with applications in visual inspection.
It is an exciting opportunity that allows me to combine my past research with my passion for machine learning!</p>
<p><strong>Sports.</strong> Given the pandemic, and as a young parent, it has been incredibly challenging to pursue my favourite sports: rock climbing, yoga, and swimming.</p>
<p>Luckily, I have managed to run on a regular basis:
<img src="/assets/images/activities-2021.png" alt="Activities in 2021" /></p>
<p>In 2021, I completed more than 52 runs covering <strong>560.8 km</strong> in roughly <strong>51 hours</strong>.
On average, this amounts to <strong>one 11 km run per week</strong> at a <strong>pace of 5.44 min/km</strong> (about 5 min 26 s per kilometre).
My secret sauce: making it a regular practice and having reliable, motivated partners to run with.
In addition, I have continued to commute by bike whenever conditions allowed.</p>
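<p>The averages above can be reproduced from the yearly totals. The Python sketch below is illustrative only; it assumes 52 weeks in the year and reads “5.44 min/km” as decimal minutes, which is what makes the pace consistent with roughly 51 hours of running.</p>

```python
# Reproduce the 2021 running averages from the yearly totals.
# Assumptions (illustrative only): 52 weeks per year, and a pace given in
# decimal minutes per kilometre (5.44 min/km rather than 5 min 44 s).
TOTAL_KM = 560.8
PACE_MIN_PER_KM = 5.44
WEEKS_PER_YEAR = 52


def km_per_week(total_km: float, weeks: int = WEEKS_PER_YEAR) -> float:
    """Average distance covered per week."""
    return total_km / weeks


def total_hours(total_km: float, pace_min_per_km: float) -> float:
    """Total running time implied by a distance and an average pace."""
    return total_km * pace_min_per_km / 60


def as_min_sec(decimal_minutes: float) -> str:
    """Format decimal minutes (e.g. 5.44) as minutes:seconds (e.g. '5:26')."""
    minutes, seconds = divmod(round(decimal_minutes * 60), 60)
    return f"{minutes}:{seconds:02d}"


print(f"{km_per_week(TOTAL_KM):.1f} km per week")  # 10.8 km per week
print(f"{total_hours(TOTAL_KM, PACE_MIN_PER_KM):.1f} hours")  # 50.8 hours
print(f"pace {as_min_sec(PACE_MIN_PER_KM)} min/km")  # pace 5:26 min/km
```

<p>The implied total of about 50.8 hours matches the “roughly 51 hours” above, whereas a pace of 5 min 44 s per kilometre would imply more than 53 hours.</p>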
<p><strong>Photography.</strong> While I did not manage to edit and publish any of the many pictures I have taken over the past years, I continued to take photos regularly.</p>
<p>One of the first pieces of advice I received as a young parent turned out to be one of the most valuable, if not the most valuable: take as many pictures of your child as possible, ideally while they are interacting with your loved ones.</p>
<p>Those memories are indeed a source of great joy.
Thank you for that great advice!</p>
<h3 id="what-didnt-go-well">What didn’t go well?</h3>
<p><strong>Blogging</strong>. While I managed to redesign and launch a new website, I wrote only three blog posts last year (including this one), none of them technical.
I hope to be able to dedicate more time to writing next year.</p>
<h3 id="what-did-i-learn">What did I learn?</h3>
<p>I also learned a number of new things last year.
In 2021, I spent a significant amount of time researching the topic of <strong>investment</strong>.
I managed to develop a solid personal investment strategy.
I did so by following the wisdom of many.
Gathering feedback and opinions from various people along the way turned out to be highly valuable.</p>
<p>Another topic I dedicated quite some time to is <strong>product management</strong>, in particular the management of data and machine learning products.
Going through a product discovery phase has been great fun and quite insightful for me.
Pursuing this process together with potential users and customers was key.</p>
<p>Most importantly, I have developed a clearer picture of my <strong>leadership</strong> style and a better idea of what type of leader I strive to be: one who creates an environment where teams can innovate and perform, and where individuals can excel.</p>
<p>In the past year, I spent quite a lot of time developing strategies and using data to change course.
I like to provide a clear vision and a solid strategy, and then let a team find the best solution.
This requires a high level of trust, a willingness to embrace failure, and enough safeguards to cope with shortcomings.
While I have a clear vision in mind in most cases, I will have to dedicate much more time to putting it on paper, communicating it, and aligning it with all relevant people.</p>
<p>Another lesson for me is that many highly knowledgeable people in each of these areas are within close reach.
It was worth finding them and talking to them!</p>
<p>For all of the above-mentioned topics, I found it best to learn through a combination of reading and practising.
I will continue to follow the <strong>stop thinking and start doing</strong> approach:</p>
<center><blockquote class="twitter-tweet"><p lang="en" dir="ltr">„You don‘t think yourself into a new way of acting, but you act yourself into a new way of thinking“. Fantastic episode by <a href="https://twitter.com/guyraz?ref_src=twsrc%5Etfw">@guyraz</a> on <a href="https://twitter.com/NPR?ref_src=twsrc%5Etfw">@NPR</a> (via <a href="https://twitter.com/jakobh?ref_src=twsrc%5Etfw">@jakobh</a>). <a href="https://t.co/V610AOT7Zg">https://t.co/V610AOT7Zg</a></p>— Lukas F Lang (@lukaslang) <a href="https://twitter.com/lukaslang/status/1467930899366952966?ref_src=twsrc%5Etfw">December 6, 2021</a></blockquote> <script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></center>
<h3 id="what-did-i-wish-it-happened">What do I wish had happened?</h3>
<p>While I am very grateful for the many personal interactions I had in 2021, I wish I had taken more chances to spend time with family and close friends.</p>
<p>On the professional side of things, in October we held our first in-person workshop since the start of the pandemic.
Despite early signs of a fourth wave, it was able to take place in Kapfenberg, Austria, under very strict Covid measures to ensure the safety and health of all participants.
It was my main source of inspiration and motivation for at least the following quarter.</p>
<p>For 2022, I wish to take part in more such stimulating events!
<img src="/assets/images/kapfenberg.jpg" alt="Kapfenberg, Austria" /><br />
<em>Autumn colours in Kapfenberg, Austria.</em></p>
<h3 id="my-professional-vision-for-2022">My professional vision for 2022</h3>
<p>2021 was all about <strong>laying the foundations</strong> of DIGITAL SOLUTIONS.
I built a strong team of five, set up new processes, and introduced new tools.
My team has started to develop great products.
Groundwork often means doing things for the first time and from scratch.
It also means adapting frequently as circumstances change.</p>
<p>My vision for 2022 is: <strong>sustainable growth and impact</strong>.</p>
<p>It will require new strategies, a constant pace, and a clear focus, but, above all, an open mind towards new ideas and unique approaches.</p>
<p><strong>What is… Data Science?</strong> (2021-06-20, <a href="https://www.lukaslang.eu/blog/2021/06/20/what-is-data-science">https://www.lukaslang.eu/blog/2021/06/20/what-is-data-science</a>)</p>
<p>It depends on who is asking (and whom you ask).</p>
<p><img src="/assets/images/southwark-bridge.jpg" alt="Southwark Bridge, London, UK" />
<em>Southwark Bridge, London, UK.</em></p>
<p>Every once in a while I get asked a seemingly trivial (but somewhat loaded) question in different contexts: <strong>What <em>actually</em> is Data Science?</strong></p>
<p>Such situations include, for example, teaching basic data literacy to subject-matter experts, collaborating with co-workers, and presenting to senior management.</p>
<p>Ever since the term found its way into our language, many definitions have been proposed.
Yet, no universally agreed notion exists.
In this post, I argue that:</p>
<ol>
<li><strong>No useful, complete, or timeless ‘one-size-fits-all’ definition exists</strong></li>
<li><strong>A definition should be given nevertheless and must include all data-related activities, be it for intellectual or commercial purposes</strong></li>
</ol>
<p>While going through those two points, I will revisit the much-debated 2017 paper by David Donoho, <a href="https://doi.org/10.1080/10618600.2017.1384734">‘50 Years of Data Science’</a>.
In his article, Donoho looks back, from a statistician’s point of view, on more than five decades of data analysis since John Tukey’s seminal 1962 work <a href="https://doi.org/10.1214/aoms/1177704711">‘The Future of Data Analysis’</a>.</p>
<h3 id="no-useful-complete-or-timeless-one-size-fits-all-definition-exists">No useful, complete, or timeless ‘one-size-fits-all’ definition exists</h3>
<p>Whatever is necessary and helps to solve your problem using data <strong>and</strong> follows a scientific approach is data science.
Full stop.</p>
<p>Donoho’s article starts with an attempt to characterise the current state of affairs around data science. I have grouped the characterisations he discusses into three main categories.</p>
<p><strong>Today’s consensus definition.</strong>
Donoho quotes curricula and definitions of emerging data science initiatives, mainly at universities.
The author summarises that today’s ‘consensus data science’ is essentially ‘<em>a superset of statistics, machine learning, and technology to deal with big data</em>’.</p>
<p>Clearly, when creating a curriculum, universities need to balance certain criteria, such as duration of study, difficulty of the subject, job prospects of future graduates, or available faculty.</p>
<p>Without question, specialisations are required at university level and naturally result in a significant overlap between disciplines and departments (as do many traditional curricula).
Nevertheless, a practical approach seems to be at the very heart of many data science programmes (as it is for many other applied curricula).</p>
<p>As a result, the quoted definitions can hardly be used to settle the debate.
They are, however, of great help to prospective students!</p>
<p><strong>Definition by motivation.</strong>
Much to his regret as an academic, Donoho argues that modern data science developments, including the above-mentioned curricula, are motivated mainly by commercial, rather than intellectual, interest.</p>
<p>While I personally value an intellectual stance, what matters is the impact a field has, be it on an intellectual or a commercial level.</p>
<p>For example, the <a href="https://www.ref.ac.uk/">UK Research Excellence Framework (REF)</a> evaluates impact, which it <a href="https://www.ref.ac.uk/media/1447/ref-2019_01-guidance-on-submissions.pdf">defines</a> as ‘<em>an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia</em>’.</p>
<p>Being able to demonstrate wider impact of (academic) research really is a great achievement.
Being able to show direct commercial applicability creates uplift.</p>
<p><strong>Definition through big data, skills, or jobs.</strong>
Donoho argues that ‘big data’ is no useful criterion to distinguish data science from statistics (and potentially also from other disciplines).</p>
<p>Many disciplines have a long tradition of analysing a plethora of data of all sizes, shapes, and varieties, although their approaches vary, ranging, for example, from sampling and aggregation all the way to large-scale number crunching and (near) real-time data processing.</p>
<p>His argument is spot on.
Many people practice data science on a daily basis using spreadsheets and basic statistics.</p>
<p>However, Donoho states that today’s consensus data science merely encompasses coping skills for dealing with (the restrictions of) distributed computing, rather than the skills needed to get to the root of the problem, which he claims is inference from data.</p>
<p>The key, I believe, is to understand that big data, skills, and jobs are interlinked in a circular manner, inflated by one (hype) cycle after another: the emergence of big data requires technology and skills for its analysis, which creates jobs and a demand for skilled graduates, who then go on to execute what they have been taught: big data.</p>
<p>I reject any definition of data science that necessarily entails the ‘big data, skills, and jobs’ cycle since, to the best of my knowledge, many organisations simply have no need for it and are well advised not to enter this territory.
The organisational, technical, and cost overheads are tremendous, and the complexity is hard to master.</p>
<p>Nevertheless, as data analysis has become a team effort, with individuals increasingly building end-to-end data workflows, a possible entry point to big data should not be ruled out but chosen wisely on a case-by-case basis.</p>
<h3 id="a-definition-should-exist-nevertheless-and-must-include-all-data-related-activities-be-it-for-intellectual-or-commercial-purposes">A definition should exist nevertheless and must include all data-related activities, be it for intellectual or commercial purposes</h3>
<p>Donoho’s article proposes the definition of <strong>Greater Data Science</strong>, which consists of six divisions:</p>
<ul>
<li>Data gathering, preparation, and exploration</li>
<li>Data representation and transformation</li>
<li>Computing with data</li>
<li>Data modelling</li>
<li>Data visualisation and presentation</li>
<li>Science about data science</li>
</ul>
<p>It is a great first step.
However, I feel it falls short in several respects.
While some of the responses to the article have addressed a potential lack of (software) engineering in this definition (see e.g. <a href="https://medium.com/@srowen/what-50-years-of-data-science-leaves-out-2366c9b61d3d">‘What “50 Years of Data Science” Leaves Out’</a> by Sean Owen), I would like to complement this view as someone who has worked in both academia and industry.</p>
<p>First, what every definition should include are inherent <strong>principles and values</strong> such as:</p>
<ul>
<li>Reproducibility of results, models, and software artefacts</li>
<li>Open science and software culture that includes sharing and building on top of others’ work</li>
<li>Guidelines for practising ethical data science and ethics for creating trustworthy AI</li>
<li>User experience aspects and human-in-the-loop principles</li>
</ul>
<p>— just to name a few.</p>
<p>In all fairness, the reproducibility aspect and the open science movement are well discussed in Donoho’s outlook for the next 50 years of data science.</p>
<p>Second, I would like to add a few aspects that I feel Greater Data Science is lacking:</p>
<p><strong>Communication and presentation skills.</strong>
While both topics are mentioned in Donoho’s article in the context of data visualisation, being able to communicate information and conclusions as a basis for decision making is a core competency of any great data scientist.</p>
<p>Conveying just the right level and amount of information to stakeholders or management is key in complex environments, even more so when working in cross-functional teams.</p>
<p><strong>Domain and expert knowledge.</strong>
A solid understanding of the subject is just as important as deep knowledge of statistics, maths, or computer science.
Without knowing the domain in detail, even the brightest person will just arrive at trivial, well-known, or useless conclusions.</p>
<p>I am in favour of specialising in one well-chosen area.
It is very difficult, if not impossible, to simultaneously practise natural language processing at a high level, be an expert in image analysis for biomedical applications, and master machine learning for applications in the manufacturing and processing industry.</p>
<p>While it is of course impossible to include every possible subject in a data science curriculum, aspiring data scientists should try to master a subject on their own account early on.</p>
<p><strong>Translator skills.</strong>
Guiding and directing internal data teams or external partners in executing data-science projects, and bridging the gap to the business, require as much business understanding as data science knowledge.</p>
<p>Being able to develop a business case, and to derive the necessary data and model requirements is essential.
So is the ability to transform insights into process changes as well as to enable businesses to act upon recommendations.</p>
<p>It is not everyone’s cup of tea to take ownership and practice leadership in a project.
Successful data-science projects, however, manage to bridge this gap.</p>
<p><strong>Project and people management skills.</strong>
Finally, solid leadership skills are crucial for data-science projects.
This is due to the sheer complexity of such endeavours.</p>
<p>Being able to frame and to conduct a data-science or machine-learning project well so that the risk of failure is minimised, and value is delivered early and frequently, requires considerable experience.
Recognising when to abort in a controlled manner is key in order not to pursue dead ends, waste resources, or leave behind scorched earth.</p>
<p><strong>Hi there!</strong> (2021-03-08, <a href="https://www.lukaslang.eu/blog/2021/03/08/hi-there">https://www.lukaslang.eu/blog/2021/03/08/hi-there</a>)</p>
<p>I have decided to finally redesign my website and to start a blog.</p>
<p>In this first post, I explain why I have started writing, what I will write about, and whom I intend to write for. I will try to lighten up my posts a bit with pictures I have taken over the last couple of years.</p>
<p><img src="/assets/images/suidhe-viewpoint.jpg" alt="Suidhe Viewpoint, Scotland, UK" />
<em>Image taken at Suidhe Viewpoint, Scotland, UK.</em></p>
<h3 id="why-do-i-write">Why do I write?</h3>
<p><strong>To communicate.</strong>
Writing is a powerful form of communication.
Being able to convey precise ideas or information in just as many words as necessary is a superpower.</p>
<p><strong>To learn.</strong>
Writing is difficult – at least for me.
Even though I have written several theses, articles, and papers, I would like to improve my skills.
Most importantly, I aim to improve the iterative process I typically follow to get from ideas and a first draft to a finished post that is good enough to be published.
I would like to converge faster.</p>
<p>In addition, I’m a very curious person, and I like exploring and learning new things.
Taking the time to distill and summarise key pieces of information in concise form is a good way to understand material and to memorise things long-term.</p>
<p><strong>To think.</strong>
The act of writing helps me to concentrate and forces me to formulate thoughts and ideas with high precision and clarity.</p>
<p><strong>To grow.</strong>
I discovered that writing is a form of investment in my future self.
I have started to take notes during almost every meeting, and to document all my code and projects in great detail.
In addition, I often write summaries of, for example, papers or books I have read.
This not only helps me to persist information in a way that I (and potentially others) can access quickly in the future, but also allows me to identify gaps in my understanding and typically raises a lot of interesting questions to pursue.</p>
<p>Finally, I simply find pleasure in the act of writing.</p>
<h3 id="what-will-i-write-about">What will I write about?</h3>
<p>I intend to cover a broad range of topics, some of which will be technical, some non-technical.
However, most posts will be related in some way to maths, data science, and machine learning.</p>
<p>For a list of potential topics see my <a href="/topics">idea storage</a>.
Feel free to get in touch and suggest topics you would be interested in.</p>
<h3 id="whom-will-i-write-for">Whom will I write for?</h3>
<p>For the reasons outlined above, I mainly write for my current and my future self.
Nevertheless, I hope that the occasional curious visitor, or a current or future co-worker, will enjoy reading my posts, take away some useful ideas, and maybe even share them online.</p>
<h3 id="how-often-will-i-publish">How often will I publish?</h3>
<p>I have not set any personal goals (yet) and I will have to see how much time I will be able to dedicate to writing.</p>
<h3 id="what-about-feedback">What about feedback?</h3>
<p>I generally try to have an open mind about feedback, so feel free to get in touch and let me know your thoughts.
Having said this, I will only consider constructive feedback, and will only engage in well-intended and fact-based discussions.</p>
<h3 id="how-did-i-set-up-this-website">How did I set up this website?</h3>
<p>It is based on Jekyll, the <a href="https://github.com/niklasbuschmann/contrast">Contrast</a> theme, and is hosted on <a href="https://github.com/lukaslang/lukaslang.github.io">GitHub Pages</a>.
I wanted a rather simple setup that matches my daily workflow using <a href="https://git-scm.com/">git</a>, and that I can easily adjust using my limited HTML/CSS knowledge.
As you can see, I heavily modified the original theme to match my preferences.</p>