A Corpus Linguistic Approach to the Study of Writer Identity in Second Language Writing

Recently, I’ve been looking at a corpus of texts produced by ESL learners. The texts were produced by about 70 English major college students and the themes of the texts were autobiographical in nature. I’m interested in looking at how corpus linguistics can contribute to the study of identity in the corpus.

Firstly, I’d like to present a few references which give a certain direction to the study:

“All our writing is influenced by our life histories. Each word we write represents an encounter, possibly a struggle, between our multiple past experience and the demands of a new context. Writing is not some neutral activity which we just learn like a physical skill, but it implicates every fibre of the writer’s multifaceted being.” (Ivanic 1998: 181)

The autobiographical self connects identity with a writer’s sense of their roots, of where they are coming from. The identity they bring with them to writing is itself socially constructed and constantly changing as a consequence of their developing life history.

3 ways to understand writer identity:

a. Autobiographical self: gives sense of roots.

b. Discoursal self: the impression (often multiple) a writer conveys, relating to values, beliefs and power in a social context.

c. Self as author: the writer’s voice, which relates to their position, opinions and beliefs. (Park 2013)

Identity is related to how a person has been socialized in a community – which is layered with certain values, beliefs, dispositions and power relations.

Writers’ identities are constructed by negotiating past and present practices. This negotiation will hopefully reveal the cultural values of the writers.

Writing is a situated, social and political practice offering writers of English an opportunity to find power and legitimacy in a second language.

The focus is not on the ‘linguistic code’. Seeing how learners negotiate their identities may instead yield insights for the teaching of writing. (Fujieda 2013)

Students whose written language does not fit the standard are typically linked to a group called ‘basic writers’ and labelled as deficient, incompetent, or even lacking in cognitive ability – therefore students can feel marginalized.

Rather than focusing on negativity, it can be critical for learners to hear more clearly the voices of their histories and to negotiate the ideological boundaries that have both enclosed and excluded them.

Writers are too frequently labelled as inferior, yet writer identity can be seen as social, political and related to issues of race, class and gender. Writing can be seen as a social and cultural process rather than a cognitive or literary one. (Fernsten 2008)


Students wrote three autobiographical texts based on a selection of questions. These were collected and stored in a computer-readable format. The data were analysed using either WordSmith Tools (Version 6) or Sketch Engine.

The approach I used is corpus-driven: A corpus-driven analysis is an inductive process. The corpus is the data and the patterns in it are noted as a way of expressing regularities in language. (Tognini-Bonelli, 2001)

Using the corpora in discourse analysis: a focus on frequency, collocation, keyword analysis and concordances (KWIC). (Baker, 2006)
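To make two of these techniques concrete, here is a minimal Python sketch of a KWIC concordance and a simple window-based collocate count. It is only an illustration of the general idea, not how WordSmith or Sketch Engine work internally. The function names (tokenize, kwic, collocates), the window size and the sample sentence are all my own assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def kwic(tokens, node, window=4):
    """Return (left context, node, right context) for every hit of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, window=4):
    """Count the words that occur within `window` tokens of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Hypothetical sentence standing in for a learner text.
sample = "My family moved to the city. My family was proud when I learned English."
tokens = tokenize(sample)

# Print each concordance line with the node word in the middle.
for left, node, right in kwic(tokens, "family", window=3):
    print(f"{left:>25} | {node} | {right}")

# Print the most frequent words found near the node word.
print(collocates(tokens, "family", window=3).most_common(3))
```

Raw co-occurrence counts like these are only a first step. The dedicated tools normally rank collocates with an association measure, which is left out here.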

A look at a frequency list of non-function words is a good starting point as it allows a certain understanding of the context of the corpus.
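A frequency list of this kind can be sketched in a few lines of Python. This is a rough illustration rather than the procedure used in WordSmith or Sketch Engine. The stoplist below is a small hand-made sample of function words (an assumption, a real analysis would use a fuller list), and the two sample sentences are hypothetical stand-ins for the learner texts.

```python
import re
from collections import Counter

# Small illustrative stoplist of function words (an assumption, not a
# complete list).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at", "to",
    "for", "with", "from", "by", "i", "you", "he", "she", "it", "we",
    "they", "my", "your", "his", "her", "our", "their", "me", "is", "am",
    "are", "was", "were", "be", "been", "that", "this", "not", "as",
    "when", "so",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def content_word_frequencies(texts):
    """Count every token that is not in the function-word stoplist."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in FUNCTION_WORDS)
    return counts

# Hypothetical sample sentences standing in for the learner texts.
sample_texts = [
    "My family moved to the city when I was a child.",
    "I learned English at school and my family was proud.",
]

freq = content_word_frequencies(sample_texts)
for word, count in freq.most_common(5):
    print(word, count)
```

Filtering with a stoplist is the simplest option. Filtering by part-of-speech tag would be an alternative if the corpus is tagged.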


A keyword list will facilitate an understanding of what is salient in the corpus.
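As a rough sketch of how a keyword list can be computed, the Python below compares word frequencies in a study corpus against a reference corpus using the log-likelihood statistic, one commonly used keyness measure. Which statistic and which reference corpus a given tool applies depends on its settings, so treat this as an assumption on my part. The function names and the word counts are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll

def keywords(study_counts, reference_counts, top_n=10):
    """Rank words that are relatively more frequent in the study corpus."""
    c = sum(study_counts.values())
    d = sum(reference_counts.values())
    scored = []
    for word, a in study_counts.items():
        b = reference_counts.get(word, 0)
        # Keep only positive keywords (overused relative to the reference).
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical counts, purely for illustration.
study = Counter({"family": 30, "school": 20, "english": 25, "the": 100})
reference = Counter({"family": 5, "school": 10, "english": 2, "the": 120, "market": 40})

for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```

A high score only says that a word is unusually frequent relative to the reference corpus. The concordance lines still have to be read to see what the word is doing in the texts.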