Symbol Frequency as a Component of the Statistical Profile of M. Yatskiv’s Short Stories Idiolect

Authors

L. Tsiokh, Y. Shyika

DOI:

https://doi.org/10.28925/2412-2491.2025.2512

Keywords:

text corpus, text array, writer's idiolect, sequential samples, random samples, entropy, melodiousness of the text

Abstract

This article presents a quantitative analysis of literary text at the graphological and phonetic levels. The study is based on an experimental research corpus of short stories by M. Yatskiv, from which a text array was compiled for statistical analysis at the symbol (grapheme) level. For the entire array, the absolute and relative frequency of each symbol of the extended Ukrainian alphabet was calculated. From these frequencies, the rank of each symbol was determined, and entropy was calculated using the standard formula both for the corpus as a whole and for sequentially and randomly selected segments of 108 characters each. For the entire array, and separately for the sequential and random segments, the distribution of characters by type and the euphony of the text were calculated, with euphony defined as the proportion of vowels, sonorants, and voiced consonants in the text. The agreement between symbol frequencies in the full corpus and in the segments was assessed with Pearson’s chi-squared test: the frequency distribution of characters in the full text array served as the theoretical (expected) distribution, and the chi-squared statistic was calculated for each segment. The null hypothesis stated that the frequency distribution of characters in a given segment does not differ from the corresponding distribution in the full text. In addition, the rank of each character in the frequency distribution was determined for every segment. All calculations were performed with Python programs. The results were compared with a similar analysis of short stories by Vasyl Stefanyk. The findings show that even a randomly selected segment as small as one hundredth of the corpus can, with high probability, approximate the overall frequency distribution of characters.
Moreover, the results indicate that for the short story (novella), a genre with stable structural elements within a given period of a national literature (as in the works of Yatskiv and Stefanyk), linguistic statistics show minimal variation.
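The abstract lists the computations but not the code behind them. The Python sketch below shows one way they could fit together; it is not the authors' program. Everything beyond the abstract is an assumption: the symbol inventory (33 letters plus apostrophe and space), the text normalization, base-2 Shannon entropy as the "standard formula", random segments taken at random start offsets, the grapheme-level vowel/sonorant/voiced classes used for euphony, and all function names.

```python
import math
import random
from collections import Counter

# ASSUMPTION: the "extended Ukrainian alphabet" is modelled as the 33 letters
# plus the apostrophe and the space; the article's exact symbol set may differ.
LETTERS = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
ALPHABET = LETTERS + "'" + " "

# ASSUMPTION: simplified grapheme-level phonetic classes used for euphony.
VOWELS = set("аеєиіїоуюя")
SONORANTS = set("вйлмнр")
VOICED = set("бгґджз")

SEGMENT_LEN = 108  # segment length stated in the abstract


def normalize(text):
    """Lowercase, unify apostrophe variants, map other symbols to spaces, collapse spaces."""
    text = text.lower()
    for variant in "’ʼ`":
        text = text.replace(variant, "'")
    kept = "".join(ch if ch in ALPHABET else " " for ch in text)
    return " ".join(kept.split())


def frequencies(text):
    """Absolute count and relative frequency of every ALPHABET symbol in text."""
    counts = Counter(ch for ch in text if ch in ALPHABET)
    total = sum(counts.values())
    absolute = {s: counts[s] for s in ALPHABET}
    relative = {s: absolute[s] / total for s in ALPHABET}
    return absolute, relative


def ranks(absolute):
    """Rank 1 = most frequent symbol; ties ordered by position in ALPHABET (assumption)."""
    order = sorted(absolute, key=lambda s: (-absolute[s], ALPHABET.index(s)))
    return {s: i + 1 for i, s in enumerate(order)}


def shannon_entropy(relative):
    """H = -sum(p * log2 p) over symbols with p > 0, in bits per symbol."""
    return -sum(p * math.log2(p) for p in relative.values() if p > 0)


def sequential_segments(text, size=SEGMENT_LEN):
    """Consecutive non-overlapping slices; a remainder shorter than size is dropped."""
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]


def random_segments(text, n, size=SEGMENT_LEN, seed=0):
    """n slices of length size starting at random offsets (assumed sampling scheme)."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(text) - size + 1) for _ in range(n)]
    return [text[s:s + size] for s in starts]


def euphony(text):
    """Share of vowels, sonorants and voiced consonants among the letters of text."""
    letters = [ch for ch in text if ch in LETTERS]
    sonorous = sum(1 for ch in letters if ch in VOWELS or ch in SONORANTS or ch in VOICED)
    return sonorous / len(letters)


def chi_squared(segment, corpus_relative):
    """Pearson statistic with the corpus distribution as the expected one.

    Returns (statistic, degrees of freedom); symbols with zero expected
    frequency are skipped.
    """
    observed, _ = frequencies(segment)
    n = sum(observed.values())
    stat, cells = 0.0, 0
    for s, p in corpus_relative.items():
        expected = n * p
        if expected > 0:
            stat += (observed[s] - expected) ** 2 / expected
            cells += 1
    return stat, cells - 1


if __name__ == "__main__":
    # Placeholder text, NOT the Yatskiv corpus.
    corpus = normalize("Тихий вечір, і вітер гойдає старі яблуні над рікою. " * 30)
    absolute, relative = frequencies(corpus)
    print("entropy, bits/symbol:", round(shannon_entropy(relative), 3))
    print("euphony:", round(euphony(corpus), 3))
    for seg in sequential_segments(corpus)[:3] + random_segments(corpus, 3):
        stat, df = chi_squared(seg, relative)
        seg_ranks = ranks(frequencies(seg)[0])
        top5 = sorted(seg_ranks, key=seg_ranks.get)[:5]
        print(f"chi2={stat:.2f} df={df} top5={top5}")
```

The `chi_squared` function returns only the statistic and its degrees of freedom; accepting or rejecting the null hypothesis would then rely on a chi-square critical-value table at the chosen significance level. Per-segment ranks are computed with the same `ranks` function as for the full array, so the two rankings can be compared symbol by symbol.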

Published

2025-12-01

How to Cite

Tsiokh, L., & Shyika, Y. (2025). Symbol Frequency as a Component of the Statistical Profile of M. Yatskiv’s Short Stories Idiolect. Studia Philologica, 2(25), 170–183. https://doi.org/10.28925/2412-2491.2025.2512