Skip to content

Locale argument conversion not setting language and country properly #3141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 tasks
scordio opened this issue Jan 31, 2023 · 21 comments · May be fixed by #4492
Open
2 tasks

Locale argument conversion not setting language and country properly #3141

scordio opened this issue Jan 31, 2023 · 21 comments · May be fixed by #4492

Comments

@scordio
Copy link
Contributor

scordio commented Jan 31, 2023

I am trying to write a parameterized test for a service that uses a resource bundle under the hood, loading the resource bundle with ResourceBundle.getBundle(String, Locale).

The test gets several Locale values in a @CsvSource annotation:

The documentation does not have an example for a country-based locale so I initially tried with:

@CsvSource({
  "en_FR, value1",
  "en_GB, value2",
  "fr_FR, value3",
  "fr_MC, value4",
  "it_IT, value5",
  ...
})
void test(Locale locale, String expected) {
  // call service using the resource bundle

  // compare result with `expected`
}

All worked fine on Windows (local environment) but failed on Linux (CI).

After some digging, on Windows everything works because it's a case-insensitive file system, so for example the en_FR value is converted to a en_fr language-only locale (i.e., without country) and it successfully matches the file ending with en_FR because of case insensitivity.

Looking at the code, I think the problem is here:

which uses the Locale(String) constructor. The constructor Javadoc also explains the different behavior based on case sensitivity:

This constructor normalizes the language value to lowercase.

Looking at the test cases:

assertConverts("en", Locale.class, Locale.ENGLISH);
assertConverts("en_us", Locale.class, new Locale(Locale.US.toString()));

Probably the second one is invalid. I would have expected that to be:

assertConverts("en_us", Locale.class, Locale.US);

which does not pass with the current implementation.

Steps to reproduce

As the documentation does not cover conversion of locales with country, I am not sure what would be the "right" test case to demonstrate the issue.

Assuming the IETF BCP 47 language tag format would be the right format for input values, this test fails already at the first assertion:

  @ParameterizedTest
  @ValueSource(strings = { "en-US" })
  void test(Locale locale) {
    assertEquals("en", locale.getLanguage());
    assertEquals("US", locale.getCountry());
    assertEquals(Locale.US, locale);
  }

Workaround

Currently, I'm using a custom ArgumentConverter which delegates the conversion to Locale::forLanguageTag.

Context

  • Used versions (Jupiter/Vintage/Platform): 5.8.2
  • Build Tool/IDE: Intellij IDEA / Maven

Deliverables

  • DefaultArgumentConverter properly converts IETF BCP 47 strings
  • The documentation has an example using the IETF BCP 47 language tag format
@scordio
Copy link
Contributor Author

scordio commented Jan 31, 2023

I propose to change the implementation at:

replacing Locale::new with Locale::forLanguageTag.

Happy to work on it if accepted.

@marcphilipp
Copy link
Member

@scordio IIUC that could potentially break existing tests that (wrongly or not) rely on the Locale(String) constructor being called, right?

@scordio
Copy link
Contributor Author

scordio commented Jan 31, 2023

@marcphilipp yes, but only if they specify a country and/or variant in the string. They would be broken like the second test case you currently have in the codebase.

However, such tests would have today a "wrong" Locale instance like the one in the reproducer and I'm not sure if they exist in reality. Up to you to judge if fixing this deserves a breaking change.

Tests that are specifying only a "plain" language (i.e., without country, variant, etc.) are safe from my point of view, which might be the most common use case.

@scordio
Copy link
Contributor Author

scordio commented Feb 2, 2023

Looking at Java 19, the Locale(String) constructor has been deprecated together with the other constructors and Locale::forLanguageTag is one of the suggested ways to obtain a Locale object.

@scordio
Copy link
Contributor Author

scordio commented Feb 4, 2023

In case you prefer to keep the current behavior thus avoiding a breaking change, an alternative could be an opt-in argument converter like @JavaTimeConversionPattern.

A few ideas about naming:

  • @LanguageTag
  • @LanguageTagConversion
  • @LanguageTagLocale

The implicit conversion could then be changed in the next major version of JUnit.

@Bukama
Copy link
Contributor

Bukama commented Apr 16, 2023

Looking at Java 19, the Locale(String) constructor has been deprecated together with the other constructors and Locale::forLanguageTag is one of the suggested ways to obtain a Locale object.

We (JUnit Pioneer) also changed our implementation as the old constructor is deprecated, see JUnit Pioneer issue and JUnit Pioneer PR. Why do I note this here? Because some of the builder methods validates the input against a valid pattern - which the old constructor did not. So you might have the same discussion as we if you see this as a breaking change or not.

@scordio
Copy link
Contributor Author

scordio commented Apr 18, 2023

Thanks for the pointers, @Bukama!

As far as I can see, JUnit Pioneer favored the builder only when the fine-grained parameters are specified, otherwise Locale::forLanguageTag is used.

Do you see any issue with using Locale::forLanguageTag to solve the issue with the JUnit argument conversion?

@Bukama
Copy link
Contributor

Bukama commented Apr 21, 2023

Do you see any issue with using Locale::forLanguageTag to solve the issue with the JUnit argument conversion?

I would not expect any issues as this method does not do a syntax check (as far as I understand the docs)

@scordio
Copy link
Contributor Author

scordio commented Apr 22, 2023

In case you prefer to keep the current behavior thus avoiding a breaking change, an alternative could be an opt-in argument converter like @JavaTimeConversionPattern.

A few ideas about naming:

* `@LanguageTag`

* `@LanguageTagConversion`

* `@LanguageTagLocale`

The implicit conversion could then be changed in the next major version of JUnit.

As mentioned in #2702 (comment), a custom argument converter could be a candidate for JUnit Pioneer.

@marcphilipp
Copy link
Member

Team decision: Since the Locale constructor is not deprecated for removal, there's no urgency to change the existing behavior. Since switching from Locale::new to Locale::forLanguageTag would likely break existing tests, we would need to make it configurable one way or another. For now, we'll wait for additional interest from the community.

@sbrannen
Copy link
Member

During the team discussion (resulting in the above team decision), I mentioned that Spring Framework dealt with this issue by introducing "lenient" Locale parsing in order to support both legacy locale formats as well as BCP 47 formats.

@scordio
Copy link
Contributor Author

scordio commented May 12, 2023

Hi @marcphilipp and @sbrannen, I was going to submit a feature request to JUnit Pioneer for a custom argument converter, but stopped because I'm guessing that having the behavior configurable in JUnit would probably make the Pioneer converter obsolete.

I understand you're waiting for additional interest. I am definitely interested in having a solution, one way or another, but I also understand a single person's need doesn't count as a community interest 😅

I'm happy to help with the changes, but some initial design would of course be required.

In case you're open to discussing the solution further, here is my proposal: as I can hardly imagine test suites where you want to have a combination of the old and new conversion behavior for Locale, I would introduce a configuration parameter in the style of junit.jupiter.tempdir.scope, perhaps junit.jupiter.params.arguments.conversion.locale=default|lenient.

In case you don't want to invest more in this right now, that's totally fine and understood!

@marcphilipp
Copy link
Member

@scordio Wouldn't it rather be "ISO 639" vs. "BCP 47"?

@scordio
Copy link
Contributor Author

scordio commented May 13, 2023

@marcphilipp I guess you're referring to the values of the configuration property, right?

If the change would be the one I mentioned at #3141 (comment), yes, "ISO 639" vs. "BCP 47" would make more sense as property values.

In the previous comment, I proposed default vs. lenient following the example @sbrannen described at #3141 (comment) but probably it's not needed to have a mode supporting both at the same time (I actually mentioned the same in my previous comment... fighting with myself, it seems 🤦).

Just as a last comment, I was reading again the Javadoc of Spring's StringUtils.parseLocale(). Overall, Spring supports three variants:

  1. Locale's toString() format (e.g., "en", "en_US")
  2. Locale's toString() format with spaces as separators (e.g., "en US")
  3. BCP 47 (e.g. "en-US")

I tested how the two solutions would perform with such use cases:

Locale::getLanguage Locale::getCountry Locale::toString
new Locale("en") en en
Locale.forLanguageTag("en") en en
new Locale("en_US") en_us en_us
Locale.forLanguageTag("en_US")
new Locale("en-US") en-us en-us
Locale.forLanguageTag("en-US") en US en_US
new Locale("en US") en us en us
Locale.forLanguageTag("en US")

Neither of the underscore and space use cases are adequately supported by either new Locale(String) or Locale.forLanguageTag(String). However, I don't know if JUnit should support these use cases at all.

My proposal would be to not support them as they are not backed by any standard.

Thoughts?

@scordio
Copy link
Contributor Author

scordio commented Jun 1, 2023

@marcphilipp if you don't have objections, I'll compose a PR in the next few days for this topic.

@marcphilipp
Copy link
Member

SGTM. Sorry for the delayed response. I was out for a few days.

@marcphilipp
Copy link
Member

@scordio Do you have time to pick this up this week? If not, I can take a stab at it since we should include it in 5.13.

@scordio
Copy link
Contributor Author

scordio commented Apr 14, 2025

Right, I've been falling behind with it, sorry! I'll let you know if I cannot make good progress till Wednesday 🙂

@scordio
Copy link
Contributor Author

scordio commented Apr 16, 2025

I'm not yet ready to raise a PR, but here's a small preview of what I'm doing:

  • New Jupiter configuration property: junit.jupiter.params.arguments.conversion.locale.format with value either iso_639 (default) or bcp_47 (default)
  • Such property is consumed by the Jupiter DefaultArgumentConverter, which would have Locale specific logic before delegating to Platform Commons ConversionSupport
  • ConversionSupport will have no way to fine-tune the implicit conversion of Locale values, which currently happens in StringToCommonJavaTypesConverter

The Locale special logic in DefaultArgumentConverter could then be dropped in JUnit 6, and StringToCommonJavaTypesConverter could be updated with the breaking change previously proposed.

Would this make sense, or do you see a better strategy?

The downside of my proposal is that direct users of ConversionSupport wouldn't have a gentle migration path.

@marcphilipp
Copy link
Member

  • New Jupiter configuration property: junit.jupiter.params.arguments.conversion.locale.format with value either iso_639 (default) or bcp_47

Shouldn't bcp_47 be the default so we can drop the config param with 6.0?

Would this make sense, or do you see a better strategy?

We could make junit.jupiter.params.arguments.conversion.locale.format a system property instead of a configuration parameter and put the logic into ConversionSupport right away. However, from a user's perspective it would be surprising that this can't be configured using junit-platform.properties etc. as usual. Therefore, I think we should stick to the plan you've proposed.

The downside of my proposal is that direct users of ConversionSupport wouldn't have a gentle migration path.

Assuming 6.0 includes #853, we could document in its release notes (or some migration section in the User Guide) how to register a custom ConversionService that uses Locale::new, right?

@scordio
Copy link
Contributor Author

scordio commented Apr 17, 2025

Shouldn't bcp_47 be the default so we can drop the config param with 6.0?

Yes, sorry, typo 😅

Therefore, I think we should stick to the plan you've proposed.

👍

Assuming 6.0 includes #853, we could document in its release notes (or some migration section in the User Guide) how to register a custom ConversionService that uses Locale::new, right?

Yes, that would be a more reliable way forward for users who want the previous behavior back.

One more point: I think I must introduce a constructor parameter to DefaultArgumentConverter for accessing the new config parameter (likely, the ExtensionContext), which means it will pollute a bit the parts of the code that today rely on DefaultArgumentConverter.INSTANCE.

@scordio scordio linked a pull request Apr 19, 2025 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment